[petsc-dev] [Ideas-team] Seeking OLCF users complaining about poor build times

Todd Gamblin tgamblin at llnl.gov
Fri Feb 27 10:29:09 CST 2015


I remember that ALCF attempted to address this problem at one point or
another with "tmpicc" compiler wrappers.  As I remember the idea was that
they stored the compiler's tmp files in some local storage on the login
node.  I think that was back when ANL's main machine was Intrepid, and I
don't know where those compilers went on Mira.  Do you remember this?

In general I'm not sure that just moving the compiler temp files is going
to cut it.  I think you really want to do the build out of /tmp or some
other filesystem.  Spack does this automatically for its builds -- on LLNL
machines I build much faster by just finding the local tmp space and using
it for all the builds.  Spack is also able to put the entire build out in
tmp space, because you just tell it the software name, and it handles the
details of where it is downloaded and expanded.  It's not perfect, because
it looks at $TMP, $TMPDIR, and some other LLNL-specific places.
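
To make the staging idea concrete, here is a minimal sketch of picking a build directory from the environment. This is not Spack's actual code. The function names `pick_build_stage` and `make_stage`, the order of the variables, and the `/tmp` fallback are all assumptions chosen for illustration.

```python
import os
import tempfile


def pick_build_stage(candidates=("TMP", "TMPDIR"), fallback="/tmp"):
    """Return the first writable directory named by one of the given
    environment variables, or the fallback if none qualifies.

    Hypothetical helper: the variable order and the fallback are assumptions.
    """
    for var in candidates:
        path = os.environ.get(var)
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return fallback


def make_stage(prefix="build-stage-"):
    """Create a private, uniquely named build directory in the chosen location."""
    return tempfile.mkdtemp(prefix=prefix, dir=pick_build_stage())
```

Note that this only helps when the variable points at node-local storage. A site default such as a TMPDIR on a shared scratch filesystem would be picked up as-is, so it may be worth checking what the variable points at before relying on it.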

If it turns out that configuring NFS (or in ANL's case, I think it's GPFS)
to be fast on a set of loaded login nodes is not feasible, it might be
nice to have some kind of recommendations for build staging.
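
For anyone who wants to compare numbers across machines, the sketch below does the arithmetic on the Titan `time` output and the table totals quoted further down. The timings are copied from Barry's message. The helper names `to_seconds`, `cpu_per_wall` and `slowdown` are mine, and the interpretation in the comments is only a rough reading of the numbers.

```python
import re


def to_seconds(stamp):
    """Convert a string such as '14m32.169s' or '53s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time: {stamp!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes or 0) + float(seconds)


def cpu_per_wall(timing):
    """Return (user + sys) / real for one `time` result.

    Well below 1 suggests the process spent most of its wall time waiting;
    above 1 means several cores were busy at once.
    """
    real = to_seconds(timing["real"])
    user = to_seconds(timing["user"])
    system = to_seconds(timing["sys"])
    return (user + system) / real


def slowdown(total, baseline):
    """Return how many times longer `total` is than `baseline`."""
    return to_seconds(total) / to_seconds(baseline)


# Titan figures as reported in the quoted message.
configure = {"real": "14m32.169s", "user": "1m51.527s", "sys": "3m40.734s"}
make = {"real": "15m56.004s", "user": "8m8.971s", "sys": "52m42.734s"}

print(f"configure CPU/wall: {cpu_per_wall(configure):.2f}")
print(f"make CPU/wall:      {cpu_per_wall(make):.2f}")
print(f"Titan vs Mira total:         {slowdown('30m28s', '8m48s'):.2f}x")
print(f"Titan vs Linux laptop total: {slowdown('30m28s', '2m6s'):.2f}x")
```

On these inputs the configure step uses well under half of its wall time as CPU time, while the make step is dominated by sys time rather than user time. Both are consistent with time going to the filesystem rather than to the compiler itself, though the numbers alone do not prove it.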


On 2/27/15, 8:09 AM, "David E. Bernholdt" <bernholdtde at ornl.gov> wrote:

>Barry, thanks, this is extremely helpful.  I'll have the OLCF folks
>contact Nathan if they need any further info or have other experiments
>to try.
>On 02/27/2015 11:03 AM, Barry Smith wrote:
>>   Same text also in the attachment.
>>    Barry
>> David,
>>     Nathan Collier has kindly run a test on Titan, Satish on Mira and
>>Hopper, and Victor on Ranger with a basic optimized build of PETSc (all
>>C code)
>>     Please find below some configure and make timings from the latest
>>PETSc master.
>>      The Titan times for both configure and make are unacceptable. For
>>total build time Titan is 3.5 times slower than Mira and Hopper and at
>>least 10 times slower than laptops. The "time" results on Titan are
>> configure 
>> real	14m32.169s   (since the user + sys time is much less than real
>>time, what is it waiting on?)
>> user	1m51.527s
>> sys	3m40.734s
>> make
>> real	15m56.004s
>> user	8m8.971s
>> sys	52m42.734s  (why so much?)
>> which I read as either the filesystem or the compiler system (location
>>of the compilers, license server of the compilers, ...) is really badly
>>    The Hopper configure time with the default
>>TMPDIR=/scratch/scratchdirs/balay is unacceptable but if you actually
>>use the real /tmp it becomes somewhat reasonable.
>> Feel free to share this information with local experts,
>> I suggest you view the below table in a fixed width font editor like
>>Emacs or Vi so the columns line up.
>>                       configure time  make time  Total    compilers  filesystem
>> Titan                 14m32s          15m56s     30m28s   Intel 14   /lustre/atlas1/geo103/proj-shared/
>>                       41m38s          9m5s       50m43s              /ccs/home/  (no load on login node)
>>                       13m                                            (no load on a different login node)
>> Mira                  6m59s           1m49s      8m48s    IBM        /gpfs/mira-home/
>> Hopper                23m17           1m45s      25m2s               /global/u2/b/balay/petsc.clone default
>>                       6m17s           1m39s      7m57s               manually set TMPDIR=/tmp
>> NSF Ranger UT Austin  5m10s           1m28s      6m38s               default, whatever it is
>> Linux laptop          53s             1m13s      2m6s     Gnu        compile and compiler local
>> Apple laptop          1m14s           54s        2m8s     clang      compile and compiler local
>> Linux workstation     1m11s           22s        1m33s    Gnu        compile and compiler local
>>                       1m37s           29s        2m6s     Gnu        compile directory local; compiler directory remote
>>                       3m11s           25s        3m36s    Intel 13   compile directory local; compiler directory remote
>> PETSc has about 1000 source files that need compiling
>> The configure is essentially sequential, the make extremely parallel.
>> During configure the source code is on the listed file system, all .o
>>and executables  are on /tmp
>> During the make the source code and all .o are on the listed file system
>>> On Feb 25, 2015, at 11:23 AM, David E. Bernholdt
>>><bernholdtde at ornl.gov> wrote:
>>> At the kick-off meetings, one of the general complaints I heard
>>> expressed about the facilities was the slow build times compared to
>>> personal systems.
>>> If you have this complaint and are an OLCF user, and are willing to
>>> work with us a little to try to understand your experience in more detail,
>>> please contact me (individually, not reply-all).
>>> This is a facility thing, not an IDEAS thing, so I can't speak for the
>>> other facilities.  But we've recently received some other similar
>>> comments, and we're trying to dig into what's happening.
>>> Thanks
>>> -- 
>>> David E. Bernholdt                | Email: bernholdtde at ornl.gov
>>> Oak Ridge National Laboratory     | Phone: +1 865-574-3147
>>> http://www.csm.ornl.gov/~bernhold | Fax:   +1 865-576-5491
>>> _______________________________________________
>>> Ideas-team mailing list
>>> Ideas-team at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/ideas-team
