[petsc-dev] [Ideas-team] Seeking OLCF users complaining about poor build times
Todd Gamblin
tgamblin at llnl.gov
Fri Feb 27 10:29:09 CST 2015
Barry:

I remember that ALCF attempted to address this problem at one point or
another with "tmpicc" compiler wrappers. As I remember the idea was that
they stored the compiler's tmp files in some local storage on the login
node. I think that was back when ANL's main machine was Intrepid, and I
don't know where those compilers went on Mira. Do you remember this?

In general I'm not sure that just moving the compiler temp files is going
to cut it. I think you really want to do the build out of /tmp or some
other filesystem. Spack does this automatically for its builds -- on LLNL
machines I build much faster by just finding the local tmp space and using
it for all the builds. Spack is also able to put the entire build out in
tmp space, because you just tell it the software name, and it handles the
details of where it is downloaded and expanded. It's not perfect, because
it looks at $TMP, $TMPDIR, and some other LLNL-specific places.

If it turns out that configuring NFS (or in ANL's case, I think it's GPFS)
to be fast on a set of loaded login nodes is not feasible, it might be
nice to have some kind of recommendations for build staging.
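
As a rough illustration of the staging idea above, a build script could
pick a local temp directory along these lines. This is only a sketch, not
Spack's actual code: the function names are made up for the example, and
the fallback to /tmp is an assumption that may not hold on every site.

```python
import os
import tempfile


def candidate_stage_dirs(env=None):
    """Return candidate staging roots in priority order (hypothetical helper)."""
    env = os.environ if env is None else env
    # Environment variables first, then /tmp as an assumed last resort.
    candidates = [env.get(name) for name in ("TMPDIR", "TMP")]
    candidates.append("/tmp")
    ordered = []
    for path in candidates:
        # Skip unset variables and duplicates, keeping the first occurrence.
        if path and path not in ordered:
            ordered.append(path)
    return ordered


def pick_stage_root(env=None):
    """Return the first candidate that exists and is writable, else None."""
    for path in candidate_stage_dirs(env):
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            return path
    return None


def make_build_dir(prefix="build-stage-", env=None):
    """Create a fresh build directory under the chosen staging root."""
    root = pick_stage_root(env)
    # If root is None, mkdtemp falls back to Python's own default temp dir.
    return tempfile.mkdtemp(prefix=prefix, dir=root)
```

A build would then unpack the source into make_build_dir() and run
configure and make there, copying only the installed result back to the
shared filesystem.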
-Todd
On 2/27/15, 8:09 AM, "David E. Bernholdt" <bernholdtde at ornl.gov> wrote:
>Barry, thanks, this is extremely helpful. I'll have the OLCF folks
>contact Nathan if they need any further info or have other experiments
>to try.
>
>On 02/27/2015 11:03 AM, Barry Smith wrote:
>>
>> Same text also in the attachment.
>>
>> Barry
>>
>> David,
>>
>> Nathan Collier has kindly run a test on Titan, Satish on Mira and
>>Hopper, and Victor on Ranger with a basic optimized build of PETSc (all
>>C code)
>>
>> Please find below some configure and make timings from the latest
>>PETSc master.
>>
>> The Titan times for both configure and make are unacceptable. For
>>total build time Titan is 3.5 times slower than Mira and Hopper and at
>>least 10 times slower than laptops. The "time" results on Titan are
>>disturbing:
>>
>> configure
>> real 14m32.169s (since the user + sys time is much less than real
>>time, what is it waiting on?)
>> user 1m51.527s
>> sys 3m40.734s
>>
>> make
>> real 15m56.004s
>> user 8m8.971s
>> sys 52m42.734s (why so much?)
>>
>> which I read as either the filesystem or the compiler system (location
>>of the compilers, license server of the compilers, ...) is really badly
>>configured.
>>
>> The Hopper configure time with the default
>>TMPDIR=/scratch/scratchdirs/balay is unacceptable, but if you actually
>>use the real /tmp it becomes somewhat reasonable.
>>
>> Feel free to share this information with local experts,
>>
>>
>>
>>
>> I suggest you view the below table in a fixed width font editor like
>>Emacs or Vi so the columns line up.
>>
>>                       configure time  make time  Total    compilers  filesystem / notes
>>
>> Titan                 14m32s          15m56s     30m28s   Intel 14   /lustre/atlas1/geo103/proj-shared/
>>                       41m38s          9m5s       50m43s              /ccs/home/ (no load on login node)
>>                                                                      13m (no load on a different login node)
>>
>> Mira                  6m59s           1m49s      8m48s    IBM        /gpfs/mira-home/
>>
>> Hopper                23m17s          1m45s      25m2s               /global/u2/b/balay/petsc.clone, default TMPDIR=/scratch/scratchdirs/balay
>>                       6m17s           1m39s      7m57s               manually set TMPDIR=/tmp
>>
>> NSF Ranger UT Austin  5m10s           1m28s      6m38s               default, whatever it is
>>
>> Linux laptop          53s             1m13s      2m6s     Gnu        compile and compiler local
>>
>> Apple laptop          1m14s           54s        2m8s     clang      compile and compiler local
>>
>> Linux workstation     1m11s           22s        1m33s    Gnu        compile and compiler local
>>                       1m37s           29s        2m6s     Gnu        compile directory local; compiler directory remote
>>                       3m11s           25s        3m36s    Intel 13   compile directory local; compiler directory remote
>>
>> PETSc has about 1000 source files that need compiling
>>
>> The configure is essentially sequential, the make extremely parallel.
>>
>> During configure the source code is on the listed file system, all .o
>>and executables are on /tmp
>>
>> During the make the source code and all .o are on the listed file system
>>
>>
>>> On Feb 25, 2015, at 11:23 AM, David E. Bernholdt
>>><bernholdtde at ornl.gov> wrote:
>>>
>>> At the kick-off meetings, one of the general complaints I heard
>>> expressed about the facilities was the slow build times compared to
>>> personal systems.
>>>
>>> If you have this complaint and are an OLCF user, and are willing to
>>>work
>>> with us a little to try to understand your experience in more detail,
>>> please contact me (individually, not reply-all).
>>>
>>> This is a facility thing, not an IDEAS thing, so I can't speak for the
>>> other facilities. But we've recently received some other similar
>>> comments, and we're trying to dig into what's happening.
>>>
>>> Thanks
>>> --
>>> David E. Bernholdt | Email: bernholdtde at ornl.gov
>>> Oak Ridge National Laboratory | Phone: +1 865-574-3147
>>> http://www.csm.ornl.gov/~bernhold | Fax: +1 865-576-5491
>>> _______________________________________________
>>> Ideas-team mailing list
>>> Ideas-team at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/ideas-team