[Darshan-users] Instrumenting statically-linked applications
Dragos Constantin
dragos.constantin at stanford.edu
Wed Nov 28 21:50:20 CST 2012
Hi Bill,
Here is the story.
I am using the Geant4 toolkit to perform parametric studies of a medical
imaging detector. Geant4 is a serial code written in C++ which uses the
Monte Carlo method to simulate the interaction between elementary
particles and matter. To achieve good statistics I have to use many
particles in my simulation. Hence, I divide the particles among many
individual simulations, each started with a different seed for the
random number generator. To perform this kind of simulation I have used
the TACC launcher module. Now, Geant4 is a great toolkit, but it was not
written for super clusters like Ranger and Lonestar. I did not know that
and I can tell you I had my TACC account temporarily suspended because
my runs were generating huge I/O loads. Later, I figured out that my
runs, which were using less than 2% of the computing capacity of the
machine, were generating more than 2 million IOPS, which far exceeds
the I/O limit of the controller buffer of the storage device (DataDirect
SFA10000). This high I/O was generated because Geant4 uses a lot of data
files which contain all the physics related to elementary particle
interactions with matter. Of course, the data files were available to
the compute nodes through $SCRATCH (a Lustre file system), but all the
instances (a few hundred) were accessing one location at a very high
rate at the same time. So, I modified the toolkit and created static
libs out of these data files, which I now link in when I compile my
application. Thus I can distribute, and practically eliminate, the I/O
load of my application. I have reduced the I/O load for one instance
from ~10,000 IOPS to only 8 IOPS. Yaakoub from TACC helped me, and I
tested the new configuration on Lonestar during the last maintenance
cycle; I had no problems running my application on 21,000 cores.
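The general idea is to bake each data file into a linkable object, so
the tables are read from memory instead of from Lustre. A minimal
sketch (hypothetical file name, and not necessarily the exact mechanism
in my Geant4 patch):

// The object is produced with:
//   ld -r -b binary -o xs_table.o xs_table.dat
// which defines the _binary_xs_table_dat_{start,end} symbols below.
#include <string>

extern "C" const char _binary_xs_table_dat_start[];
extern "C" const char _binary_xs_table_dat_end[];

// Return the embedded table; no open()/read() syscalls at run time,
// so no load on the shared file system.
std::string loadEmbeddedTable()
{
    return std::string(_binary_xs_table_dat_start,
                       _binary_xs_table_dat_end
                           - _binary_xs_table_dat_start);
}

Objects like this can then be archived with ar into the static libs
that get linked into the final executable.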
I have benchmarked the I/O load of my application on my workstation with
inotifywait from inotify-tools:
https://github.com/rvoicilas/inotify-tools
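In case it is useful to anyone, the measurement was something along
these lines (standard inotifywait flags; the data path is just an
example, and the count is printed when the watch is interrupted):

inotifywait -m -r -e open,access /path/to/geant4/data | wc -l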
Unfortunately, this tool does not work on TACC machines, and I also
believe it is not suitable for HPC. Yaakoub told me to use darshan, and
this is how I have reached this point. I want to write at least a
technical note about Geant4 scalability on Ranger and Lonestar, but I
need some numbers for the I/O load, and I think darshan can help me
here. I had to add MPI to my Geant4 application, and I have linked the
random seed to the MPI process rank. Today I had successful runs which
generated darshan logs.
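The rank-to-seed link is the only MPI-specific part. Roughly, it looks
like this (a minimal sketch with a made-up base seed; the seeding goes
through CLHEP, which is what Geant4's random engine uses):

#include <mpi.h>
#include <CLHEP/Random/Random.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank derives a distinct, reproducible seed from its rank.
    const long baseSeed = 12345L;  // hypothetical base seed
    CLHEP::HepRandom::setTheSeed(baseSeed + rank);

    // ... construct the Geant4 run manager and call beamOn() as usual ...

    MPI_Finalize();
    return 0;
}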
To come back to your question: my application is statically linked
because I want to avoid any I/O overload. From this perspective it makes
sense to have all the local libraries statically linked, so I can
distribute (and practically eliminate) the I/O load. I feel I can better
control the data flow with scripts like 'cache_binary' (this script is
executed before ibrun if thousands of cores are used in the simulation).
This is the only reason I prefer static over dynamic libraries. In any
case, I will test darshan with dynamic libs as well, but the aim is to
have all the local libraries statically linked in my final application.
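(For the dynamic test, my understanding is that it comes down to
pointing LD_PRELOAD at darshan's shared library before launching,
something like:

export LD_PRELOAD=$HOME/darshan/lib/libdarshan.so
ibrun ./my_geant4_app

with the path being wherever I installed darshan.)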
Sorry for the extremely long e-mail. I hope it makes sense.
Thanks,
Dragos
Dragos Constantin, PhD
Research Associate
Department of Radiology
Stanford University
Lucas MRS Center
1201 Welch Rd., PS-055
Stanford CA 94305
Office: (650) 736-9961
Fax: (650) 723-5795
On 11/28/2012 5:31 PM, Bill Barth wrote:
> Dragos,
>
> Your directories are available on all the compute nodes on Ranger, so if
> your darshan dynamic libs are in any of your directories, you should be
> able to set your LD_LIBRARY_PATH or the executable rpath appropriately to
> point at your version of the darshan dynamic libraries.
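>
> For example (paths illustrative):
>
> export LD_LIBRARY_PATH=$HOME/darshan/lib:$LD_LIBRARY_PATH
>
> or, at link time, embed the location with -Wl,-rpath,$HOME/darshan/lib.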
>
> Is there a reason you prefer the static version?
>
> Bill.
> --
> Bill Barth, Ph.D., Director, HPC
> bbarth at tacc.utexas.edu | Phone: (512) 232-7069
> Office: ROC 1.435 | Fax: (512) 475-9445
>
> On 11/28/12 6:20 PM, "Dragos Constantin" <dragos.constantin at stanford.edu>
> wrote:
>
>> Hi Phil,
>> So, v2.2.4-pre6 works on both Ranger and Lonestar. I can confirm that
>> darshan generates a log file. However, I am a bit confused, because
>> when I parse the log file it says my application did not open any
>> files, when in fact I am generating at least several output files.
>> Maybe I have to configure something or supply some flags at compilation
>> time so I can capture the full I/O load generated by my application.
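>>
>> (For parsing I am using the darshan-parser utility that ships with
>> darshan, i.e. something like: darshan-parser my_run.darshan.gz, where
>> the file name is whatever darshan wrote to its log directory.)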
>>
>> Do you think I should use the darshan-test application and see what the
>> output looks like?
>>
>> You are right, one cannot build 100% static executables on Ranger and
>> Lonestar. However, dynamic libs such as libverbs are installed on each
>> compute node, so that is not an issue. What is more important is that
>> the darshan lib and all my other libs are statically linked, because
>> they are not deployed system-wide. In any case, I am thinking I should
>> have seen in the darshan log files that some I/O activity occurred
>> because of these dynamic libs.
>>
>>
>> Thanks,
>> Dragos
>>
>> On 11/28/2012 11:25 AM, Phil Carns wrote:
>>> Hi Dragos,
>>>
>>> Could you try this pre-release version of Darshan and let us know if
>>> it works for you?
>>>
>>> ftp://ftp.mcs.anl.gov/pub/darshan/releases/darshan-2.2.4-pre6.tar.gz
>>>
>>> The darshan-gen-* scripts will only work with mvapich2.
>>>
>>> I noticed an unrelated issue when trying to test this release on
>>> Ranger, however. I was not able to build a static executable using
>>> mvapich2 (with or without darshan) because it could not find a static
>>> version of the libverbs library. I was trying to generate a static
>>> executable by just adding -static to the mpicc command line. Maybe
>>> there is an additional step needed to get a fully static executable?
>>>
>>> thanks,
>>> -Phil
>>>
>>> On 11/27/2012 10:25 AM, Phil Carns wrote:
>>>> Hi Dragos,
>>>>
>>>> Thanks for the bug report. It looks like the
>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc is just ordering the link
>>>> arguments differently than darshan-gen-cc.pl expected. We should be
>>>> able to work around this without too much trouble. In terms of the
>>>> perl code I think we just need to modify the regular expression to
>>>> collect a "$link_cmd_prefix" in addition to a "$link_cmd_suffix" if
>>>> anything appears in the link command line from the first '-'
>>>> character up to the object name. We can then just pass those
>>>> arguments as is into the generated script. In this example the
>>>> link_cmd_prefix would be:
>>>>
>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
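>>>>
>>>> In the perl, the capture could be something like this (untested
>>>> sketch, variable names illustrative):
>>>>
>>>> # split the link line into compiler, prefix flags, and suffix,
>>>> # anchoring on the test object file foo.o
>>>> if ($link_cmd =~ /^(\S+)\s+(.*?)\s*foo\.o\s+(.*)$/) {
>>>>     my ($cc, $link_cmd_prefix, $link_cmd_suffix) = ($1, $2, $3);
>>>> }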
>>>>
>>>> I would like to see that particular mpicc script before making any
>>>> changes, though, to make sure that we don't accidentally break
>>>> something, but as (bad) luck would have it Ranger is in maintenance
>>>> today. We'll have a look at it tomorrow.
>>>>
>>>> thanks,
>>>> -Phil
>>>>
>>>> On 11/26/2012 03:27 PM, Dragos Constantin wrote:
>>>>> Hi Kevin,
>>>>> The problem is not with the argument parsing. This is what I get in
>>>>> both cases:
>>>>>
>>>>> login4$ ./darshan-gen-cc.pl /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> --output mpicc.darshan
>>>>> CC_from_link = gcc
>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>> CC_from_compile = gcc
>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>
>>>>> login4$ ./darshan-gen-cc.pl --output mpicc.darshan
>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> CC_from_link = gcc
>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>> CC_from_compile = gcc
>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>
>>>>> As you can see:
>>>>>
>>>>> CC_from_compile = gcc
>>>>>
>>>>> but CC_from_link is not just gcc, and if I am not mistaken it
>>>>> should be. I just started to look at the script, and you might know
>>>>> better what is going on here.
>>>>>
>>>>> Thanks,
>>>>> Dragos
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Kevin Harms" <harms at alcf.anl.gov>
>>>>> To: "Dragos Constantin" <dragos.constantin at stanford.edu>
>>>>> Cc: darshan-users at lists.mcs.anl.gov
>>>>> Sent: Monday, November 26, 2012 12:23:00 PM
>>>>> Subject: Re: [Darshan-users] Instrumenting statically-linked
>>>>> applications
>>>>>
>>>>>
>>>>> I think this might be a simple issue with argument parsing. Try
>>>>> this instead:
>>>>>
>>>>>> ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> kevin
>>>>>
>>>>> On Nov 26, 2012, at 2:16 PM, Dragos Constantin wrote:
>>>>>
>>>>>> Hi,
>>>>>> I've installed and configured darshan-2.2.3 on TACC Ranger in my
>>>>>> user space. I have used gcc-4.4.5 (and mvapich-1.0.1).
>>>>>>
>>>>>> When I try to generate the MPI compiler scripts for
>>>>>> statically-linked applications I get the following error:
>>>>>>
>>>>>> login4$ ./darshan-gen-cc.pl
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc --output mpicc.darshan
>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>
>>>>>> I am not quite sure what triggered this. Any ideas how to quickly
>>>>>> fix the issue? I will look at the perl script to see what is going
>>>>>> on there.
>>>>>>
>>>>>> Thanks,
>>>>>> Dragos