[Darshan-users] Instrumenting statically-linked applications

Dragos Constantin dragos.constantin at stanford.edu
Wed Nov 28 21:50:20 CST 2012


Hi Bill,
Here is the story.

I am using the Geant4 toolkit to perform parametric studies of a medical 
imaging detector. Geant4 is a serial code written in C++ which uses the 
Monte Carlo method to simulate the interaction between elementary 
particles and matter. To achieve good statistics I have to use many 
particles in my simulation, so I divide the particles among many 
individual simulations, each started with a different seed for the 
random number generator. To run these simulations I have used the TACC 
launcher module. Now, Geant4 is a great toolkit, but it was not written 
for superclusters like Ranger and Lonestar. I did not know that, and I 
can tell you I had my TACC account temporarily suspended because my 
runs were generating huge I/O loads. Later, I figured out that my runs, 
which were using less than 2% of the computing capacity of the machine, 
were generating more than 2 million IOPS, which far exceeds the I/O 
limit of the controller buffer of the storage device (DataDirect 
SFA10000). This high I/O arose because Geant4 reads a lot of data files 
which contain all the physics of elementary-particle interactions with 
matter. Of course the data files were available to the compute nodes 
through $SCRATCH (a Lustre file system), but all the instances (a few 
hundred) were accessing one location at a very high rate at the same 
time. So, I modified the toolkit to build static libraries out of these 
data files, and I now link them in when I compile my application. Thus 
I can distribute, and practically eliminate, the I/O load of my 
application: I have reduced the I/O load of one instance from ~10,000 
IOPS to only 8 IOPS. Yaakoub from TACC helped me test the new 
configuration on Lonestar during the last maintenance cycle, and I had 
no problems running my application on 21,000 cores.
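
The recipe, roughly, is to wrap each data file in a relocatable object 
and archive it. A minimal sketch, with placeholder file and library 
names (the actual toolkit patch is more involved):

    # wrap the data file in a relocatable object
    ld -r -b binary -o G4EMLOW_data.o G4EMLOW.dat
    # collect the wrapped objects into a static library
    ar rcs libg4data.a G4EMLOW_data.o

The file contents are then reachable from C++ through the generated 
_binary_G4EMLOW_dat_start / _binary_G4EMLOW_dat_end symbols.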

I have benchmarked the I/O load of my application on my workstation 
with inotifywait from inotify-tools:

https://github.com/rvoicilas/inotify-tools
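
As a rough sketch, events per second can be counted like this (the 
data directory is a placeholder):

    # print one Unix timestamp per file event, then count identical
    # consecutive timestamps, i.e. events per second
    inotifywait -m -r -e open,access \
        --timefmt '%s' --format '%T' /path/to/geant4/data \
      | uniq -c

Each output line is then approximately the IOPS seen during that 
second.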

Unfortunately, this tool does not work on the TACC machines, and I also 
believe it is not suitable for HPC. Yaakoub told me to use Darshan, and 
this is how I have reached this point. I want to write at least a 
technical note about Geant4 scalability on Ranger and Lonestar, but I 
need some numbers for the I/O load, and I think Darshan can help me 
here. I had to add MPI to my Geant4 application, and I have tied the 
random seed to the MPI process rank. Today I have had successful runs 
which generated Darshan logs.
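
For example, the per-file open counters can be pulled out of a log 
with something like the following (the log file name is made up):

    # dump the log and keep only the POSIX open counters
    darshan-parser username_app_id123456.darshan.gz | grep CP_POSIX_OPENS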

To come back to your question: my application is statically linked 
because I want to avoid any I/O overload. From this perspective it 
makes sense to have all the local libraries statically linked, so I can 
distribute (and practically eliminate) the I/O load. I feel I can 
better control the data flow with scripts like 'cache_binary' (this 
script is executed before ibrun when thousands of cores are used in 
the simulation). This is the only reason I prefer static over dynamic 
libraries. In any case I will test Darshan with dynamic libs as well, 
but the aim is to have all the local libraries statically linked into 
my final application.
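
For the dynamic test, as I understand it, Darshan can be preloaded at 
run time instead of linked in (paths and names below are placeholders):

    # preload the Darshan shared library before launching with ibrun
    export LD_PRELOAD=$HOME/darshan-install/lib/libdarshan.so
    ibrun ./mySimulation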

Sorry for the extremely long e-mail. I hope it makes sense.

Thanks,
Dragos



Dragos Constantin, PhD

Research Associate
Department of Radiology
Stanford University
Lucas MRS Center
1201 Welch Rd., PS-055
Stanford CA 94305

Office: (650) 736-9961
Fax: (650) 723-5795

On 11/28/2012 5:31 PM, Bill Barth wrote:
> Dragos,
>
> Your directories are available on all the compute nodes on Ranger, so if
> your darshan dynamic libs are in any of your directories, you should be
> able to set your LD_LIBRARY_PATH or the executable rpath appropriately to
> point at your version of the darshan dynamic libraries.
>
> Is there a reason you prefer the static version?
>
> Bill.
> --
> Bill Barth, Ph.D., Director, HPC
> bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
> Office: ROC 1.435             |   Fax:   (512) 475-9445
>
> On 11/28/12 6:20 PM, "Dragos Constantin" <dragos.constantin at stanford.edu>
> wrote:
>
>> Hi Phil,
>> So, v2.2.4-pre6 works on both Ranger and Lonestar. I can confirm that
>> darshan generates a log file. However, I am a bit confused: when I
>> parse the log file it says my application did not open any files, yet
>> I am in fact generating several output files. Maybe I have to
>> configure something or supply some flags at compilation time so I can
>> capture the full I/O load generated by my application.
>>
>> Do you think I should use the darshan-test application and see what the
>> output looks like?
>>
>> You are right, one cannot build 100% static executables on Ranger and
>> Lonestar. However, dynamic libs such as libverbs are installed on
>> each compute node, so it is not an issue. What is more important is that
>> the darshan lib and all my other libs are statically linked because they
>> are not deployed system wide. In any case, I am thinking I should have
>> seen in the darshan log files that some I/O activity occurred because of
>> these dynamic libs.
>>
>>
>> Thanks,
>> Dragos
>>
>>
>>
>> Dragos Constantin, PhD
>>
>> Research Associate
>> Department of Radiology
>> Stanford University
>> Lucas MRS Center
>> 1201 Welch Rd., PS-055
>> Stanford CA 94305
>>
>> Office: (650) 736-9961
>> Fax: (650) 723-5795
>>
>> On 11/28/2012 11:25 AM, Phil Carns wrote:
>>> Hi Dragos,
>>>
>>> Could you try this pre-release version of Darshan and let us know if it
>>> works for you?
>>>
>>> ftp://ftp.mcs.anl.gov/pub/darshan/releases/darshan-2.2.4-pre6.tar.gz
>>>
>>> The darshan-gen-* scripts will only work with mvapich2.
>>>
>>> I noticed an unrelated issue when trying to test this release on
>>> Ranger, however.  I was not able to build a static executable using
>>> mvapich2 (with or without darshan) because it could not find a static
>>> version of the libverbs library. I was trying to generate a static
>>> executable by just adding -static to the mpicc command line.  Maybe
>>> there is an additional step needed to get a fully static executable?
>>>
>>> thanks,
>>> -Phil
>>>
>>> On 11/27/2012 10:25 AM, Phil Carns wrote:
>>>> Hi Dragos,
>>>>
>>>> Thanks for the bug report.  It looks like the
>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc is just ordering the link
>>>> arguments differently than darshan-gen-cc.pl expected.  We should be
>>>> able to work around this without too much trouble. In terms of the
>>>> perl code I think we just need to modify the regular expression to
>>>> collect a "$link_cmd_prefix" in addition to a "$link_cmd_suffix" if
>>>> anything appears in the link command line from the first '-'
>>>> character up to the object name.  We can then just pass those
>>>> arguments as is into the generated script. In this example the
>>>> link_cmd_prefix would be:
>>>>
>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>
>>>> I would like to see that particular mpicc script before making any
>>>> changes, though, to make sure that we don't accidentally break
>>>> something, but as (bad) luck would have it Ranger is in maintenance
>>>> today.  We'll have a look at it tomorrow.
>>>>
>>>> thanks,
>>>> -Phil
>>>>
>>>> On 11/26/2012 03:27 PM, Dragos Constantin wrote:
>>>>> Hi Kevin,
>>>>> The problem is not with the argument parsing. This is what I get in
>>>>> both cases:
>>>>>
>>>>> login4$ ./darshan-gen-cc.pl /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> --output mpicc.darshan
>>>>> CC_from_link = gcc
>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>> CC_from_compile = gcc
>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>
>>>>> login4$ ./darshan-gen-cc.pl --output mpicc.darshan
>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> CC_from_link = gcc
>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>> CC_from_compile = gcc
>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>
>>>>> As you can see:
>>>>>
>>>>> CC_from_compile = gcc
>>>>>
>>>>> but CC_from_link is not gcc and if I am not mistaken it should be
>>>>> gcc. I just started to look at the script and you might know better
>>>>> what is going on here.
>>>>>
>>>>> Thanks,
>>>>> Dragos
>>>>>
>>>>>
>>>>> Dragos Constantin, PhD
>>>>>
>>>>> Research Associate
>>>>> Department of Radiology
>>>>> Stanford University
>>>>> Lucas MRS Center
>>>>> 1201 Welch Rd., PS-055
>>>>> Stanford CA 94305
>>>>>
>>>>> Office: (650) 736-9961
>>>>> Fax: (650) 723-5795
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Kevin Harms" <harms at alcf.anl.gov>
>>>>> To: "Dragos Constantin" <dragos.constantin at stanford.edu>
>>>>> Cc: darshan-users at lists.mcs.anl.gov
>>>>> Sent: Monday, November 26, 2012 12:23:00 PM
>>>>> Subject: Re: [Darshan-users] Instrumenting statically-linked
>>>>> applications
>>>>>
>>>>>
>>>>>     I think this might be a simple issue with argument parsing. Try
>>>>> this instead:
>>>>>
>>>>>> ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>> kevin
>>>>>
>>>>> On Nov 26, 2012, at 2:16 PM, Dragos Constantin wrote:
>>>>>
>>>>>> Hi,
>>>>>> I've installed and configured darshan-2.2.3 on TACC Ranger in my
>>>>>> user space. I have used gcc-4.4.5 (and mvapich-1.0.1).
>>>>>>
>>>>>> When I try to generate the MPI compiler scripts for
>>>>>> statically-linked applications I get the following error:
>>>>>>
>>>>>> login4$ ./darshan-gen-cc.pl
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc --output mpicc.darshan
>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>
>>>>>> I am not quite sure what triggered this. Any ideas on how to
>>>>>> quickly fix the issue? I will look at the perl script to see what
>>>>>> is going on there.
>>>>>> Thanks,
>>>>>> Dragos
>>>>>>
>>>>>>
>>>>>> Dragos Constantin, PhD
>>>>>>
>>>>>> Research Associate
>>>>>> Department of Radiology
>>>>>> Stanford University
>>>>>> Lucas MRS Center
>>>>>> 1201 Welch Rd., PS-055
>>>>>> Stanford CA 94305
>>>>>>
>>>>>> Office: (650) 736-9961
>>>>>> Fax: (650) 723-5795
>>>>>>
>>>
>


