[Darshan-users] Instrumenting statically-linked applications

Bill Barth bbarth at tacc.utexas.edu
Thu Nov 29 08:11:19 CST 2012


Thanks for the explanation, Dragos.

If your code is serial and you are using our launcher, then, if I
understand correctly, you will not get any information from Darshan,
since your code is not calling MPI_Init and MPI_Finalize. Can you say a
little more about how your code is structured?

Best,
Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
Office: ROC 1.435             |   Fax:   (512) 475-9445







On 11/28/12 9:50 PM, "Dragos Constantin" <dragos.constantin at stanford.edu>
wrote:

>Hi Bill,
>Here is the story.
>
>I am using the Geant4 toolkit to perform parametric studies of a medical
>imaging detector. Geant4 is a serial code written in C++ that uses the
>Monte Carlo method to simulate the interaction between elementary
>particles and matter. To achieve good statistics I have to use many
>particles in my simulation, so I divide the particles among many
>individual simulations started with different seeds for the random
>number generator. To perform these simulations I have used the TACC
>launcher module. Now, Geant4 is a great toolkit, but it was not written
>for large clusters like Ranger and Lonestar. I did not know that, and I
>can tell you I had my TACC account temporarily suspended because my runs
>were generating huge I/O loads. Later, I figured out that my runs, which
>were using less than 2% of the computing capacity of the machine, were
>generating more than 2 million IOPS, which far exceeds the I/O limit of
>the controller buffer of the storage device (DataDirect SFA10000). This
>high I/O was generated because Geant4 uses a lot of data files that
>contain all the physics related to elementary particle interactions
>with matter. Of course the data files were available to the compute
>nodes through $SCRATCH (Lustre file system), but all the instances (a
>few hundred) were accessing one location at a very high rate at the same
>time. So I modified the toolkit, created static libs out of these data
>files, and now I link them when I compile my application. Thus I can
>distribute, and practically eliminate, the I/O load of my application. I
>have reduced the I/O load for one instance from ~10,000 IOPS to only 8
>IOPS. Yaakoub from TACC helped me, and I tested the new configuration on
>Lonestar during the last maintenance cycle; I had no problems running my
>application on 21,000 cores.
>
>I have benchmarked the I/O load of my application on my workstation with
>inotifywait from inotify-tools:
>
>https://github.com/rvoicilas/inotify-tools
>
>Unfortunately, this tool does not work on TACC machines, and I also
>believe it is not suitable for HPC. Yaakoub told me to use darshan, and
>this is how I have reached this point. I mean, I want to write at least
>a technical note about Geant4 scalability on Ranger and Lonestar, but I
>need some numbers for the I/O load and I think darshan can help me here.
>I had to add MPI to my Geant4 application, and I have tied the random
>number seed to the MPI process rank. Today I had successful runs which
>generated darshan logs.
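>
>In outline, the wrapper looks something like the sketch below (a
>simplified illustration rather than the actual Geant4 code;
>RunGeant4Job() is just a placeholder for the existing serial simulation
>driver, and 12345 is an arbitrary base seed):
>
>#include <mpi.h>
>#include <cstdio>
>
>// Placeholder for the existing serial Geant4 simulation driver.
>void RunGeant4Job(long seed)
>{
>    std::printf("running one simulation instance with seed %ld\n", seed);
>}
>
>int main(int argc, char **argv)
>{
>    MPI_Init(&argc, &argv);      // Darshan instruments the run between MPI_Init ...
>    int rank = 0;
>    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>    long seed = 12345L + rank;   // tie the RNG seed to the MPI rank
>    RunGeant4Job(seed);
>    MPI_Finalize();              // ... and MPI_Finalize, where it writes its log
>    return 0;
>}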
>
>To come back to your question: my application is statically linked
>because I want to avoid any I/O overload. From this perspective it makes
>sense to have all the local libraries statically linked so I can
>distribute (and effectively eliminate) the I/O load. I feel I can better
>control the data flow with scripts like 'cache_binary' (this script is
>executed before ibrun if thousands of cores are used in the simulation).
>This is the only reason I prefer static over dynamic libraries. In any
>case I will test darshan with dynamic libs as well, but the aim is to
>have all the local libraries statically linked in my final application.
>
>Sorry for the extremely long e-mail. I hope it makes sense.
>
>Thanks,
>Dragos
>
>
>
>Dragos Constantin, PhD
>
>Research Associate
>Department of Radiology
>Stanford University
>Lucas MRS Center
>1201 Welch Rd., PS-055
>Stanford CA 94305
>
>Office: (650) 736-9961
>Fax: (650) 723-5795
>
>On 11/28/2012 5:31 PM, Bill Barth wrote:
>> Dragos,
>>
>> Your directories are available on all the compute nodes on Ranger, so if
>> your darshan dynamic libs are in any of your directories, you should be
>> able to set your LD_LIBRARY_PATH or the executable rpath appropriately to
>> point at your version of the darshan dynamic libraries.
>>
>> Is there a reason you prefer the static version?
>>
>> Bill.
>> --
>> Bill Barth, Ph.D., Director, HPC
>> bbarth at tacc.utexas.edu        |   Phone: (512) 232-7069
>> Office: ROC 1.435             |   Fax:   (512) 475-9445
>>
>>
>>
>>
>>
>>
>>
>> On 11/28/12 6:20 PM, "Dragos Constantin"
>><dragos.constantin at stanford.edu>
>> wrote:
>>
>>> Hi Phil,
>>> So, v2.2.4-pre6 works on both Ranger and Lonestar. I can confirm that
>>> darshan generates a log file. However, I am a bit confused, because
>>> when I parse the log file it says my application did not open any
>>> files, yet in fact I am generating at least several output files.
>>> Maybe I have to configure something or supply some flags at compile
>>> time so I can capture the full I/O load generated by my application.
>>>
>>> Do you think I should use the darshan-test application and see what the
>>> output looks like?
>>>
>>> You are right that one cannot build 100% static executables on Ranger
>>> and Lonestar. However, dynamic libs such as libverbs are installed on
>>> each compute node, so that is not an issue. What is more important is
>>> that the darshan lib and all my other libs are statically linked,
>>> because they are not deployed system-wide. In any case, I would have
>>> expected to see in the darshan log files that some I/O activity
>>> occurred because of these dynamic libs.
>>>
>>>
>>> Thanks,
>>> Dragos
>>>
>>>
>>>
>>> Dragos Constantin, PhD
>>>
>>> Research Associate
>>> Department of Radiology
>>> Stanford University
>>> Lucas MRS Center
>>> 1201 Welch Rd., PS-055
>>> Stanford CA 94305
>>>
>>> Office: (650) 736-9961
>>> Fax: (650) 723-5795
>>>
>>> On 11/28/2012 11:25 AM, Phil Carns wrote:
>>>> Hi Dragos,
>>>>
>>>> Could you try this pre-release version of Darshan and let us know if
>>>> it works for you?
>>>>
>>>> ftp://ftp.mcs.anl.gov/pub/darshan/releases/darshan-2.2.4-pre6.tar.gz
>>>>
>>>> The darshan-gen-* scripts will only work with mvapich2.
>>>>
>>>> I noticed an unrelated issue when trying to test this release on
>>>> Ranger, however.  I was not able to build a static executable using
>>>> mvapich2 (with or without darshan) because it could not find a static
>>>> version of the libverbs library. I was trying to generate a static
>>>> executable by just adding -static to the mpicc command line.  Maybe
>>>> there is an additional step needed to get a fully static executable?
>>>>
>>>> thanks,
>>>> -Phil
>>>>
>>>> On 11/27/2012 10:25 AM, Phil Carns wrote:
>>>>> Hi Dragos,
>>>>>
>>>>> Thanks for the bug report.  It looks like the
>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc is just ordering the link
>>>>> arguments differently than darshan-gen-cc.pl expected.  We should be
>>>>> able to work around this without too much trouble. In terms of the
>>>>> Perl code, I think we just need to modify the regular expression to
>>>>> collect a "$link_cmd_prefix" in addition to a "$link_cmd_suffix" if
>>>>> anything appears in the link command line from the first '-'
>>>>> character up to the object name.  We can then just pass those
>>>>> arguments as is into the generated script. In this example the
>>>>> link_cmd_prefix would be:
>>>>>
>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>
>>>>> I would like to see that particular mpicc script before making any
>>>>> changes, though, to make sure that we don't accidentally break
>>>>> something, but as (bad) luck would have it Ranger is in maintenance
>>>>> today.  We'll have a look at it tomorrow.
>>>>>
>>>>> thanks,
>>>>> -Phil
>>>>>
>>>>> On 11/26/2012 03:27 PM, Dragos Constantin wrote:
>>>>>> Hi Kevin,
>>>>>> The problem is not with the argument parsing. This is what I get in
>>>>>> both cases:
>>>>>>
>>>>>> login4$ ./darshan-gen-cc.pl /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>> --output mpicc.darshan
>>>>>> CC_from_link = gcc
>>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>> CC_from_compile = gcc
>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>
>>>>>> login4$ ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>> CC_from_link = gcc
>>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>> CC_from_compile = gcc
>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>
>>>>>> As you can see:
>>>>>>
>>>>>> CC_from_compile = gcc
>>>>>>
>>>>>> but CC_from_link is not just gcc, and if I am not mistaken it
>>>>>> should be. I have only just started to look at the script, so you
>>>>>> might know better what is going on here.
>>>>>>
>>>>>> Thanks,
>>>>>> Dragos
>>>>>>
>>>>>>
>>>>>> Dragos Constantin, PhD
>>>>>>
>>>>>> Research Associate
>>>>>> Department of Radiology
>>>>>> Stanford University
>>>>>> Lucas MRS Center
>>>>>> 1201 Welch Rd., PS-055
>>>>>> Stanford CA 94305
>>>>>>
>>>>>> Office: (650) 736-9961
>>>>>> Fax: (650) 723-5795
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Kevin Harms" <harms at alcf.anl.gov>
>>>>>> To: "Dragos Constantin" <dragos.constantin at stanford.edu>
>>>>>> Cc: darshan-users at lists.mcs.anl.gov
>>>>>> Sent: Monday, November 26, 2012 12:23:00 PM
>>>>>> Subject: Re: [Darshan-users] Instrumenting statically-linked
>>>>>> applications
>>>>>>
>>>>>>
>>>>>>     I think this might be a simple issue with argument parsing. Try
>>>>>> this instead:
>>>>>>
>>>>>>> ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>> kevin
>>>>>>
>>>>>> On Nov 26, 2012, at 2:16 PM, Dragos Constantin wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I've installed and configured darshan-2.2.3 on TACC Ranger in my
>>>>>>> user space. I have used gcc-4.4.5 (and mvapich-1.0.1).
>>>>>>>
>>>>>>> When I try to generate the MPI compiler scripts for
>>>>>>> statically-linked applications I get the following error:
>>>>>>>
>>>>>>> login4$ ./darshan-gen-cc.pl
>>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc --output mpicc.darshan
>>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>>
>>>>>>> I am not quite sure what triggered this. Any ideas on how to
>>>>>>> quickly fix the issue? I will look at the Perl script to see what
>>>>>>> is going on there.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dragos
>>>>>>>
>>>>>>>
>>>>>>> Dragos Constantin, PhD
>>>>>>>
>>>>>>> Research Associate
>>>>>>> Department of Radiology
>>>>>>> Stanford University
>>>>>>> Lucas MRS Center
>>>>>>> 1201 Welch Rd., PS-055
>>>>>>> Stanford CA 94305
>>>>>>>
>>>>>>> Office: (650) 736-9961
>>>>>>> Fax: (650) 723-5795
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Darshan-users mailing list
>>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>>> _______________________________________________
>>>>>> Darshan-users mailing list
>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>> _______________________________________________
>>>>> Darshan-users mailing list
>>>>> Darshan-users at lists.mcs.anl.gov
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>
>>> _______________________________________________
>>> Darshan-users mailing list
>>> Darshan-users at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>
>


