[Darshan-users] Instrumenting statically-linked applications
Dragos Constantin
dragos.constantin at stanford.edu
Thu Nov 29 10:13:49 CST 2012
Hi Bill,
I have already modified the code so that it uses MPI to set the seed,
and I am no longer using the launcher module to submit my jobs. In other
words, I have tied the seed of the random number generator to the
process rank:
MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
int theSeed = mpi_rank;
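
For completeness, here is a minimal sketch of how the pieces fit
together (illustrative only; the real driver is of course Geant4 C++
code, and srand() just stands in for seeding Geant4's random engine):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int mpi_rank = 0;

    /* Darshan records I/O activity between MPI_Init and MPI_Finalize */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

    /* tie the random number seed to the process rank; srand() here just
       stands in for seeding Geant4's random engine */
    int theSeed = mpi_rank;
    srand((unsigned int)theSeed);

    printf("rank %d using seed %d\n", mpi_rank, theSeed);

    /* ... run one independent Geant4 simulation here ... */

    MPI_Finalize();
    return 0;
}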
Previously I was using the $TACC_LAUNCHER_TSK_ID variable to initialize
the seed. The application is already generating Darshan logs, but I
still have to figure out why the logs do not report any file access
(I am creating several output files which are not reported).
Thanks,
Dragos
Dragos Constantin, PhD
Research Associate
Department of Radiology
Stanford University
Lucas MRS Center
1201 Welch Rd., PS-055
Stanford CA 94305
Office: (650) 736-9961
Fax: (650) 723-5795
On 11/29/2012 6:11 AM, Bill Barth wrote:
> Thanks for the explanation, Dragos.
>
> If your code is serial and you are using our launcher, and I understand
> correctly, you will not get any information from Darshan if it is not
> calling MPI_Init and MPI_Finalize. Can you say a little more about how
> your code is structured?
>
> Best,
> Bill.
> --
> Bill Barth, Ph.D., Director, HPC
> bbarth at tacc.utexas.edu | Phone: (512) 232-7069
> Office: ROC 1.435 | Fax: (512) 475-9445
>
>
>
>
>
>
>
> On 11/28/12 9:50 PM, "Dragos Constantin" <dragos.constantin at stanford.edu>
> wrote:
>
>> Hi Bill,
>> Here is the story.
>>
>> I am using the Geant4 toolkit to perform parametric studies of a medical
>> imaging detector. Geant4 is a serial code written in C++ which uses the
>> Monte Carlo method to simulate the interaction between elementary
>> particles and matter. To achieve good statistics I have to use many
>> particles in my simulation, so I divide the particles among many
>> individual simulations, each started with a different seed for the
>> random number generator. To perform this kind of simulation I have used
>> the TACC launcher module. Now, Geant4 is a great toolkit, but it was not
>> written for supercomputers like Ranger and Lonestar. I did not know
>> that, and I can tell you I had my TACC account temporarily suspended
>> because my runs were generating huge I/O loads. Later, I figured out
>> that my runs, which were using less than 2% of the computing capacity of
>> the machine, were generating more than 2 million IOPS, which far exceeds
>> the I/O limit of the controller buffer of the storage device (DataDirect
>> SFA10000). This high I/O was generated because Geant4 uses a lot of data
>> files which contain all the physics related to the interaction of
>> elementary particles with matter. Of course the data files were
>> available to the compute nodes through $SCRATCH (a Lustre file system),
>> but all the instances (a few hundred) were accessing one location at a
>> very high rate at the same time. So, I have modified the toolkit and
>> created static libs out of these data files, and now I link them in when
>> I compile my application. Thus I can distribute the data with the
>> executable and practically eliminate the I/O load of my application. I
>> have reduced the I/O load for one instance from ~10,000 IOPS to only
>> 8 IOPS. Yaakoub from TACC helped me, and I tested the new configuration
>> on Lonestar during the last maintenance cycle; I did not have problems
>> running my application on 21,000 cores.
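>>
>> Just to illustrate the idea (a sketch with made-up names, not the
>> actual Geant4 code): each data file is converted to a C array by a
>> small script, compiled, and archived into a static library, so the
>> application reads the physics tables from memory instead of opening
>> files on $SCRATCH:
>>
>> #include <stddef.h>
>> #include <stdio.h>
>>
>> /* in the real build this array is generated from a Geant4 data file;
>>    a tiny stand-in table keeps the example self-contained */
>> static const unsigned char g4_xsection_data[] = { 0x01, 0x02, 0x03, 0x04 };
>> static const size_t g4_xsection_data_len = sizeof(g4_xsection_data);
>>
>> /* read the table from memory: no file system access at run time */
>> static const unsigned char *get_xsection_table(size_t *len)
>> {
>>     *len = g4_xsection_data_len;
>>     return g4_xsection_data;
>> }
>>
>> int main(void)
>> {
>>     size_t n;
>>     const unsigned char *table = get_xsection_table(&n);
>>     printf("embedded table holds %zu bytes (first byte 0x%02x)\n",
>>            n, table[0]);
>>     return 0;
>> }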
>>
>> I have benchmarked the I/O load of my application on my workstation with
>> inotifywait from inotify-tools:
>>
>> https://github.com/rvoicilas/inotify-tools
>>
>> Unfortunately, this tool does not work on the TACC machines, and I also
>> believe it is not suitable for HPC. Yaakoub told me to use Darshan, and
>> this is how I have reached this point. I want to write at least a
>> technical note about Geant4 scalability on Ranger and Lonestar, but I
>> need some numbers for the I/O load, and I think Darshan can help me
>> here. I had to add MPI to my Geant4 application, and I have linked the
>> seed to the MPI process rank. Today I had successful runs which
>> generated Darshan logs.
>>
>> To come back to your question: my application is statically linked
>> because I want to avoid any I/O overload. From this perspective it makes
>> sense to have all the local libraries statically linked, so I can
>> distribute (and effectively eliminate) the I/O load. I feel I can better
>> control the data flow with scripts like 'cache_binary' (this script is
>> executed before ibrun if thousands of cores are used in the simulation).
>> This is the only reason I prefer static over dynamic libraries. In any
>> case I will test Darshan with dynamic libs as well, but the aim is to
>> have all the local libraries statically linked in my final application.
>>
>> Sorry for the extremely long e-mail. I hope it makes sense.
>>
>> Thanks,
>> Dragos
>>
>>
>>
>> Dragos Constantin, PhD
>>
>> Research Associate
>> Department of Radiology
>> Stanford University
>> Lucas MRS Center
>> 1201 Welch Rd., PS-055
>> Stanford CA 94305
>>
>> Office: (650) 736-9961
>> Fax: (650) 723-5795
>>
>> On 11/28/2012 5:31 PM, Bill Barth wrote:
>>> Dragos,
>>>
>>> Your directories are available on all the compute nodes on Ranger, so if
>>> your darshan dynamic libs are in any of your directories, you should be
>>> able to set your LD_LIBRARY_PATH or the executable rpath appropriately to
>>> point at your version of the darshan dynamic libraries.
>>>
>>> Is there a reason you prefer the static version?
>>>
>>> Bill.
>>> --
>>> Bill Barth, Ph.D., Director, HPC
>>> bbarth at tacc.utexas.edu | Phone: (512) 232-7069
>>> Office: ROC 1.435 | Fax: (512) 475-9445
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 11/28/12 6:20 PM, "Dragos Constantin"
>>> <dragos.constantin at stanford.edu>
>>> wrote:
>>>
>>>> Hi Phil,
>>>> So, v2.2.4-pre6 works on both Ranger and Lonestar. I can confirm that
>>>> Darshan generates a log file. However, I am a bit confused, because
>>>> when I parse the log file it says my application did not open any
>>>> file, and in fact I am generating at least several output files. Maybe
>>>> I have to configure something or supply some flags at compilation time
>>>> so I can capture the full I/O load generated by my application.
>>>>
>>>> Do you think I should use the darshan-test application and see what the
>>>> output looks like?
>>>>
>>>> You are right that one cannot build 100% static executables on Ranger
>>>> and Lonestar. However, dynamic libs such as libverbs are installed on
>>>> each compute node, so it is not an issue. What is more important is
>>>> that the Darshan lib and all my other libs are statically linked,
>>>> because they are not deployed system-wide. In any case, I would have
>>>> expected to see in the Darshan log files that some I/O activity
>>>> occurred because of these dynamic libs.
>>>>
>>>>
>>>> Thanks,
>>>> Dragos
>>>>
>>>>
>>>>
>>>> Dragos Constantin, PhD
>>>>
>>>> Research Associate
>>>> Department of Radiology
>>>> Stanford University
>>>> Lucas MRS Center
>>>> 1201 Welch Rd., PS-055
>>>> Stanford CA 94305
>>>>
>>>> Office: (650) 736-9961
>>>> Fax: (650) 723-5795
>>>>
>>>> On 11/28/2012 11:25 AM, Phil Carns wrote:
>>>>> Hi Dragos,
>>>>>
>>>>> Could you try this pre-release version of Darshan and let us know if
>>>>> it works for you?
>>>>>
>>>>> ftp://ftp.mcs.anl.gov/pub/darshan/releases/darshan-2.2.4-pre6.tar.gz
>>>>>
>>>>> The darshan-gen-* scripts will only work with mvapich2.
>>>>>
>>>>> I noticed an unrelated issue when trying to test this release on
>>>>> Ranger, however. I was not able to build a static executable using
>>>>> mvapich2 (with or without darshan) because it could not find a static
>>>>> version of the libverbs library. I was trying to generate a static
>>>>> executable by just adding -static to the mpicc command line. Maybe
>>>>> there is an additional step needed to get a fully static executable?
>>>>>
>>>>> thanks,
>>>>> -Phil
>>>>>
>>>>> On 11/27/2012 10:25 AM, Phil Carns wrote:
>>>>>> Hi Dragos,
>>>>>>
>>>>>> Thanks for the bug report. It looks like the
>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc is just ordering the link
>>>>>> arguments differently than darshan-gen-cc.pl expected. We should be
>>>>>> able to work around this without too much trouble. In terms of the
>>>>>> perl code I think we just need to modify the regular expression to
>>>>>> collect a "$link_cmd_prefix" in addition to a "$link_cmd_suffix" if
>>>>>> anything appears in the link command line from the first '-'
>>>>>> character up to the object name. We can then just pass those
>>>>>> arguments as is into the generated script. In this example the
>>>>>> link_cmd_prefix would be:
>>>>>>
>>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>>
>>>>>> I would like to see that particular mpicc script before making any
>>>>>> changes, though, to make sure that we don't accidentally break
>>>>>> something, but as (bad) luck would have it Ranger is in maintenance
>>>>>> today. We'll have a look at it tomorrow.
>>>>>>
>>>>>> thanks,
>>>>>> -Phil
>>>>>>
>>>>>> On 11/26/2012 03:27 PM, Dragos Constantin wrote:
>>>>>>> Hi Kevin,
>>>>>>> The problem is not with the argument parsing. This is what I get in
>>>>>>> both cases:
>>>>>>>
>>>>>>> login4$ ./darshan-gen-cc.pl /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>>> --output mpicc.darshan
>>>>>>> CC_from_link = gcc
>>>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>>> CC_from_compile = gcc
>>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>>
>>>>>>> login4$ ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>>> CC_from_link = gcc
>>>>>>> -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared -Wl,-rpath-link
>>>>>>> -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib
>>>>>>> CC_from_compile = gcc
>>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>>
>>>>>>> As you can see:
>>>>>>>
>>>>>>> CC_from_compile = gcc
>>>>>>>
>>>>>>> but CC_from_link is not plain gcc, and if I am not mistaken it
>>>>>>> should be. I have just started to look at the script, and you might
>>>>>>> know better than I do what is going on here.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dragos
>>>>>>>
>>>>>>>
>>>>>>> Dragos Constantin, PhD
>>>>>>>
>>>>>>> Research Associate
>>>>>>> Department of Radiology
>>>>>>> Stanford University
>>>>>>> Lucas MRS Center
>>>>>>> 1201 Welch Rd., PS-055
>>>>>>> Stanford CA 94305
>>>>>>>
>>>>>>> Office: (650) 736-9961
>>>>>>> Fax: (650) 723-5795
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: "Kevin Harms" <harms at alcf.anl.gov>
>>>>>>> To: "Dragos Constantin" <dragos.constantin at stanford.edu>
>>>>>>> Cc: darshan-users at lists.mcs.anl.gov
>>>>>>> Sent: Monday, November 26, 2012 12:23:00 PM
>>>>>>> Subject: Re: [Darshan-users] Instrumenting statically-linked
>>>>>>> applications
>>>>>>>
>>>>>>>
>>>>>>> I think this might be a simple issue with argument parsing. Try
>>>>>>> this instead:
>>>>>>>
>>>>>>>> ./darshan-gen-cc.pl --output mpicc.darshan
>>>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc
>>>>>>> kevin
>>>>>>>
>>>>>>> On Nov 26, 2012, at 2:16 PM, Dragos Constantin wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I've installed and configured darshan-2.2.3 on TACC Ranger in my
>>>>>>>> user space. I have used gcc-4.4.5 (and mvapich-1.0.1).
>>>>>>>>
>>>>>>>> When I try to generate the MPI compiler scripts for
>>>>>>>> statically-linked applications I get the following error:
>>>>>>>>
>>>>>>>> login4$ ./darshan-gen-cc.pl
>>>>>>>> /opt/apps/gcc4_4/mvapich/1.0.1/bin/mpicc --output mpicc.darshan
>>>>>>>> Error: cannot find matching CC from: gcc -c foo.c
>>>>>>>> -I/opt/apps/gcc4_4/mvapich/1.0.1/include
>>>>>>>> and: gcc -Wl,-rpath,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>>> -Wl,-rpath-link -Wl,/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib/shared
>>>>>>>> -L/opt/apps/gcc4_4/mvapich/1.0.1/lib foo.o -o foo -lmpich
>>>>>>>> -L/opt/ofed//lib64/ -libverbs -libumad -lpthread -lpthread -lrt
>>>>>>>>
>>>>>>>> I am not quite sure what triggered this. Any ideas on how to quickly
>>>>>>>> fix the issue? I will look at the Perl script to see what is going
>>>>>>>> on there.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dragos
>>>>>>>>
>>>>>>>>
>>>>>>>> Dragos Constantin, PhD
>>>>>>>>
>>>>>>>> Research Associate
>>>>>>>> Department of Radiology
>>>>>>>> Stanford University
>>>>>>>> Lucas MRS Center
>>>>>>>> 1201 Welch Rd., PS-055
>>>>>>>> Stanford CA 94305
>>>>>>>>
>>>>>>>> Office: (650) 736-9961
>>>>>>>> Fax: (650) 723-5795
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Darshan-users mailing list
>>>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>>>> _______________________________________________
>>>>>>> Darshan-users mailing list
>>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>>>> _______________________________________________
>>>>>> Darshan-users mailing list
>>>>>> Darshan-users at lists.mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>>>> _______________________________________________
>>>> Darshan-users mailing list
>>>> Darshan-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
>