[Darshan-users] Fwd: Re: LD_PRELOAD fails with OpenMPI

Phil Carns carns at mcs.anl.gov
Mon Jan 7 09:08:32 CST 2013


Thanks for testing that out and reporting back, Myriam.  I'm glad to 
hear that your example is working now.

-Phil

Subject: 	Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
From: 	myriam.botalla at bull.net
Date: 	01/04/2013 12:03 PM
To: 	darshan-users-bounces at lists.mcs.anl.gov


Hi Phil,

Thanks for investigating this case.
Using your patch, or patching the Open MPI malloc wrapper as you 
suggested, we were able to instrument the application at run time with 
LD_PRELOAD.
We could provide the Open MPI team with a patch converting the stat() 
calls to access() calls.
Whatever their decision on accepting it, that patch can be part of the 
patch set we apply for our particular use.

Many thanks for your help,
Myriam


-------- Original Message --------
Subject: 	Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
Date: 	Thu, 03 Jan 2013 15:12:25 -0500
From: 	Phil Carns <carns at mcs.anl.gov>
To: 	darshan-users at lists.mcs.anl.gov



Hi Myriam,

I was able to reproduce the problem here and confirm my initial 
suspicion.  The malloc initialization hooks in Open MPI are calling 
stat(), which conflicts with the Darshan stat() wrapper because the 
wrapper needs a working calloc() in order to dynamically load the real 
function symbols.
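
To make the interaction concrete, here is a rough sketch of the 
interception pattern involved (this is not Darshan's actual source, 
just an illustration): the wrapper has to resolve the real symbol 
through dlsym(), and glibc's dlsym()/dlerror() machinery can call 
calloc() internally, which is exactly the frame that shows up in the 
backtrace further below.

/* Rough sketch of an LD_PRELOAD stat wrapper; not Darshan's actual
 * code.  glibc routes stat() through __xstat(), which is the symbol
 * Darshan wraps (see darshan-posix.c in the backtrace). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/stat.h>

static int (*real_xstat)(int, const char *, struct stat *);

int __xstat(int vers, const char *path, struct stat *buf)
{
    if (!real_xstat)
        /* dlsym()/dlerror() may call calloc(); if we get here from
         * inside a malloc init hook, the allocator is not usable
         * yet and the process segfaults. */
        real_xstat = (int (*)(int, const char *, struct stat *))
                         dlsym(RTLD_NEXT, "__xstat");
    /* ... instrumentation would record the call here ... */
    return real_xstat(vers, path, buf);
}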

I've attached a patch that works around this problem if you can try it 
out and report back.

I'm not sure what to do about this in the long run.  I would be nervous 
about integrating that particular patch into the official code base 
because it will be quite fragile if the Open MPI malloc init hooks ever 
change in the future.  I'll keep thinking about it and see if I can come 
up with any other solution.  Our best bet might be to provide a patch to 
the Open MPI team that converts those stat() calls into access() calls. 
Darshan does not intercept access() calls, and the Open MPI code doesn't 
really need the results of the stat operation.  They are just checking 
for the existence of particular files.
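
To illustrate the kind of conversion I mean (hypothetical helper names; 
the real hooks.c code will look different), the existence check from 
the backtrace could change roughly like this:

/* Hypothetical before/after for the proposed Open MPI change.
 * access() is not intercepted by Darshan, so it never reaches the
 * early dlsym()/calloc() path. */
#include <sys/stat.h>
#include <unistd.h>

/* before: goes through the Darshan __xstat wrapper */
static int have_ummunotify_old(void)
{
    struct stat buf;
    return stat("/dev/ummunotify", &buf) == 0;
}

/* after: plain existence check, not intercepted */
static int have_ummunotify_new(void)
{
    return access("/dev/ummunotify", F_OK) == 0;
}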

-Phil

On 01/02/2013 01:14 PM, Phil Carns wrote:
> Hi Myriam,
>
> Thank you for the detailed bug report.  We'll try to reproduce this 
> and get back to you.  I assume that Open MPI is configured to use IB 
> in this environment?
>
> I think the issue here is that Open MPI very early on is setting up 
> its own wrappers for malloc, and it happens to make a stat() or 
> fstat() call as part of that process.  This is problematic because 
> Darshan wants to intercept the stat() calls, but it needs malloc 
> working (as part of the symbol resolution process) before it can 
> intercept any functions via LD_PRELOAD.  I'm not yet sure how to 
> handle this but we'll have a look at it.
>
> -Phil
>
>
> On 12/28/2012 06:00 AM, myriam.botalla at bull.net wrote:
>> Hi,
>> When I use LD_PRELOAD to get an application instrumented with Darshan
>> at run time, a segmentation fault is raised.
>> The behaviour is the same when simply running mpicc to print its
>> version, and the generated core dump shows the same stack, which
>> seems to point to the MPI wrappers as the likely suspect.
>>
>> # which mpicc
>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>>
>> # mpicc --showme:version
>> Segmentation fault (core dumped)
>> # gdb mpicc core.14109
>> GNU gdb (GDB) bullx Linux (7.2-50.bl6.Bull.1.20120306)
>> Copyright (C) 2010 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later 
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> Reading symbols from 
>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc...done.
>> [New Thread 14109]
>> Missing separate debuginfo for 
>> /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>> ....
>> Reading symbols from 
>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1...done.
>> Loaded symbols for 
>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1
>> Core was generated by 
>> `/home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
>> Missing separate debuginfos, use: debuginfo-install 
>> glibc-2.12-1.47.bl6_2.9.x86_64 libgcc-4.4.5-6.bl6.x86_64 
>> numactl-2.0.3-9.bl6.x86_64 zlib-1.2.3-25.bl6.x86_64
>> (gdb) where
>> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
>> #1  0x00000036b7c01310 in _dlerror_run () from /lib64/libdl.so.2
>> #2  0x00000036b7c0107a in dlsym () from /lib64/libdl.so.2
>> #3  0x00007f50f98c3487 in __xstat (vers=1, path=0x7f50f966ffac 
>> "/dev/ummunotify", buf=0x7fff6b6f08c0) at lib/darshan-posix.c:711
>> #4  0x00007f50f9661a64 in opal_memory_linux_malloc_init_hook () at 
>> hooks.c:756
>> #5  0x00000036b7875b63 in ptmalloc_init () from /lib64/libc.so.6
>> #6  0x00000036b7879987 in malloc_hook_ini () from /lib64/libc.so.6
>> #7  0x00000036b78a6da1 in __alloc_dir () from /lib64/libc.so.6
>> #8  0x00000036b94053cd in ?? () from /usr/lib64/libnuma.so.1
>> #9  0x00000036b740e515 in _dl_init_internal () from 
>> /lib64/ld-linux-x86-64.so.2
>> #10 0x00000036b7400b3a in _dl_start_user () from 
>> /lib64/ld-linux-x86-64.so.2
>> #11 0x0000000000000002 in ?? ()
>> #12 0x00007fff6b6f1a43 in ?? ()
>> #13 0x00007fff6b6f1a9e in ?? ()
>> #14 0x0000000000000000 in ?? ()
>> (gdb)
>>
>> Can someone help me to understand the issue?
>> Thanks,
>> Myriam.
>>
>>
>>
>> HERE IS the environment:
>>
>> LD_PRELOAD=/home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>
>> The Darshan library was configured and built with the environment
>> variable OMPI_CC=gcc, using mpicc from Open MPI 1.6.2 (Language: C).
>>
>> # which mpicc
>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>>
>> # CC=mpicc CFLAGS=-g ./configure 
>> --prefix=/home_nfs/botallam/install/darshan.4 --with-mem-align=16 
>> --with-log-path-by-env=DARSHAN_LOGPATH --with-jobid-env=SLURM_JOBID
>>
>> # ldd /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>         linux-vdso.so.1 =>  (0x00007fffe9e93000)
>>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a753bd000)
>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a751a0000)
>>         librt.so.1 => /lib64/librt.so.1 (0x00007f8a74f98000)
>>         libz.so.1 => /lib64/libz.so.1 (0x00007f8a74d83000)
>>         libmpi.so.1 => /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1 (0x00007f8a74967000)
>>         libm.so.6 => /lib64/libm.so.6 (0x00007f8a746e3000)
>>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f8a744db000)
>>         libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f8a742c1000)
>>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f8a740be000)
>>         libc.so.6 => /lib64/libc.so.6 (0x00007f8a73d2e000)
>>         /lib64/ld-linux-x86-64.so.2 (0x00000036b7400000)
>>         libimf.so => 
>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libimf.so 
>> (0x00007f8a73871000)
>>         libsvml.so => 
>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libsvml.so 
>> (0x00007f8a72fa3000)
>>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a72d8d000)
>>         libintlc.so.5 => 
>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libintlc.so.5 
>> (0x00007f8a72b3e000)
>> #
>> #
>> # nm /home_nfs/botallam/install/darshan.4/lib/libdarshan.so|grep mpi
>> 0000000000227020 B __real_ncmpi_close
>> 0000000000227010 B __real_ncmpi_create
>> 0000000000227018 B __real_ncmpi_open
>> 00000000000048cc T darshan_mpi_initialize
>> 00000000000179d2 T ncmpi_close
>> 000000000001763c T ncmpi_create
>> 0000000000017807 T ncmpi_open
>>                  U ompi_mpi_byte
>>                  U ompi_mpi_char
>>                  U ompi_mpi_comm_world
>>                  U ompi_mpi_double
>>                  U ompi_mpi_info_null
>>                  U ompi_mpi_int
>>                  U ompi_mpi_long
>>                  U ompi_mpi_op_land
>>                  U ompi_mpi_op_lor
>>                  U ompi_mpi_op_max
>>                  U ompi_mpi_op_null
>>                  U ompi_mpi_op_sum
>> 0000000000016444 T resolve_mpi_symbols
>> #
>>
>>
>>
>>
>>
>
>
>



-------------- next part --------------
A non-text attachment was scrubbed...
Name: darshan-openmpi-ldpreload-stat.patch
Type: text/x-patch
Size: 2100 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/darshan-users/attachments/20130107/7998fccd/attachment-0001.bin>
-------------- next part --------------
_______________________________________________
Darshan-users mailing list
Darshan-users at lists.mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/darshan-users


