[Darshan-users] Fwd: Re: LD_PRELOAD fails with OpenMPI

Phil Carns carns at mcs.anl.gov
Thu Jan 10 08:38:00 CST 2013


I have one more quick update on this thread.  I suggested the Open MPI 
modification on their developers' mailing list yesterday, and Jeff 
Squyres has already responded with a patch that addresses the issue.  
With future Open MPI releases, Darshan should work in LD_PRELOAD mode 
without any modification.

http://www.open-mpi.org/community/lists/devel/2013/01/11916.php

-Phil

On 01/07/2013 10:08 AM, Phil Carns wrote:
> Thanks for testing that out and reporting back, Myriam.  I'm glad to 
> hear that your example is working now.
>
> -Phil
>
> Subject: Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
> From: myriam.botalla at bull.net
> Date: 01/04/2013 12:03 PM
> To: darshan-users-bounces at lists.mcs.anl.gov
>
>
> Hi Phil,
>
> Thanks for investigating this case.
> Using your patch, or patching the Open MPI malloc wrapper as you 
> suggested, we were able to instrument the application at run time 
> with LD_PRELOAD.
> We could provide the Open MPI team with a patch converting the stat() 
> calls to access() calls.
> Whatever their decision on accepting it, that patch can also become 
> part of the patch list we apply for our own use.
>
> Many thanks for your help,
> Myriam
>
>
> -------- Original Message --------
> Subject: 	Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
> Date: 	Thu, 03 Jan 2013 15:12:25 -0500
> From: 	Phil Carns <carns at mcs.anl.gov>
> To: 	darshan-users at lists.mcs.anl.gov
>
>
>
> Hi Myriam,
>
> I was able to reproduce the problem here and confirm my initial 
> suspicion.  The malloc initialization hooks in Open MPI are calling 
> stat(), which conflicts with the Darshan stat() wrapper: the wrapper 
> needs a working calloc() in order to dynamically load the real 
> function symbols via dlsym().
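>
> To make the circular dependency concrete, the wrapper side boils down
> to something like this (a simplified sketch, not the actual
> darshan-posix.c code):
>
>     #define _GNU_SOURCE
>     #include <dlfcn.h>
>     #include <sys/stat.h>
>
>     static int (*real_xstat)(int, const char *, struct stat *);
>
>     /* glibc routes stat() through __xstat(), so this is the symbol
>      * an LD_PRELOAD library has to shadow */
>     int __xstat(int vers, const char *path, struct stat *buf)
>     {
>         if (!real_xstat) {
>             /* dlsym() allocates memory internally (calloc()), so if
>              * we land here while glibc malloc is still initializing
>              * -- as in the backtrace from your report -- we crash */
>             real_xstat = (int (*)(int, const char *, struct stat *))
>                 dlsym(RTLD_NEXT, "__xstat");
>         }
>         /* ... record instrumentation data about the call ... */
>         return real_xstat(vers, path, buf);
>     }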
>
> I've attached a patch that works around this problem; please try it 
> out and report back.
>
> I'm not sure what to do about this in the long run.  I would be 
> nervous about integrating that particular patch into the official code 
> base because it will be quite fragile if the Open MPI malloc init 
> hooks ever change in the future.  I'll keep thinking about it and see 
> if I can come up with any other solution.  Our best bet might be to 
> provide a patch to the Open MPI team that converts those stat() calls 
> into access() calls.  Darshan does not intercept access() calls, and 
> the Open MPI code doesn't really need the results of the stat 
> operation.  They are just checking for the existence of particular files.
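>
> For illustration, the conversion would be of roughly this shape (a
> sketch of the general idea with a made-up helper name, not an actual
> patch; the real check lives in Open MPI's hooks.c):
>
>     #include <sys/stat.h>   /* stat()   */
>     #include <unistd.h>     /* access() */
>
>     static int ummunotify_present_before(void)
>     {
>         /* before: stat() is intercepted by Darshan, which is unsafe
>          * this early in process startup */
>         struct stat st;
>         return stat("/dev/ummunotify", &st) == 0;
>     }
>
>     static int ummunotify_present_after(void)
>     {
>         /* after: access() is not intercepted, and existence is all
>          * the hook needs to know */
>         return access("/dev/ummunotify", F_OK) == 0;
>     }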
>
> -Phil
>
> On 01/02/2013 01:14 PM, Phil Carns wrote:
>> Hi Myriam,
>>
>> Thank you for the detailed bug report.  We'll try to reproduce this 
>> and get back to you.  I assume that Open MPI is configured to use IB 
>> in this environment?
>>
>> I think the issue here is that Open MPI sets up its own wrappers for 
>> malloc very early on, and it happens to make a stat() or fstat() 
>> call as part of that process.  This is problematic because Darshan 
>> wants to intercept the stat() calls, but it needs malloc working (as 
>> part of the symbol resolution process) before it can intercept any 
>> functions via LD_PRELOAD.  I'm not yet sure how to handle this, but 
>> we'll have a look at it.
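>>
>> If it helps to see the shape of the problem without Open MPI in the
>> picture, a minimal stand-in for the hook would look something like
>> this (a hypothetical reproducer, not the real hooks.c):
>>
>>     /* early_stat_hook.c: build as a shared library and preload it
>>      * together with a stat() interceptor like Darshan */
>>     #include <malloc.h>
>>     #include <sys/stat.h>
>>
>>     static void early_init_hook(void)
>>     {
>>         struct stat st;
>>         /* runs while glibc malloc is still initializing, before an
>>          * LD_PRELOAD wrapper can resolve the real stat symbol via
>>          * dlsym() (which itself needs calloc()) */
>>         stat("/some/file", &st);
>>     }
>>
>>     /* glibc (2.12 era; deprecated in later versions) calls this
>>      * hook on the first malloc initialization */
>>     void (*__malloc_initialize_hook)(void) = early_init_hook;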
>>
>> -Phil
>>
>>
>> On 12/28/2012 06:00 AM, myriam.botalla at bull.net wrote:
>>> Hi,
>>> When I use LD_PRELOAD to instrument an application with Darshan at 
>>> run time, it crashes with a segmentation fault.
>>> The behaviour is the same when simply running mpicc to print its 
>>> version, and the generated core dump shows the same stack, which 
>>> points to the MPI wrappers as the likely suspect.
>>>
>>> # which mpicc
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>>>
>>> # mpicc --showme:version
>>> Segmentation fault (core dumped)
>>> # gdb mpicc core.14109
>>> GNU gdb (GDB) bullx Linux (7.2-50.bl6.Bull.1.20120306)
>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later 
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> Reading symbols from 
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc...done.
>>> [New Thread 14109]
>>> Missing separate debuginfo for 
>>> /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>> ....
>>> Reading symbols from 
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1...done.
>>> Loaded symbols for 
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1
>>> Core was generated by 
>>> `/home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
>>> Missing separate debuginfos, use: debuginfo-install 
>>> glibc-2.12-1.47.bl6_2.9.x86_64 libgcc-4.4.5-6.bl6.x86_64 
>>> numactl-2.0.3-9.bl6.x86_64 zlib-1.2.3-25.bl6.x86_64
>>> (gdb) where
>>> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
>>> #1  0x00000036b7c01310 in _dlerror_run () from /lib64/libdl.so.2
>>> #2  0x00000036b7c0107a in dlsym () from /lib64/libdl.so.2
>>> #3  0x00007f50f98c3487 in __xstat (vers=1, path=0x7f50f966ffac 
>>> "/dev/ummunotify", buf=0x7fff6b6f08c0) at lib/darshan-posix.c:711
>>> #4  0x00007f50f9661a64 in opal_memory_linux_malloc_init_hook () at 
>>> hooks.c:756
>>> #5  0x00000036b7875b63 in ptmalloc_init () from /lib64/libc.so.6
>>> #6  0x00000036b7879987 in malloc_hook_ini () from /lib64/libc.so.6
>>> #7  0x00000036b78a6da1 in __alloc_dir () from /lib64/libc.so.6
>>> #8  0x00000036b94053cd in ?? () from /usr/lib64/libnuma.so.1
>>> #9  0x00000036b740e515 in _dl_init_internal () from 
>>> /lib64/ld-linux-x86-64.so.2
>>> #10 0x00000036b7400b3a in _dl_start_user () from 
>>> /lib64/ld-linux-x86-64.so.2
>>> #11 0x0000000000000002 in ?? ()
>>> #12 0x00007fff6b6f1a43 in ?? ()
>>> #13 0x00007fff6b6f1a9e in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>> Can someone help me to understand the issue?
>>> Thanks,
>>> Myriam.
>>>
>>>
>>>
>>> HERE IS the environment:
>>>
>>> LD_PRELOAD=/home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>>
>>> The Darshan library was configured and built with the environment 
>>> variable OMPI_CC=gcc, using mpicc from Open MPI 1.6.2 (Language: C).
>>>
>>> # which mpicc
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>>>
>>> # CC=mpicc CFLAGS=-g ./configure 
>>> --prefix=/home_nfs/botallam/install/darshan.4 --with-mem-align=16 
>>> --with-log-path-by-env=DARSHAN_LOGPATH --with-jobid-env=SLURM_JOBID
>>>
>>> # ldd /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>>         linux-vdso.so.1 =>  (0x00007fffe9e93000)
>>>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a753bd000)
>>>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a751a0000)
>>>         librt.so.1 => /lib64/librt.so.1 (0x00007f8a74f98000)
>>>         libz.so.1 => /lib64/libz.so.1 (0x00007f8a74d83000)
>>>         libmpi.so.1 => 
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1 (0x00007f8a74967000) 
>>>
>>>         libm.so.6 => /lib64/libm.so.6 (0x00007f8a746e3000)
>>>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f8a744db000)
>>>         libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f8a742c1000)
>>>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f8a740be000)
>>>         libc.so.6 => /lib64/libc.so.6 (0x00007f8a73d2e000)
>>>         /lib64/ld-linux-x86-64.so.2 (0x00000036b7400000)
>>>         libimf.so => 
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libimf.so 
>>> (0x00007f8a73871000)
>>>         libsvml.so => 
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libsvml.so 
>>> (0x00007f8a72fa3000)
>>>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a72d8d000)
>>>         libintlc.so.5 => 
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libintlc.so.5 
>>> (0x00007f8a72b3e000)
>>> #
>>> #
>>> # nm /home_nfs/botallam/install/darshan.4/lib/libdarshan.so|grep mpi
>>> 0000000000227020 B __real_ncmpi_close
>>> 0000000000227010 B __real_ncmpi_create
>>> 0000000000227018 B __real_ncmpi_open
>>> 00000000000048cc T darshan_mpi_initialize
>>> 00000000000179d2 T ncmpi_close
>>> 000000000001763c T ncmpi_create
>>> 0000000000017807 T ncmpi_open
>>>                  U ompi_mpi_byte
>>>                  U ompi_mpi_char
>>>                  U ompi_mpi_comm_world
>>>                  U ompi_mpi_double
>>>                  U ompi_mpi_info_null
>>>                  U ompi_mpi_int
>>>                  U ompi_mpi_long
>>>                  U ompi_mpi_op_land
>>>                  U ompi_mpi_op_lor
>>>                  U ompi_mpi_op_max
>>>                  U ompi_mpi_op_null
>>>                  U ompi_mpi_op_sum
>>> 0000000000016444 T resolve_mpi_symbols
>>> #
>>>
