[Darshan-users] Fwd: Re: LD_PRELOAD fails with OpenMPI
Phil Carns
carns at mcs.anl.gov
Thu Jan 10 08:38:00 CST 2013
I have one more quick update on this thread. I suggested the Open MPI
modification on their developer's mailing list yesterday, and Jeff
Squyres has already responded with a patch that addresses the issue.
Darshan should work in LD_PRELOAD mode without modification in future
Open MPI releases.
http://www.open-mpi.org/community/lists/devel/2013/01/11916.php
-Phil
On 01/07/2013 10:08 AM, Phil Carns wrote:
> Thanks for testing that out and reporting back, Myriam. I'm glad to
> hear that your example is working now.
>
> -Phil
>
> Subject: Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
> From: myriam.botalla at bull.net
> Date: 01/04/2013 12:03 PM
> To: darshan-users-bounces at lists.mcs.anl.gov
>
>
> Hi Phil,
>
> Thanks for investigating this case.
> Using your patch, or patching the Open MPI malloc wrapper as you
> suggested, we were able to instrument the application at run time
> with LD_PRELOAD.
> We could provide the Open MPI team with a patch converting stat()
> calls to access() calls.
> Whatever their decision on accepting it, this patch can remain part
> of the patch set we apply for our particular use.
>
> Many thanks for your help,
> Myriam
>
>
> -------- Original Message --------
> Subject: Re: [Darshan-users] LD_PRELOAD fails with OpenMPI
> Date: Thu, 03 Jan 2013 15:12:25 -0500
> From: Phil Carns <carns at mcs.anl.gov>
> To: darshan-users at lists.mcs.anl.gov
>
>
>
> Hi Myriam,
>
> I was able to reproduce the problem here and confirm my initial
> suspicion. The malloc initialization hooks in Open MPI call stat(),
> which conflicts with the Darshan stat() wrapper because the wrapper
> needs calloc() to work in order to dynamically load the real function
> symbols.
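>
> A minimal sketch of the interception pattern at play (hypothetical
> variable names; Darshan's actual wrapper lives in lib/darshan-posix.c,
> per the backtrace below). glibc routes stat() through __xstat(), which
> is why that symbol appears in the trace, and dlsym()/dlerror()
> allocate memory internally, which is where calloc() comes in:
>
>   #define _GNU_SOURCE
>   #include <dlfcn.h>
>   #include <sys/stat.h>
>
>   static int (*real_xstat)(int, const char *, struct stat *);
>
>   int __xstat(int vers, const char *path, struct stat *buf)
>   {
>       if (!real_xstat) {
>           /* dlsym()/dlerror() call calloc() internally, so resolving
>            * the real symbol requires a working allocator. */
>           real_xstat = dlsym(RTLD_NEXT, "__xstat");
>       }
>       /* ... record instrumentation data about the call ... */
>       return real_xstat(vers, path, buf);
>   }
>
> If stat() is invoked from inside the allocator's own initialization,
> as happens here, the dlsym() bootstrap above has no allocator to use.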
>
> I've attached a patch that works around this problem; please try it
> out and report back if you can.
>
> I'm not sure what to do about this in the long run. I would be
> nervous about integrating that particular patch into the official code
> base because it will be quite fragile if the Open MPI malloc init
> hooks ever change in the future. I'll keep thinking about it and see
> if I can come up with any other solution. Our best bet might be to
> provide a patch to the Open MPI team that converts those stat() calls
> into access() calls. Darshan does not intercept access() calls, and
> the Open MPI code doesn't really need the results of the stat
> operation. They are just checking for the existence of particular files.
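>
> For instance, an existence check written as
>
>   struct stat st;
>   if (stat("/dev/ummunotify", &st) == 0) { /* file exists */ }
>
> (that path is the one visible in the backtrace below) could be
> rewritten, without touching any symbol Darshan intercepts, as
>
>   #include <unistd.h>
>
>   if (access("/dev/ummunotify", F_OK) == 0) { /* file exists */ }
>
> This is a sketch of the proposed conversion, not the exact Open MPI
> code.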
>
> -Phil
>
> On 01/02/2013 01:14 PM, Phil Carns wrote:
>> Hi Myriam,
>>
>> Thank you for the detailed bug report. We'll try to reproduce this
>> and get back to you. I assume that Open MPI is configured to use IB
>> in this environment?
>>
>> I think the issue here is that Open MPI very early on is setting up
>> its own wrappers for malloc, and it happens to make a stat() or
>> fstat() call as part of that process. This is problematic because
>> Darshan wants to intercept the stat() calls, but it needs malloc
>> working (as part of the symbol resolution process) before it can
>> intercept any functions via LD_PRELOAD. I'm not yet sure how to
>> handle this but we'll have a look at it.
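>>
>> To illustrate the ordering problem, here is a minimal sketch using
>> glibc's malloc initialization hook (deprecated, but present in the
>> glibc 2.12 shown below); the names here are made up, while Open MPI's
>> real hook is opal_memory_linux_malloc_init_hook() in hooks.c, as the
>> backtrace shows:
>>
>>   #include <malloc.h>
>>   #include <sys/stat.h>
>>
>>   static void my_init_hook(void)
>>   {
>>       struct stat st;
>>       /* ptmalloc_init() runs this hook on the very first
>>        * allocation.  Any intercepted libc call made here, such as
>>        * stat(), re-enters the allocator before it is usable. */
>>       stat("/some/path", &st);
>>   }
>>
>>   /* Installing the hook: glibc calls it from ptmalloc_init(). */
>>   void (*__malloc_initialize_hook)(void) = my_init_hook;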
>>
>> -Phil
>>
>>
>> On 12/28/2012 06:00 AM, myriam.botalla at bull.net wrote:
>>> Hi,
>>> When I use LD_PRELOAD to instrument an application with Darshan at
>>> run time, a segmentation fault is raised.
>>> The behaviour is the same when simply running mpicc to get the
>>> version, and the generated core dump shows the same stack, which
>>> seems to point to the MPI wrappers as the likely suspect.
>>>
>>> # which mpicc
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc
>>>
>>> # mpicc --showme:version
>>> Segmentation fault (core dumped)
>>> # gdb mpicc core.14109
>>> GNU gdb (GDB) bullx Linux (7.2-50.bl6.Bull.1.20120306)
>>> Copyright (C) 2010 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> Reading symbols from
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc...done.
>>> [New Thread 14109]
>>> Missing separate debuginfo for
>>> /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>> ....
>>> Reading symbols from
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1...done.
>>> Loaded symbols for
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1
>>> Core was generated by
>>> `/home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x00000036b7879446 in calloc () from /lib64/libc.so.6
>>> Missing separate debuginfos, use: debuginfo-install
>>> glibc-2.12-1.47.bl6_2.9.x86_64 libgcc-4.4.5-6.bl6.x86_64
>>> numactl-2.0.3-9.bl6.x86_64 zlib-1.2.3-25.bl6.x86_64
>>> (gdb) where
>>> #0 0x00000036b7879446 in calloc () from /lib64/libc.so.6
>>> #1 0x00000036b7c01310 in _dlerror_run () from /lib64/libdl.so.2
>>> #2 0x00000036b7c0107a in dlsym () from /lib64/libdl.so.2
>>> #3 0x00007f50f98c3487 in __xstat (vers=1, path=0x7f50f966ffac
>>> "/dev/ummunotify", buf=0x7fff6b6f08c0) at lib/darshan-posix.c:711
>>> #4 0x00007f50f9661a64 in opal_memory_linux_malloc_init_hook () at
>>> hooks.c:756
>>> #5 0x00000036b7875b63 in ptmalloc_init () from /lib64/libc.so.6
>>> #6 0x00000036b7879987 in malloc_hook_ini () from /lib64/libc.so.6
>>> #7 0x00000036b78a6da1 in __alloc_dir () from /lib64/libc.so.6
>>> #8 0x00000036b94053cd in ?? () from /usr/lib64/libnuma.so.1
>>> #9 0x00000036b740e515 in _dl_init_internal () from
>>> /lib64/ld-linux-x86-64.so.2
>>> #10 0x00000036b7400b3a in _dl_start_user () from
>>> /lib64/ld-linux-x86-64.so.2
>>> #11 0x0000000000000002 in ?? ()
>>> #12 0x00007fff6b6f1a43 in ?? ()
>>> #13 0x00007fff6b6f1a9e in ?? ()
>>> #14 0x0000000000000000 in ?? ()
>>> (gdb)
>>>
>>> Can someone help me to understand the issue?
>>> Thanks,
>>> Myriam.
>>>
>>>
>>>
>>> Here is the environment:
>>>
>>> LD_PRELOAD=/home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>>
>>> The Darshan library was configured and built with the environment
>>> variable OMPI_CC=gcc, using mpicc: Open MPI 1.6.2 (Language: C)
>>>
>>> # which mpicc
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc
>>>
>>> # CC=mpicc CFLAGS=-g ./configure
>>> --prefix=/home_nfs/botallam/install/darshan.4 --with-mem-align=16
>>> --with-log-path-by-env=DARSHAN_LOGPATH --with-jobid-env=SLURM_JOBID
>>>
>>> # ldd /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>>> linux-vdso.so.1 => (0x00007fffe9e93000)
>>> libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a753bd000)
>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a751a0000)
>>> librt.so.1 => /lib64/librt.so.1 (0x00007f8a74f98000)
>>> libz.so.1 => /lib64/libz.so.1 (0x00007f8a74d83000)
>>> libmpi.so.1 =>
>>> /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1 (0x00007f8a74967000)
>>>
>>> libm.so.6 => /lib64/libm.so.6 (0x00007f8a746e3000)
>>> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f8a744db000)
>>> libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f8a742c1000)
>>> libutil.so.1 => /lib64/libutil.so.1 (0x00007f8a740be000)
>>> libc.so.6 => /lib64/libc.so.6 (0x00007f8a73d2e000)
>>> /lib64/ld-linux-x86-64.so.2 (0x00000036b7400000)
>>> libimf.so =>
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libimf.so
>>> (0x00007f8a73871000)
>>> libsvml.so =>
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libsvml.so
>>> (0x00007f8a72fa3000)
>>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a72d8d000)
>>> libintlc.so.5 =>
>>> /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libintlc.so.5
>>> (0x00007f8a72b3e000)
>>> #
>>> #
>>> # nm /home_nfs/botallam/install/darshan.4/lib/libdarshan.so|grep mpi
>>> 0000000000227020 B __real_ncmpi_close
>>> 0000000000227010 B __real_ncmpi_create
>>> 0000000000227018 B __real_ncmpi_open
>>> 00000000000048cc T darshan_mpi_initialize
>>> 00000000000179d2 T ncmpi_close
>>> 000000000001763c T ncmpi_create
>>> 0000000000017807 T ncmpi_open
>>> U ompi_mpi_byte
>>> U ompi_mpi_char
>>> U ompi_mpi_comm_world
>>> U ompi_mpi_double
>>> U ompi_mpi_info_null
>>> U ompi_mpi_int
>>> U ompi_mpi_long
>>> U ompi_mpi_op_land
>>> U ompi_mpi_op_lor
>>> U ompi_mpi_op_max
>>> U ompi_mpi_op_null
>>> U ompi_mpi_op_sum
>>> 0000000000016444 T resolve_mpi_symbols
>>> #
>>>
>>>
>>>
>>>
>>>