[Darshan-users] LD_PRELOAD fails with OpenMPI

Phil Carns carns at mcs.anl.gov
Wed Jan 2 12:14:47 CST 2013


Hi Myriam,

Thank you for the detailed bug report.  We'll try to reproduce this and 
get back to you.  I assume that Open MPI is configured to use InfiniBand 
in this environment?

I think the issue is that Open MPI installs its own malloc hooks very 
early, and your backtrace shows it calling stat() on /dev/ummunotify from 
opal_memory_linux_malloc_init_hook() while glibc is still initializing 
malloc.  Darshan intercepts that stat() call, and the first time it does 
so it has to look up the real function with dlsym(), which in turn calls 
calloc().  Since malloc is not usable yet at that point, the calloc() 
call crashes.  I'm not yet sure how to handle this, but we'll have a look 
at it.
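
To make the failure mode concrete, here is a minimal sketch of the lazy 
symbol lookup that LD_PRELOAD wrappers commonly use.  It is not the 
actual Darshan code: the names wrapped_access, resolve_real_access and 
resolve_count are made up for illustration, and access() stands in for 
stat() to keep the example short.  The point is only that the first call 
through the wrapper goes through dlsym(), which is unsafe if that call 
arrives before malloc is ready.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names for illustration; not taken from the Darshan source. */
typedef int (*access_fn)(const char *, int);

static access_fn real_access = NULL;  /* filled in on first use */
static int resolve_count = 0;         /* number of times dlsym() was called */

/* Find the next definition of access() after this object in the
 * lookup order.  dlsym() may allocate internally (the backtrace shows
 * dlsym -> _dlerror_run -> calloc), so this step assumes that malloc
 * is already usable. */
static access_fn resolve_real_access(void)
{
    resolve_count++;
    return (access_fn)dlsym(RTLD_NEXT, "access");
}

/* Wrapper in the style of an interposed function.  In a real preload
 * library it would carry the name of the function it replaces. */
int wrapped_access(const char *path, int mode)
{
    if (real_access == NULL)
        real_access = resolve_real_access();
    if (real_access == NULL)
        return -1;                /* real symbol could not be found */

    /* instrumentation (timers, counters) would go here */
    return real_access(path, mode);
}
```

Only the first call pays for the lookup; later calls reuse the cached 
pointer.  In the reported backtrace, that first call happens inside 
ptmalloc_init(), which is why the calloc() under dlsym() fails.  On 
older glibc releases the example needs -ldl at link time.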

-Phil


On 12/28/2012 06:00 AM, myriam.botalla at bull.net wrote:
> Hi,
> When I use LD_PRELOAD to instrument an application with Darshan at 
> run time, a segmentation fault occurs.
> The behaviour is the same when simply running mpicc to get the 
> version, and the generated core dump shows the same stack, which 
> seems to point to the MPI wrappers as the likely suspect.
>
> # which mpicc
> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>
> # mpicc --showme:version
> Segmentation fault (core dumped)
> # gdb mpicc core.14109
> GNU gdb (GDB) bullx Linux (7.2-50.bl6.Bull.1.20120306)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> Reading symbols from /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc...done.
> [New Thread 14109]
> Missing separate debuginfo for /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
> ....
> Reading symbols from /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1...done.
> Loaded symbols for /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1
> Core was generated by `/home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.bl6_2.9.x86_64 libgcc-4.4.5-6.bl6.x86_64 numactl-2.0.3-9.bl6.x86_64 zlib-1.2.3-25.bl6.x86_64
> (gdb) where
> #0  0x00000036b7879446 in calloc () from /lib64/libc.so.6
> #1  0x00000036b7c01310 in _dlerror_run () from /lib64/libdl.so.2
> #2  0x00000036b7c0107a in dlsym () from /lib64/libdl.so.2
> #3  0x00007f50f98c3487 in __xstat (vers=1, path=0x7f50f966ffac "/dev/ummunotify", buf=0x7fff6b6f08c0) at lib/darshan-posix.c:711
> #4  0x00007f50f9661a64 in opal_memory_linux_malloc_init_hook () at hooks.c:756
> #5  0x00000036b7875b63 in ptmalloc_init () from /lib64/libc.so.6
> #6  0x00000036b7879987 in malloc_hook_ini () from /lib64/libc.so.6
> #7  0x00000036b78a6da1 in __alloc_dir () from /lib64/libc.so.6
> #8  0x00000036b94053cd in ?? () from /usr/lib64/libnuma.so.1
> #9  0x00000036b740e515 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
> #10 0x00000036b7400b3a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #11 0x0000000000000002 in ?? ()
> #12 0x00007fff6b6f1a43 in ?? ()
> #13 0x00007fff6b6f1a9e in ?? ()
> #14 0x0000000000000000 in ?? ()
> (gdb)
>
> Can someone help me to understand the issue?
> Thanks,
> Myriam.
>
>
>
> Here is the environment:
>
> LD_PRELOAD=/home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>
> The Darshan library was configured and built with the environment 
> variable OMPI_CC=gcc, using mpicc from Open MPI 1.6.2 (Language: C):
>
> # which mpicc
> /home_nfs/papaureg/workspace/openmpi-1.6.2/current_AE2__2/x86_64_bullxlinux6.1.1/bin/mpicc 
>
> # CC=mpicc CFLAGS=-g ./configure \
>     --prefix=/home_nfs/botallam/install/darshan.4 --with-mem-align=16 \
>     --with-log-path-by-env=DARSHAN_LOGPATH --with-jobid-env=SLURM_JOBID
>
> # ldd /home_nfs/botallam/install/darshan.4/lib/libdarshan.so
>         linux-vdso.so.1 =>  (0x00007fffe9e93000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00007f8a753bd000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8a751a0000)
>         librt.so.1 => /lib64/librt.so.1 (0x00007f8a74f98000)
>         libz.so.1 => /lib64/libz.so.1 (0x00007f8a74d83000)
>         libmpi.so.1 => /home_nfs/papaureg/workspace/openmpi-1.6.2//current_AE2__2/x86_64_bullxlinux6.1.1/lib/libmpi.so.1 (0x00007f8a74967000)
>         libm.so.6 => /lib64/libm.so.6 (0x00007f8a746e3000)
>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f8a744db000)
>         libnsl.so.1 => /lib64/libnsl.so.1 (0x00007f8a742c1000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x00007f8a740be000)
>         libc.so.6 => /lib64/libc.so.6 (0x00007f8a73d2e000)
>         /lib64/ld-linux-x86-64.so.2 (0x00000036b7400000)
>         libimf.so => /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libimf.so (0x00007f8a73871000)
>         libsvml.so => /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libsvml.so (0x00007f8a72fa3000)
>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8a72d8d000)
>         libintlc.so.5 => /opt/intel/composer_xe_2013.1.117/compiler/lib/intel64/libintlc.so.5 (0x00007f8a72b3e000)
> #
> #
> # nm /home_nfs/botallam/install/darshan.4/lib/libdarshan.so|grep mpi
> 0000000000227020 B __real_ncmpi_close
> 0000000000227010 B __real_ncmpi_create
> 0000000000227018 B __real_ncmpi_open
> 00000000000048cc T darshan_mpi_initialize
> 00000000000179d2 T ncmpi_close
> 000000000001763c T ncmpi_create
> 0000000000017807 T ncmpi_open
>      U ompi_mpi_byte
>      U ompi_mpi_char
>      U ompi_mpi_comm_world
>      U ompi_mpi_double
>      U ompi_mpi_info_null
>      U ompi_mpi_int
>      U ompi_mpi_long
>      U ompi_mpi_op_land
>      U ompi_mpi_op_lor
>      U ompi_mpi_op_max
>      U ompi_mpi_op_null
>      U ompi_mpi_op_sum
> 0000000000016444 T resolve_mpi_symbols
> #
>
> _______________________________________________
> Darshan-users mailing list
> Darshan-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/darshan-users
