[mpich-discuss] runtime segfault: mpich2-1.3.2 with pgi v11.5 on rhel5.6 system

Dave Goodell goodell at mcs.anl.gov
Thu May 26 15:38:08 CDT 2011


Hmm.. I have no idea what's going on then.  Do other programs compiled with the newer compiler work for you?

-Dave

On May 26, 2011, at 3:21 PM CDT, Limin Gu wrote:

> Thanks Dave!
> 
> I tried "HYDRA_BINDLIB=bogus mpiexec", it still segfaults :(
> 
> I reconfigured and rebuilt with "CFLAGS=-g"; here is the "gdb mpiexec" bt output:
> 
> (gdb) run
> Starting program: /home/lgu/mpich2_install/bin/mpiexec
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
> [Thread debugging using libthread_db enabled]
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
> (gdb) bt
> #0  0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
> #1  0x0000003ef10e722b in __snprintf_chk () from /lib64/libc.so.6
> #2  0x0000003ef0c0d1bb in call_init () from /lib64/ld-linux-x86-64.so.2
> #3  0x0000003ef0c0d2c5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
> #4  0x0000003ef0c00aaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #5  0x0000000000000001 in ?? ()
> #6  0x00007fffffffeae0 in ?? ()
> #7  0x0000000000000000 in ?? ()
> (gdb) 
> 
> 
> Thank you!
> 
> Limin
> 
> > Can you run "gdb mpiexec" and get us a stack trace for the failing mmap?  You may need to reconfigure and rebuild with "CFLAGS=-g" in order to get meaningful information from the debugger.  That value (18446744073223036928) is suspicious: it's 0xFFFFFFFFE3006000 in hex, or -486,514,688 decimal if interpreted as a signed 64-bit value instead.  It may be that the compiler or the code is doing some math incorrectly on size_t types.
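
For anyone who wants to check the arithmetic on that mmap length, here is a minimal sketch in Python. The helper names (as_signed64, wrapping_sub64) are hypothetical and are not part of MPICH2, hydra, or hwloc. The wrapping subtraction only illustrates one way an unsigned 64-bit type such as size_t could end up holding this value; it is an assumption for illustration, not a diagnosis of where the bad length comes from.

```python
MASK64 = (1 << 64) - 1  # all ones in 64 bits


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def wrapping_sub64(a: int, b: int) -> int:
    """Subtract the way an unsigned 64-bit type would: modulo 2**64."""
    return (a - b) & MASK64


# The length reported for the failing mmap in the strace output.
length = 18446744073223036928

print(hex(length))                   # 0xffffffffe3006000
print(as_signed64(length))           # -486514688
# Hypothetical: a small unsigned value minus a larger one wraps around.
print(wrapping_sub64(0, 486514688))  # 18446744073223036928
```

If the length really is a small negative number stored in an unsigned type, the likely suspects are a subtraction or a sign extension on a size_t, which fits the guess about the compiler or the code mishandling size_t math.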
> >
> > AFAIK hydra does not mprotect at all, so if that mmap is coming from the same place then this error may be happening in a non-MPICH2 library.
> >
> > We do mmap in hydra indirectly in the hwloc package, in a fashion consistent with your strace output, and we have definitely had problems with PGI+hwloc in the past.  You might try running "HYDRA_BINDLIB=bogus mpiexec" to see if disabling hwloc will avoid the segfault.  If it does, you should be able to reconfigure and rebuild MPICH2 using "--without-hydra-bindlib" to get a working MPICH2, but lacking built-in process binding functionality.
> >
> > -Dave
> >
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> 
