[mpich-discuss] runtime segfault: mpich2-1.3.2 with pgi v11.5 on rhel5.6 system
Dave Goodell
goodell at mcs.anl.gov
Thu May 26 15:38:08 CDT 2011
Hmm... I have no idea what's going on, then. Do other programs compiled with the newer compiler work for you?
-Dave
On May 26, 2011, at 3:21 PM CDT, Limin Gu wrote:
> Thanks Dave!
>
> I tried "HYDRA_BINDLIB=bogus mpiexec", it still segfaults :(
>
> I reconfigured and rebuilt with "CFLAGS=-g"; here is the "gdb mpiexec" bt output:
>
> (gdb) run
> Starting program: /home/lgu/mpich2_install/bin/mpiexec
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
> [Thread debugging using libthread_db enabled]
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
> (gdb) bt
> #0 0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
> #1 0x0000003ef10e722b in __snprintf_chk () from /lib64/libc.so.6
> #2 0x0000003ef0c0d1bb in call_init () from /lib64/ld-linux-x86-64.so.2
> #3 0x0000003ef0c0d2c5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
> #4 0x0000003ef0c00aaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #5 0x0000000000000001 in ?? ()
> #6 0x00007fffffffeae0 in ?? ()
> #7 0x0000000000000000 in ?? ()
> (gdb)
>
>
> Thank you!
>
> Limin
>
> > Can you run "gdb mpiexec" and get us a stack trace for the failing mmap? You may need to reconfigure and rebuild with "CFLAGS=-g" in order to get meaningful information from the debugger. That value (18446744073223036928) is suspicious: it's 0xFFFFFFFFE3006000 in hex, or -486,514,688 decimal if interpreted as a signed 64-bit value instead. It may be that the compiler or the code is doing some math incorrectly on size_t types.
> >
> > AFAIK hydra does not mprotect at all, so if that mmap is coming from the same place then this error may be happening in a non-MPICH2 library.
> >
> > We do mmap in hydra indirectly in the hwloc package, in a fashion consistent with your strace output, and we have definitely had problems with PGI+hwloc in the past. You might try running "HYDRA_BINDLIB=bogus mpiexec" to see if disabling hwloc will avoid the segfault. If it does, you should be able to reconfigure and rebuild MPICH2 using "--without-hydra-bindlib" to get a working MPICH2, but lacking built-in process binding functionality.
> >
> > -Dave
> >
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
>
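The hex and signed readings of the suspicious value quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not anything taken from MPICH2 or hwloc; the helper name as_signed64 is made up for illustration, and it assumes the value is a 64-bit quantity such as a size_t on x86-64.

```python
def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed two's complement."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= (1 << 63):          # top bit set: the signed reading is negative
        return value - (1 << 64)
    return value


# The value called suspicious in the quoted message.
suspicious = 18446744073223036928

print(hex(suspicious))              # 0xffffffffe3006000
print(as_signed64(suspicious))      # -486514688
```

A huge unsigned value that turns into a modest negative number when read as signed usually points to a subtraction or sign extension going wrong before the result was stored in an unsigned type, which fits the size_t suspicion raised in the thread.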