[mpich-discuss] runtime segfault: mpich2-1.3.2 with pgi v11.5 on rhel5.6 system
Limin Gu
lgu at penguincomputing.com
Thu May 26 15:21:30 CDT 2011
Thanks, Dave!
I tried "HYDRA_BINDLIB=bogus mpiexec", but it still segfaults :(
I reconfigured and rebuilt with "CFLAGS=-g"; here is the "gdb mpiexec" backtrace output:
(gdb) run
Starting program: /home/lgu/mpich2_install/bin/mpiexec
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
[Thread debugging using libthread_db enabled]
Program received signal SIGSEGV, Segmentation fault.
0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003ef10e72bf in __vsnprintf_chk () from /lib64/libc.so.6
#1 0x0000003ef10e722b in __snprintf_chk () from /lib64/libc.so.6
#2 0x0000003ef0c0d1bb in call_init () from /lib64/ld-linux-x86-64.so.2
#3 0x0000003ef0c0d2c5 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#4 0x0000003ef0c00aaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ?? ()
#6 0x00007fffffffeae0 in ?? ()
#7 0x0000000000000000 in ?? ()
(gdb)
Thank you!
Limin
> Can you "gdb mpiexec" and find us a stack trace for the failing mmap? You
> may need to reconfigure and rebuild with "CFLAGS=-g" in order to get
> meaningful information from the debugger. That value (18446744073223036928)
> is suspicious: it's 0xFFFFFFFFE3006000 in hex, or -486,514,688 decimal if
> interpreted as a signed value instead. It may be that the compiler or the
> code is doing some math incorrectly on size_t types.
>
> AFAIK hydra does not mprotect at all, so if that mmap is coming from the
> same place then this error may be happening in a non-MPICH2 library.
>
> We do mmap in hydra indirectly, via the hwloc package, in a fashion
> consistent with your strace output, and we have definitely had problems with
> PGI+hwloc in the past. You might try running "HYDRA_BINDLIB=bogus mpiexec"
> to see if disabling hwloc will avoid the segfault. If it does, you should
> be able to reconfigure and rebuild MPICH2 using "--without-hydra-bindlib" to
> get a working MPICH2, though it will lack built-in process binding
> functionality.
>
> -Dave