[mpich-discuss] runtime segfault: mpich2-1.3.2 with pgi v11.5 on rhel5.6 system

Limin Gu lgu at penguincomputing.com
Thu May 26 12:46:34 CDT 2011


Hi Everybody,

I need some help with mpich2-1.3.2 built with pgi v11.5 on a rhel5.6 system.

Previously I successfully compiled and ran mpich2-1.3.2 with pgi
v10.0 on rhel5.6. Recently I upgraded the pgi compiler from v10.0 to
v11.5 and followed exactly the same steps to compile mpich2-1.3.2.
The compilation was fine, but I get a segfault when I try to run
mpiexec or mpirun.

The steps are below:

[root at flatline mpich2-1.3.2]# env CC=pgcc FC=pgf90 F77=pgf77 CXX=pgCC \
    ./configure --prefix=/home/lgu/mpich2_install --enable-shared
[root at flatline mpich2-1.3.2]# make
[root at flatline mpich2-1.3.2]# make install
[root at flatline mpich2-1.3.2]#
[root at flatline mpich2-1.3.2]# which mpiexec
/home/lgu/mpich2_install/bin/mpiexec
[root at flatline mpich2-1.3.2]# which mpicc
/home/lgu/mpich2_install/bin/mpicc
[root at flatline mpich2-1.3.2]# which pgcc
/usr/pgi/linux86-64/11.5/bin/pgcc
[root at flatline mpich2-1.3.2]#
[root at flatline mpich2-1.3.2]# cd examples/
[root at flatline examples]# mpicc -o cpi cpi.c
[root at flatline examples]# mpiexec -hosts master -np 1 ./cpi
Segmentation fault
[root at flatline examples]# mpiexec
Segmentation fault
[root at flatline examples]#

When I run "strace mpiexec", it shows that mmap tried to allocate a
huge amount of memory and failed:

open("/sys/devices/system/node", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 3
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
brk(0)                                  = 0x19125000
brk(0x1914e000)                         = 0x1914e000
getdents(3, /* 4 entries */, 32768)     = 112
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
mmap(NULL, 18446744073223036928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 18446744073223168000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2b4e88607000
munmap(0x2b4e88607000, 60788736)        = 0
munmap(0x2b4e90000000, 6320128)         = 0
mprotect(0x2b4e8c000000, 135168, PROT_READ|PROT_WRITE) = 0
mmap(NULL, 18446744073223036928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
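
As a side note, the lengths passed to the failing mmap calls sit just
below 2^64, which would be consistent with a negative value being
converted to an unsigned 64-bit size_t. That is only a reading of the
trace, not a confirmed cause. A minimal Python sketch of the arithmetic,
assuming a 64-bit size_t (the helper name as_signed64 is made up for
illustration):

```python
# Reinterpret the lengths from the failed mmap calls as signed 64-bit values.
# Assumption: size_t is 64 bits wide on this x86-64 system.

def as_signed64(value: int) -> int:
    """Return the two's-complement signed reading of an unsigned 64-bit value."""
    return value - (1 << 64) if value >= (1 << 63) else value

# The two distinct lengths that appear in the strace output above.
failed_sizes = [18446744073223036928, 18446744073223168000]
signed_sizes = [as_signed64(size) for size in failed_sizes]

for size, signed in zip(failed_sizes, signed_sizes):
    print(f"{size} -> {signed}")
```

Read this way, the requested lengths would be about -464 MB each, and
the two values differ by exactly 131072 bytes (128 KB).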




I have also tried configuring with "CC=pgcc FC=pgfortran
F77=pgfortran CXX=pgcpp CFLAGS=-fast FCFLAGS=-fast FFLAGS=-fast
CXXFLAGS=-fast", as the pgi guide suggests, but I got the same segfault.

Does anybody know what might be causing this problem?

BTW, this happens on rhel5.6. I have successfully built and run
mpich2-1.3.2 with the new pgi v11.5 on rhel4.9.


Thank you very much for any help on this!

Limin

