[MPICH] How do I dump a core under MPICH-2?

Gus Correa gus at ldeo.columbia.edu
Tue Sep 25 09:06:32 CDT 2007


Dear MPICH experts

MPICH-2 seems to have a nice feature of catching signal 11 (segmentation fault)
when the program goes beyond its assigned memory sandbox.
However, I would like to get a core dump of a large atmospheric model
that uses MPICH-2 and fails with signal 11, and I don't know how this can be done.
The idea is to examine the core dump with gdb, and try to find out
where and why the program fails.

How can I get a core dump under MPICH-2?

Please pardon me if this is not a genuine MPICH question, or is just a silly question.

In case this information matters:

1) The computer is a 64-bit dual-core dual-processor 3GHz Intel Xeon and has 4GB of memory.
2) The OS is Linux 2.6.22.5-76.fc7 (Fedora Core 7).
3) The program, AM2.1 from GFDL (see http://www.gfdl.noaa.gov/fms/), is written in Fortran 90/95
(it may have a few serial functions in C).
4) The compiler is Intel 10.0.023.
5) The version of MPICH-2 is 1.0.5p4, but I can upgrade to the latest release if this is required.
6) Everything was compiled in 64-bit mode.
7) I set the core dump size to unlimited ("limit coredumpsize unlimited"), but I still don't see any
core file in the run directory after the program fails.
8) The very last error message I get, apparently issued by MPICH-2, is:

rank 0 in job 71  my_computer_55576   caused collective abort of all ranks
  exit status of rank 0: killed by signal 11
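
For reference, the core size limit from item 7 can also be inspected and raised from inside a process, rather than only in the login shell. The following is a minimal sketch in Python using the standard `resource` module; it is not part of MPICH-2, and the function names `raise_core_limit` and `describe` are made up for illustration. It only shows what limit the current process sees; whether the processes started by MPICH-2 inherit the shell's limit is not established here.

```python
import resource


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) pair in effect after the change.
    Raising the soft limit up to the hard limit needs no special privileges.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def describe(limit):
    """Format a limit value the way a shell would report it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = raise_core_limit()
    print("core soft limit:", describe(soft))
    print("core hard limit:", describe(hard))
```

If the soft limit printed here is 0, no core file would be written by this process or its children, regardless of what was typed in another shell.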


Thank you very much for any help.

Gus Correa



