[MPICH] How do I dump a core under MPICH-2?
Gus Correa
gus at ldeo.columbia.edu
Tue Sep 25 09:06:32 CDT 2007
Dear MPICH experts
MPICH-2 seems to have a nice feature: it catches signal 11 (segmentation fault) when the program accesses memory outside its assigned address space.
However, I would like to get a core dump of a large atmospheric model that uses MPICH-2 and fails with signal 11, and I don't know how to do this.
The idea is to examine the core dump with gdb to find out where and why the program fails.
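A core file is written only if the dying process's soft RLIMIT_CORE is nonzero. Resource limits are inherited across fork() and exec(), so a shell setting such as "limit coredumpsize unlimited" reaches only processes started from that shell. If the ranks are launched by MPICH-2's process-manager daemons (with 1.0.5 these are presumably the mpd daemons) and those were started from a different shell, the ranks might still run with a core limit of 0. That is an assumption about this setup, not something the post establishes. Under that assumption, one workaround is to raise the soft limit from inside the program at start-up, as in this C sketch. enable_core_dumps and print_core_limit are hypothetical helper names; calling them from the Fortran 90/95 code would need a small C interoperability wrapper that is not shown.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
   limit. An unprivileged process may raise its soft limit only up to the
   hard limit, so if the hard limit is 0, no core file can be written.
   Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    rl.rlim_cur = rl.rlim_max; /* soft := hard (may be RLIM_INFINITY) */
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: print the current soft limit (RLIMIT_CORE is in bytes). */
void print_core_limit(FILE *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "core file size limit: unlimited\n");
    else
        fprintf(out, "core file size limit: %llu bytes\n",
                (unsigned long long)rl.rlim_cur);
}
```

For example, each rank could call enable_core_dumps() and then print_core_limit(stderr) right after MPI initialization, to see which limit the rank processes actually run with.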
How can I get a core dump under MPICH-2?
Please pardon me if this is not really an MPICH question, or if it is a silly one.
In case this information matters:
1) The computer is a 64-bit, dual-processor, dual-core 3 GHz Intel Xeon machine with 4 GB of memory.
2) The OS is Linux 2.6.22.5-76.fc7 (Fedora 7).
3) The program, AM2.1 from GFDL (see http://www.gfdl.noaa.gov/fms/), is written in Fortran 90/95 (it may have a few serial functions in C).
4) The compiler is Intel 10.0.023.
5) The MPICH-2 version is 1.0.5p4, but I can upgrade to the latest release if required.
6) Everything was compiled in 64-bit mode.
7) I removed the core dump size limit ("limit coredumpsize unlimited"), but I still don't see any core file in the run directory after the program fails.
8) The very last error message I get, apparently issued by MPICH-2, is:
   rank 0 in job 71 my_computer_55576 caused collective abort of all ranks
   exit status of rank 0: killed by signal 11
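The last line reports how rank 0's process ended. A launcher typically learns this from waitpid(); the C sketch below decodes such a status the same way, including the non-POSIX WCOREDUMP flag, which on Linux indicates that the process dumped core. It is a generic POSIX illustration with hypothetical names (describe_status, demo_segv_child), not MPICH-2's actual code.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Turn a waitpid() status into a readable description, similar in spirit
   to "exit status of rank 0: killed by signal 11". */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exit status %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        int core = 0;
#ifdef WCOREDUMP /* not POSIX, but available on Linux/glibc */
        core = WCOREDUMP(status) ? 1 : 0;
#endif
        snprintf(buf, len, "killed by signal %d%s", WTERMSIG(status),
                 core ? " (core dumped)" : "");
    } else {
        snprintf(buf, len, "unexpected status 0x%x", (unsigned)status);
    }
}

/* Demo: fork a child that dies from SIGSEGV with the default action and
   describe how it ended. Returns 0 on success, -1 on fork/wait failure. */
int demo_segv_child(char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSEGV); /* default action: terminate, plus a core if allowed */
        _exit(0);       /* not reached */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    describe_status(status, buf, len);
    return 0;
}
```

Run in the same environment as the ranks, a result of "killed by signal 11" without "(core dumped)" would be consistent with the core limit still being 0 there.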
Thank you very much for any help.
Gus Correa