[MPICH] How do I dump a core under MPICH-2?
Gus Correa
gus at ldeo.columbia.edu
Tue Sep 25 11:09:05 CDT 2007
Hello Robert Latham (and mpich-discuss list)
Thank you for your prompt answer, help, and good humor!
OK, I inserted "limit coredumpsize unlimited" and "limit stacksize unlimited"
in my .tcshrc file, and sourced it.
(Sorry, but I can't survive in the bash world.)
I thought mpiexec would pass my login shell environment to the execution
shell.
However, I still don't get a core dump after the program fails.
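One way to check what a launched process actually sees is to print the core limit from inside a child process rather than from the interactive shell. Resource limits are inherited from whichever process spawns the program, which may not be the shell where the limit was raised. A minimal sketch in sh follows; the function name show_core_limit is hypothetical, and it assumes a shell whose ulimit builtin accepts -c (bash and dash do):

```shell
#!/bin/sh
# show_core_limit: hypothetical helper, not part of MPICH2.
# Prints the soft and hard core-file size limits of the current process.
show_core_limit() {
    soft=$(ulimit -S -c)   # soft limit: the one that decides whether a core is written
    hard=$(ulimit -H -c)   # hard limit: the ceiling up to which the soft limit may be raised
    echo "core soft=$soft hard=$hard"
}

show_core_limit
```

Saved as a script (say show_core_limit.sh, an assumed name) and launched the same way as the real program, e.g. "mpiexec -n 1 ./show_core_limit.sh", this would show whether the limit from .tcshrc reaches processes started by mpiexec. A soft value of 0 there would explain the missing core file.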
As a clarification, I am launching the program on a single processor/core:
"mpiexec -n 1 program_name > log_file".
If this sounds awkward to you, all I can say is that this is exactly how
the test case in the program distribution was set up.
(Hopefully sends and recvs from/to a single process work in a trivial way.)
When I try to use more processors things get even worse, and the code
fails earlier.
Any further thoughts on how to pass the coredumpsize across mpiexec?
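One possible workaround, sketched here under assumptions rather than as a confirmed MPICH2 recipe, is to raise the limit inside a small wrapper that mpiexec launches in place of the program, so the setting no longer depends on what the login shell passes along. The name run_with_core is hypothetical, and the sketch assumes a shell whose ulimit builtin accepts -c:

```shell
#!/bin/sh
# run_with_core: hypothetical wrapper, not part of MPICH2.
# Raises the soft core-file limit as far as the hard limit allows, then runs
# the given command. The subshell keeps the caller's own limits unchanged.
run_with_core() {
    (
        ulimit -S -c unlimited 2>/dev/null ||
            echo "warning: could not raise core limit (hard limit too low?)" >&2
        "$@"
    )
}

# When saved as a script, run whatever command was passed on the command line.
if [ "$#" -gt 0 ]; then
    run_with_core "$@"
fi
```

Saved as run_with_core.sh (an assumed name), it would be used as "mpiexec -n 1 ./run_with_core.sh ./program_name > log_file". The wrapper can only raise the soft limit up to the hard limit; if the hard limit is already 0 in the process that spawns it, the warning is printed and no core will be written. Where the core file ends up also depends on system settings, such as the kernel's core_pattern on Linux.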
Another clarification: I am not using a cluster, but a dual-core
dual-processor PC.
The MPICH communication device is ch3:sock,
but since the computer is a standalone PC,
I presume the actual communication is in shared memory (i.e. not through
Ethernet or equivalent).
Thank you,
Gus Correa
Robert Latham wrote:
>On Tue, Sep 25, 2007 at 10:06:32AM -0400, Gus Correa wrote:
>
>>The idea is to examine the core dump with gdb, and try to find out the
>>point and reason of failure.
>>
>>How can I get a core dump under MPICH-2?
>>
>
>I know this is possible: I get core dumps from MPICH2 all the time :>
>
>>7) I unlimited the core dump size ("limit coredumpsize unlimited"),
>>but I still don't see any core in the run directory after the
>>program fails.
>>
>
>You may be enabling core dumps on one process but not all of them?
>Also there might be a difference between the environment given to an
>interactive shell, a login shell, and that of a non-interactive shell.
>
>What does 'ulimit -a' show you? You might have to stick a 'ulimit -c
>unlimited' in your .zshenv or .bashrc
>
>In short, it's not an MPICH2 issue, but a distribution/Linux issue.
>I'm afraid we don't have too many FC7 clusters to test on, so I can
>only offer you that advice above.
>
>==rob