[MPICH] How do I dump a core under MPICH-2?

Gus Correa gus at ldeo.columbia.edu
Tue Sep 25 11:09:05 CDT 2007


Hello Robert Latham (and mpich-discuss list)

Thank you for your prompt answer, help, and good humor!

OK, I inserted "limit coredumpsize unlimited" and "limit stacksize unlimited"
in my .tcshrc file, and sourced it.
(Sorry, but I can't survive in the bash world.)
I thought mpiexec would pass my login shell environment to the execution 
shell.
However, I still don't get a core dump after the program fails.
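
One way to narrow this down is to check which limits the launched process
actually sees. Resource limits are per-process settings inherited from the
parent process, not environment variables, so they follow whichever process
starts the program. A minimal sketch, assuming an sh whose ulimit builtin
accepts -c (bash and dash do); the name show_core_limits is only illustrative:

```shell
# Print the soft and hard core-file size limits of the current shell.
# (show_core_limits is a hypothetical helper name, not part of MPICH2.)
show_core_limits() {
    echo "soft core limit: $(ulimit -S -c)"
    echo "hard core limit: $(ulimit -H -c)"
}

show_core_limits

# To see the limits of a process started by mpiexec instead, the same
# check could be run through it (assumes mpiexec and sh are on the PATH):
#   mpiexec -n 1 sh -c 'ulimit -S -c; ulimit -H -c'
```

If the soft limit reported through mpiexec is 0 while the interactive shell
reports unlimited, the setting in .tcshrc is not reaching the program.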

As a clarification, I am launching the program on a single processor/core:
"mpiexec -n 1 program_name > log_file".
If this sounds awkward to you, all I can say is that this is exactly how
the test case in the program distribution was set up.
(Hopefully sends and recvs from/to a single process work in a trivial way.)
When I try to use more processors things get even worse, and the code 
fails earlier.

Any further thoughts on how to pass the coredumpsize across mpiexec?
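
One possible workaround, offered only as an untested sketch: raise the limit
in a small wrapper that mpiexec starts, so the program inherits it directly
from its parent. This assumes an sh with "ulimit -c" support and a hard limit
that is not 0; run_with_core is a made-up name.

```shell
# Raise the soft core-file limit to the hard limit, then run the command.
# The function body is a subshell, so the caller's own limits are untouched.
# (run_with_core is a hypothetical name used for illustration only.)
run_with_core() (
    ulimit -S -c "$(ulimit -H -c)"
    exec "$@"
)

# The same idea as a one-line launch (program_name as in my command above):
#   mpiexec -n 1 sh -c 'ulimit -S -c "$(ulimit -H -c)"; exec ./program_name' > log_file
```

Setting the soft limit to the hard limit is always permitted for an
unprivileged process, which is why the sketch does not ask for "unlimited"
outright.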

Another clarification:  I am not using a cluster, but a dual-core 
dual-processor PC.
The MPICH communication device is ch3:sock,
but since the computer is a standalone PC,
I presume the actual communication is in shared memory (i.e. not through 
Ethernet or equivalent).

Thank you,
Gus Correa

Robert Latham wrote:

>On Tue, Sep 25, 2007 at 10:06:32AM -0400, Gus Correa wrote:
>
>>The idea is to examine the core dump with gdb, and try to find out the 
>>point and reason of failure.
>>
>>How can  I get a core dump under MPICH-2?
>
>I know this is possible: I get core dumps from MPICH2 all the time :>
>
>>7) I unlimited the core dump size ("limit coredumpsize unlimited"),
>>but I still don't see any core in the run directory after the
>>program fails.
>
>You may be enabling core dumps on one process but not all of them?
>Also there might be a difference between the environment given to an
>interactive shell, a login shell,  and that of a non-interactive shell.  
>
>What does 'ulimit -a' show you?  You might have to stick a 'ulimit -c
>unlimited' in your .zshenv or .bashrc
>
>In short, it's not an MPICH2 issue, but a distribution/linux issue.
>I'm afraid we don't have too many FC7 clusters to test on, so I can
>only offer you that advice above.
>
>==rob



