[MPICH] core dumps MPICH & Linux

Wolfram Brenig w.brenig at tu-bs.de
Thu Oct 26 11:11:07 CDT 2006


Let me be more precise.

I have no problem running code on the
heterogeneous system. (To be sure, I can also
reduce the MPI ring to just a homogeneous
section of the cluster.)

What I want is to get core dumps from
the slaves of a master/slave type of
MPI code, for debugging purposes in one
particular case.

Now, when I run the slaves as standalone
processes I can get core dumps from them.
But when I run them as MPI processes they
do not produce any core dump files.

I have set:

$> ulimit -c unlimited

in the .profile and .bashrc and when I do:

$> ssh node-whatever ulimit -c

I get:

unlimited

for any node-whatever of the cluster.
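
The check above shows the limit of a shell started through ssh, which is not necessarily the limit the MPI-launched slaves run with. One way to take the shell setup out of the picture, sketched here under the assumption that the slaves inherit a lower soft limit from whatever daemon starts them, is to let each process raise its own limit with the POSIX getrlimit/setrlimit calls. The function names enable_core_dumps and report_core_limit are made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit for this process.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl) == 0 ? 0 : -1;
}

/* Print the core-file limits this process actually runs with. */
void report_core_limit(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        fprintf(out, "getrlimit(RLIMIT_CORE) failed\n");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "core soft limit: unlimited\n");
    else
        fprintf(out, "core soft limit: %llu bytes\n",
                (unsigned long long)rl.rlim_cur);
    if (rl.rlim_max == RLIM_INFINITY)
        fprintf(out, "core hard limit: unlimited\n");
    else
        fprintf(out, "core hard limit: %llu bytes\n",
                (unsigned long long)rl.rlim_max);
}
```

Calling report_core_limit(stderr) and then enable_core_dumps() near the top of the slave's main() would show what the process really sees. If the hard limit itself turns out to be 0, this cannot help, and the limit would have to be raised in the environment that starts the MPI processes.
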
I looked for the core files in the directory that I
get when I do:

$> ssh node-whatever pwd

I also searched the whole home
file system. There is no core file anywhere.
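
On Linux a core file is by default written to the current working directory of the crashing process, with a name taken from /proc/sys/kernel/core_pattern. The working directory of an MPI-launched slave need not be the one that "ssh node-whatever pwd" prints. A minimal sketch that lets the slave report both values itself follows; the helper names read_first_line and report_core_location are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the first line of a text file into buf, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f;

    if (size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if any */
    return 0;
}

/* Report the working directory and the kernel's core file name pattern. */
void report_core_location(FILE *out)
{
    char cwd[4096];
    char pattern[256];

    if (getcwd(cwd, sizeof cwd) != NULL)
        fprintf(out, "working directory: %s\n", cwd);
    else
        fprintf(out, "working directory: unknown\n");

    if (read_first_line("/proc/sys/kernel/core_pattern",
                        pattern, sizeof pattern) == 0)
        fprintf(out, "core_pattern: %s\n", pattern);
    else
        fprintf(out, "core_pattern: not readable\n");
}
```

Calling report_core_location(stderr) early in the slave would show where a dump from that process should land, which may be a directory outside the home file system.
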

Any suggestions as to what I might be missing?


Wolfram


Darius Buntinas wrote:
> Note that MPICH2 does not (yet) run on heterogeneous clusters.  If you're
> getting crashes, this may be why.
> 
> Try running
>   ulimit -c
> using mpiexec (as if it were an mpi program).  That will show you what the
> limit is actually set at on each node.
> 
> -d
> 
> 
> On Thu, 26 Oct 2006, Wolfram Brenig wrote:
> 
>> I'm trying to force core dumps on
>> a heterogeneous linux cluster running
>> mpich2version: 1.0.2 and SuSE linux
>> versions 9.2 and 10.0.
>>
>> I have set "ulimit -c unlimited" on all
>> nodes.
>>
>> When I run non-parallel code I can get
>> core dumps. But no parallel program will
>> core dump.
>>
>> Any help, or hint where to get info
>> would be most appreciated.
>>
>> From searching the WWW I got the
>> impression that linux may not be able
>> to do core dumps with MPI. Is this so?
>>
>> Wolfram
>>
>>
>>



