[MPICH] How do I dump a core under MPICH-2?

Jean-Marc Saffroy saffroy at gmail.com
Tue Sep 25 21:56:13 CDT 2007


On Tue, 25 Sep 2007, Gus Correa wrote:

> After I inserted the 'limit coredumpsize unlimited' on my .tcshrc,
> mpiexec propagates this limit to the parallel job execution environment.
> However, this only happens if I run anything as a tcsh command.

Actually the limit is not "propagated": every new tcsh instance reads your 
.tcshrc and resets the limit to a fixed value (unlimited).

> By contrast, if I don't explicitly invoke tcsh, I get:
>
> 32-pokey% mpiexec -n 1 limit
> problem with execution of limit  on  pokey:  [Errno 2] No such file or 
> directory

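This is because mpiexec/mpd tries to run an external command "limit", which 
does not exist. It is a shell builtin, and it has to be: resource limits 
belong to a process and are inherited by its children, so an external 
"limit" program could only change its own limits, never those of the shell 
that called it.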
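
A minimal sketch of why it has to be a builtin, written for Bourne shell, 
where "ulimit -c" plays the role of tcsh's "limit coredumpsize" (using sh 
instead of tcsh is my assumption, only to keep the example portable). A 
child shell lowers its own soft limit, a process started from that child 
inherits the value, and the original shell is left untouched:

```shell
# Soft core-file limit of this shell before anything else happens.
before=$(ulimit -S -c)

# A child shell lowers its own soft limit to 0; the grandchild it starts
# inherits that value and prints it.
inherited=$(sh -c 'ulimit -S -c 0; sh -c "ulimit -S -c"')

# Back in the original shell: the child's change did not reach us.
after=$(ulimit -S -c)

echo "before=$before inherited=$inherited after=$after"
```

The "inherited" value is 0, while "before" and "after" are identical, 
whatever your starting limit happens to be.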

> When I launch it directly from mpiexec no core dump is produced:
>
> 66-pokey% mpiexec -n 1 wrong_hellow
[...]
> However, as you clarified, if I launch it as a (tc)shell command, with
> "limit coredumpsize unlimited" set, I do get a core dump:
>
> 68-pokey% grep coredumpsize ~/.tcshrc
> limit coredumpsize unlimited
>
> 69-pokey% mpiexec -n 1 tcsh -c 'wrong_hellow'
> Hello world from process 0 of 1
> Segmentation fault (core dumped)

Setting the limit in your .tcshrc can be tedious, and some day you may 
leave it in place when a big job crashes, and a core file from every 
process can be painful. In my view, a better alternative is to set the 
limit only for the job you want to debug, e.g.:

% mpiexec -n 1 tcsh -c 'limit coredumpsize unlimited; wrong_hellow'
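
If you would rather launch through sh than tcsh, the same idea can be 
sketched with ulimit. The mpiexec line in the comment is an untested 
assumption on my part; the runnable part replaces wrong_hellow with a 
stand-in that only reports the limit it was started with:

```shell
# Untested assumption: the equivalent launch line under sh would be
#   mpiexec -n 1 sh -c 'ulimit -S -c unlimited; exec ./wrong_hellow'
# Below, a stand-in replaces wrong_hellow and prints the limit it inherited.
# The soft limit is raised to the hard limit, which is always permitted.

# Hard limit as reported by sh (the ceiling for the soft limit).
hard_limit=$(sh -c 'ulimit -H -c')

# Raise the soft limit for this one command only, then exec the stand-in.
job_limit=$(sh -c 'ulimit -S -c "$(ulimit -H -c)"; exec sh -c "ulimit -S -c"')

# The soft limit of the calling shell is not changed by the line above.
shell_limit=$(ulimit -S -c)

echo "job=$job_limit hard=$hard_limit this_shell=$shell_limit"
```

If the hard limit on your nodes is itself 0, no per-job setting will 
produce a core file and the administrator has to raise it first.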

-- 
saffroy at gmail.com



