[MPICH] How do I dump a core under MPICH-2?
Jean-Marc Saffroy
saffroy at gmail.com
Tue Sep 25 21:56:13 CDT 2007
On Tue, 25 Sep 2007, Gus Correa wrote:
> After I inserted the 'limit coredumpsize unlimited' on my .tcshrc,
> mpiexec propagates this limit to the parallel job execution environment.
> However, this only happens if I run anything as a tcsh command.
Actually, you don't "propagate" the limit; rather, you reset it to a fixed
value (unlimited) in every new tcsh instance, since each one reads your
.tcshrc at startup.
> By contrast, if I don't explicitly invoke tcsh, I get:
>
> 32-pokey% mpiexec -n 1 limit
> problem with execution of limit on pokey: [Errno 2] No such file or
> directory
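This is because mpiexec/mpd tries to run an external command named "limit",
which does not exist: it's a shell builtin, and it has to be, since a
resource limit belongs to a process and is inherited by its children, so
only the shell itself can change its own limit.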
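To illustrate the inheritance, here is a minimal sketch using plain sh,
where the builtin is called "ulimit" rather than tcsh's "limit" (an
assumption of this example; the thread itself uses tcsh). It lowers the
soft core-size limit inside a subshell, then asks a child shell what it
sees:

```shell
#!/bin/sh
# Lower the soft core-size limit in a subshell, then start a child shell.
# The child reports the value it inherited from its parent.
inherited=$( (ulimit -S -c 0; sh -c 'ulimit -S -c') )
echo "child sees: $inherited"   # prints "child sees: 0"
# The invoking shell's own limit is untouched, because the change
# was made inside the subshell only.
```

The same reasoning applies to tcsh: "limit" changes the limit of the tcsh
process that runs it, and whatever that tcsh launches afterwards inherits
the new value.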
> When I launch it directly from mpiexec no core dump is produced:
>
> 66-pokey% mpiexec -n 1 wrong_hellow
[...]
> However, as you clarified, if I launch it as a (tc)shell command, with "limit
> coredumpsize unlimit" set,
> I do get a core dump:
>
> 68-pokey% grep coredumpsize ~/.tcshrc
> limit coredumpsize unlimited
>
> 69-pokey% mpiexec -n 1 tcsh -c 'wrong_hellow'
> Hello world from process 0 of 1
> Segmentation fault (core dumped)
Setting the limit in your .tcshrc can be tedious, and some day you may
forget it is there when a big job crashes, which can be painful. For me, a
better alternative is to set the limit only for the job you want to debug,
e.g.:
% mpiexec -n 1 tcsh -c 'limit coredumpsize unlimited; wrong_hellow'
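If you would rather not type the limit each time, the same idea could be
packaged as a small wrapper. The following is only a sketch in plain sh:
the function name run_with_core is hypothetical, and it assumes that
raising the soft limit up to the current hard limit is what you want.

```shell
#!/bin/sh
# run_with_core (hypothetical helper): raise the soft core-size limit to
# the current hard limit, then run the given command with that limit.
# The body uses ( ... ), so it runs in a subshell and the caller's own
# limit is left unchanged.
run_with_core() (
    ulimit -S -c "$(ulimit -H -c)"
    "$@"
)

# When saved as a script and called with arguments, run them.
if [ "$#" -gt 0 ]; then
    run_with_core "$@"
fi
```

Saved as an executable script (say withcore.sh, also a made-up name) that
is reachable on the nodes, it could be used as
"mpiexec -n 1 ./withcore.sh wrong_hellow". If the hard limit itself is 0,
no core will be written either way.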
--
saffroy at gmail.com