[mpich-discuss] SIGx13 Intermittent error
Anthony Chan
chan at mcs.anl.gov
Thu Jun 11 10:20:38 CDT 2009
It looks like you are still using MPICH-1, which is very old code.
MPICH2 is the replacement for MPICH-1 and is a lot more robust, with
better diagnostic messages, for example. MPICH2-1.1 was released recently,
and I strongly suggest you try 1.1 instead.
As for the error in p4, you could try a Google search for
"net_send: could not write to fd=4, errno = 32". BTW, can you run
a simple cpi program with your MPICH-1 installation?
A.Chan
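
For readers decoding the numbers in the log quoted below: on Linux, signal 13
is SIGPIPE and errno 32 is EPIPE, and both are raised when a process writes
to a pipe or socket whose other end has already been closed. The sketch below
is a standalone illustration, not MPICH or p4 code. It reproduces the same
errno with an ordinary pipe, and the function name write_to_closed_pipe is
made up for this example. It assumes a POSIX system and that it runs in the
main thread.

```python
import errno
import os
import signal


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed and return the errno seen.

    Hypothetical helper for illustration only; it is not part of MPICH.
    """
    # Ignore SIGPIPE so the failed write surfaces as an error (EPIPE)
    # instead of killing the process with signal 13.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # simulate the peer going away
    try:
        os.write(write_fd, b"x")
    except OSError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
        signal.signal(signal.SIGPIPE, previous)  # restore the old handler
    return 0


if __name__ == "__main__":
    code = write_to_closed_pipe()
    print("errno:", code, errno.errorcode.get(code))
    print("SIGPIPE number:", int(signal.SIGPIPE))
```

If the same pattern applies here, the write in net_send failed because the
process at the other end of fd=4 had already exited or dropped the connection,
so the first thing to check is why that peer went away.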
----- "Marc" <levesqm at emt.inrs.ca> wrote:
> Hi all,
>
> I'm running a simple MPICH program on a small cluster and from time to
> time I get this kind of error:
>
> rm_l_1_6096: p4_error: interrupt SIGx: 13
> p0_1281: p4_error: interrupt SIGx: 13
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> p0_1281: (16.272612) net_send: could not write to fd=4, errno = 32
>
> What are the possible causes of this kind of error? My code works well
> on my laptop with 2 CPUs and I don't see any trace of a bug in it...
>
> I would like to know whether this error is related to MPICH and the
> hardware/communications or to the code itself.
>
> Thank you.
>
> Marc