[mpich-discuss] SIGx13 Intermittent error

Anthony Chan chan at mcs.anl.gov
Thu Jun 11 10:20:38 CDT 2009


It looks like you are still using MPICH-1, which is very old code.
MPICH2 is the replacement for MPICH-1 and is a lot more robust, e.g. it
gives better diagnostic messages.  MPICH2-1.1 was released recently, and I
strongly suggest you try it instead.

As for the error in p4, you could try a Google search for
"net_send: could not write to fd=4, errno = 32".  BTW, can you run
a simple cpi program with your MPICH-1 installation?
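
For reference, on Linux signal 13 is SIGPIPE and errno 32 is EPIPE.  Both
are reported when a process writes to a pipe or socket whose other end has
already been closed.  The following is only a minimal sketch in plain C,
independent of MPICH and p4, that reproduces the same errno with an
ordinary pipe; the helper name errno_from_write_to_closed_pipe is
hypothetical and chosen just for illustration:

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH or p4): writes to a pipe whose
 * read end is already closed and returns the errno that write() reports,
 * 0 if the write unexpectedly succeeds, or -1 if the pipe cannot be made. */
int errno_from_write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Ignore SIGPIPE so that write() fails with EPIPE instead of the
     * process being terminated by the signal. */
    void (*old_handler)(int) = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                      /* the reading side goes away */
    ssize_t n = write(fds[1], "x", 1);  /* write to the dead peer */
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    if (old_handler != SIG_ERR)
        signal(SIGPIPE, old_handler);   /* restore previous disposition */
    return err;
}
```

Calling this helper and comparing the result with EPIPE shows the same
condition that net_send reports.  In an MPI job it usually means that the
process on the other end of the connection had already exited or lost its
connection when the write was attempted, so the first process to die is
the one worth looking at.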

A.Chan


----- "Marc" <levesqm at emt.inrs.ca> wrote:

> Hi all,
> 
> I'm running a simple MPICH program on a small cluster and from time to
> time I get this kind of error:
> 
> rm_l_1_6096:  p4_error: interrupt SIGx: 13
> p0_1281:  p4_error: interrupt SIGx: 13
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> Killed by signal 2.
> p0_1281: (16.272612) net_send: could not write to fd=4, errno = 32
> 
> What are the possible causes of this kind of error? My code works well
> on my laptop with 2 CPUs and I don't see any trace of a bug in it.
> 
> I would like to know whether this error is related to MPICH and the
> hardware/communications or to the code itself.
> 
> Thank you.
> 
> Marc

