[mpich-discuss] SIGx13 Intermittent error

Marc levesqm at emt.inrs.ca
Thu Jun 11 09:59:55 CDT 2009


Hi all,

I'm running a simple MPICH program on a small cluster, and from time to
time I get this kind of error:

rm_l_1_6096:  p4_error: interrupt SIGx: 13
p0_1281:  p4_error: interrupt SIGx: 13
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_1281: (16.272612) net_send: could not write to fd=4, errno = 32
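
For reference, the numbers in this log can be looked up with a short
script. This is only a sketch for decoding them, assuming a Linux node,
where signal 13 is normally SIGPIPE, signal 2 is SIGINT, and errno 32 is
EPIPE (a write to a pipe or socket whose other end has been closed). The
helper names describe_signal and describe_errno are made up for this
example and are not part of MPICH.

```python
import errno
import os
import signal


def describe_signal(num: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(num).name


def describe_errno(num: int) -> str:
    """Return the symbolic name and message for an errno value."""
    return f"{errno.errorcode[num]}: {os.strerror(num)}"


if __name__ == "__main__":
    # Values taken from the log above: "SIGx: 13", "signal 2", "errno = 32".
    print("signal 13 ->", describe_signal(13))
    print("signal 2  ->", describe_signal(2))
    print("errno 32  ->", describe_errno(32))
```

The mapping is platform dependent, so it is best run on one of the
cluster nodes rather than on a different machine.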

What are the possible causes for this kind of error? My code works well
on my laptop with 2 CPUs and I don't see any trace of a bug in it...

I would like to know whether this error is related to MPICH and the
hardware/communications, or to the code itself.

Thank you.

Marc


