[mpich-discuss] Intermittent SIGx 13 error
Marc
levesqm at emt.inrs.ca
Thu Jun 11 09:59:55 CDT 2009
Hi all,
I'm running a simple MPICH program on a small cluster, and from time to
time I get this error:
rm_l_1_6096: p4_error: interrupt SIGx: 13
p0_1281: p4_error: interrupt SIGx: 13
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_1281: (16.272612) net_send: could not write to fd=4, errno = 32
What are the possible causes of this kind of error? My code works well
on my laptop with 2 CPUs, and I see no trace of a bug in it...
I would like to know whether this error is related to MPICH and the
hardware/communications or to the code itself.
Thank you.
Marc