[MPICH] problem migrating from MPICH1 to MPICH2

Christian Zemlin zemlinc at upstate.edu
Tue May 22 21:24:06 CDT 2007


Dear MPICH Experts,

I just set up a Beowulf cluster (16 dual-core nodes @ 2.4 GHz), and it is running, but I have a problem running my MPICH programs.  My programs were developed on an older cluster that had MPICH1, and they run fine there.  They also run for some time on the new cluster, but eventually they terminate with a message like:

"rank 2 in job 120 master_4268  caused collective abort of all ranks exit status of rank 2: killed by signal 11"

This type of error occurs at a different point in the simulation every time I run it.  There is no problem if I use only one master node and one slave node.
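
From what I have read, signal 11 is a segmentation fault (SIGSEGV).  To make clear what kind of failure I mean, here is a minimal hypothetical sketch (this is not my actual code; the rank-dependent out-of-bounds write is invented purely for illustration) of a bug that could produce exactly this message:

/* Hypothetical sketch only; not taken from my actual program. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, n = 4;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(n * sizeof(double));

    /* Bug: rank 2 writes one element past the end of buf.  This
     * corrupts the heap rather than faulting immediately, so the
     * process can be killed by signal 11 (SIGSEGV) at an
     * unpredictable later point, for example inside a subsequent
     * malloc/free or an MPI call, which then aborts the whole job. */
    int last = (rank == 2) ? n : n - 1;
    for (int i = 0; i <= last; i++)
        buf[i] = (double)rank;

    double sum;
    MPI_Reduce(buf, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    free(buf);
    MPI_Finalize();
    return 0;
}

I mention this only because heap corruption of that kind would explain why the crash point changes from run to run; I do not know whether my problem is in my code or in the MPICH2 setup.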

Do you have any suggestions as to what might be the problem?

Thank you and best wishes,

Christian