[MPICH] mpirun timeout and killed by signal error for 64 processor option
Bala
cppbala at yahoo.com
Sat Mar 17 08:47:08 CDT 2007
Hi All,
We have installed MPICH on a 16-node Intel x86_64 cluster of blade
servers (dual CPU, dual core per node).
When we run the cpi sample through mpirun with -np 32, it runs fine and
prints its output, but after a while the following message appears:
-----------------------------
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003906
Timeout in waiting for processes to exit, 2 left.
This may be due to a defective rsh program (Some versions of Kerberos
rsh have been observed to have this problem).
This is not a problem with P4 or MPICH but a problem with the operating
environment. For many applications, this problem will only slow down
process termination.
-----------------------------------
But when we run with -np 64 or more, for example
$ mpirun -np 64 -machinefile machines ./cpi
the run fails with a "killed by signal 2" error. On our other cluster
we can run with -np 64.
Please let us know how to avoid these errors.
Is cpi too small a problem to run with -np 64?
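
For reference on the problem-size question, here is a minimal Python sketch
of how the cpi sample appears to split its work. It assumes the stock sample:
a midpoint-rule integration of 4/(1+x^2) over n = 100 intervals (which is
consistent with the error value in the output above), with rank r taking
intervals r+1, r+1+np, and so on. The function names partial_sum and
estimate_pi are made up for illustration, and the plain sum stands in for
MPI_Reduce.

```python
# Sketch of the work split in the cpi sample (assumptions: midpoint rule,
# n = 100 intervals, rank r handles intervals r+1, r+1+nprocs, ...).
# This runs serially; no MPI is involved.


def partial_sum(rank, nprocs, n=100):
    """Contribution of a single rank to the pi estimate."""
    h = 1.0 / n
    total = 0.0
    # Cyclic distribution: every nprocs-th interval belongs to this rank.
    for i in range(rank + 1, n + 1, nprocs):
        x = h * (i - 0.5)  # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(nprocs, n=100):
    """Add up all ranks' contributions (stand-in for MPI_Reduce)."""
    return sum(partial_sum(r, nprocs, n) for r in range(nprocs))


if __name__ == "__main__":
    for nprocs in (32, 64, 128):
        print(nprocs, estimate_pi(nprocs))
```

Under these assumptions a rank with no intervals left simply contributes
zero, so the small problem size by itself would not be expected to make
-np 64 fail.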
Thanks in advance,
-bala-
More information about the mpich-discuss mailing list