[MPICH] mpirun timeout and killed by signal error for 64 processor option

Bala cppbala at yahoo.com
Sat Mar 17 08:47:08 CDT 2007


Hi All,

We have installed MPICH on a 16-node Intel x86_64 cluster of blade
servers, each with two dual-core CPUs.

When we run the cpi sample with mpirun and the -np 32 option, it runs
fine and prints its output, but after a while the message shown below
appears:

-----------------------------
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003906
Timeout in waiting for processes to exit, 2 left.
This may be due to a defective rsh program (Some versions of Kerberos
rsh have been observed to have this problem).
This is not a problem with P4 or MPICH but a problem with the operating
environment.  For many applications, this problem will only slow down
process termination.
-----------------------------------

But when we run with -np 64 or more, for example

$ mpirun -np 64 -machinefile machines ./cpi

the run fails with a "killed by signal 2" error. On our other cluster
we can run with the -np 64 option without problems.
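
For reference, here is a rough sketch of how a machinefile covering all
64 cores (16 nodes x 4 cores) could be generated. It assumes the ch_p4
device accepts "host:count" entries in the machinefile; the hostnames
node01..node16, the helper make_machinefile, and the output file name
machines.example are placeholders for illustration, not the real node
names or files of this cluster.

```shell
# Print one "host:slots" line per node.
# Assumption: hostnames follow the hypothetical pattern node01..nodeNN.
make_machinefile() {
  nodes=$1   # number of nodes in the cluster
  slots=$2   # processes to place on each node
  i=1
  while [ "$i" -le "$nodes" ]; do
    printf 'node%02d:%d\n' "$i" "$slots"
    i=$((i + 1))
  done
}

# 16 nodes, 4 cores each (2 CPUs x 2 cores) = 64 slots in total.
make_machinefile 16 4 > machines.example
```

The resulting file could then be passed to mpirun with -machinefile in
place of the existing machines file, so that -np 64 does not place more
processes on a node than it has cores.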

Please let us know how to avoid these errors.

Is the cpi job too small to run with the -np 64 option?

thanks in advance,
-bala-




 




More information about the mpich-discuss mailing list