[MPICH] mpirun timeout and killed by signal error for 64 processor option

Rajeev Thakur thakur at mcs.anl.gov
Sat Mar 17 10:34:16 CDT 2007


Can you try MPICH2 instead of MPICH-1? It is more robust. cpi should run
with any number of processes.

Rajeev 

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Bala
> Sent: Saturday, March 17, 2007 8:47 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] mpirun timeout and killed by signal error for 64 processor option
> 
> Hi all,
>
> We have installed MPICH on a 16-node Intel x86_64 cluster of blade
> servers, each node with two dual-core CPUs (64 cores in total).
> 
> When we run the cpi example with mpirun -np 32, it runs fine and
> prints its output, but after a while the following message appears:
> 
> -----------------------------
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.003906
> Timeout in waiting for processes to exit, 2 left.
> This may be due to a defective rsh program (Some versions of Kerberos
> rsh have been observed to have this problem).
> This is not a problem with P4 or MPICH but a problem with the operating
> environment.  For many applications, this problem will only slow down
> process termination.
> -----------------------------------
> 
> But when we run with -np 64 or more:
>
> $ mpirun -np 64 -machinefile machines ./cpi
>
> the job fails with a "killed by signal 2" (SIGINT) error. On our
> other cluster we can run with -np 64 without problems.
> 
> Please let us know how to avoid these errors.
>
> Is cpi too small a program to run with -np 64?
> 
> Thanks in advance,
> -bala-