[MPICH] mpirun timeout and killed by signal error for 64 processor option
Rajeev Thakur
thakur at mcs.anl.gov
Sat Mar 17 10:34:16 CDT 2007
Can you try MPICH2 instead of MPICH-1? It is more robust. cpi should run
with any number of processes.
Rajeev
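
On the question of whether cpi is "too small" for -np 64: below is a minimal serial sketch, not the actual cpi source and not MPI code, of the round-robin midpoint-rule split that a cpi-style program typically uses. The function names partial_sum and simulate_cpi are hypothetical, and n = 100 intervals is an assumption inferred from the reported error of about 8.3e-6. It only illustrates that the arithmetic stays valid when there are as many ranks as intervals, or more.

```python
# Serial simulation of a cpi-style decomposition (illustrative only).
# Assumption: pi is estimated as the midpoint-rule integral of 4/(1+x^2)
# on [0, 1], with intervals dealt out to ranks round-robin.

def partial_sum(rank, size, n):
    """Contribution of one simulated rank: intervals rank+1, rank+1+size, ..."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)            # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total

def simulate_cpi(size, n=100):
    """Stand-in for the final reduction: add up every rank's partial sum."""
    return sum(partial_sum(rank, size, n) for rank in range(size))

if __name__ == "__main__":
    for size in (32, 64, 200):
        print(size, simulate_cpi(size))
```

With 64 simulated ranks and 100 intervals each rank handles one or two intervals, and with 200 ranks half of them contribute zero, yet the total is unchanged. Under these assumptions the problem size does not explain the -np 64 failure.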
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Bala
> Sent: Saturday, March 17, 2007 8:47 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] mpirun timeout and killed by signal error
> for 64 processor option
>
> Hi All,
> We have installed MPICH on a 16-node Intel x86_64 cluster of
> dual-CPU, dual-core blade servers.
>
> When we run the cpi sample with mpirun and -np 32, it runs
> fine and prints its output, but after a while a message like
> the one shown below appears:
>
> -----------------------------
> pi is approximately 3.1416009869231249, Error is
> 0.0000083333333318
> wall clock time = 0.003906
> Timeout in waiting for processes to exit, 2 left.
> This may be due to a defective rsh program (Some
> versions of Kerberos rsh have been observed to have
> this problem).
> This is not a problem with P4 or MPICH but a problem
> with the operating
> environment. For many applications, this problem will
> only slow down process termination.
> -----------------------------------
>
> But when we try to run with -np 64 or more:
>
> $ mpirun -np 64 -machinefile machines ./cpi
>
> the run fails with a "killed by signal 2" error. On our
> other cluster we can run with -np 64.
>
> Please let us know how to avoid these errors.
>
> Is cpi too small to run with -np 64?
>
> thanks in advance,
> -bala-