[MPICH] mpirun timeout and killed by signal-2 error for 64 processor option

Rajeev Thakur thakur at mcs.anl.gov
Mon Mar 19 10:49:32 CDT 2007


This error message is from MPICH-1 (it mentions P4, which is MPICH-1's
ch_p4 device). Can you make sure that you are compiling the program with
the mpicc from MPICH2 and running it with the mpiexec from MPICH2? Give
the full paths to those scripts if necessary.
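
A minimal sketch of one way to check which mpicc and mpiexec the shell
actually resolves, assuming a POSIX shell. The helper names cmd_dir and
same_install are made up for illustration and are not part of MPICH; the
sketch only inspects PATH and does not call any MPI tool.

```shell
# Print the directory that a command resolves to on the current PATH.
# Returns 1 if the command is not found.
cmd_dir() {
    # command -v prints the full path of an executable found on PATH
    cmd_path=$(command -v "$1") || return 1
    dirname "$cmd_path"
}

# Succeed (status 0) only if both commands resolve to the same directory.
# Returns 2 if either command is missing, 1 if the directories differ.
same_install() {
    dir_a=$(cmd_dir "$1") || return 2
    dir_b=$(cmd_dir "$2") || return 2
    [ "$dir_a" = "$dir_b" ]
}

# Show where each tool would be picked up from.
for tool in mpicc mpiexec; do
    cmd_dir "$tool" || echo "$tool: not found on PATH"
done
```

If `same_install mpicc mpiexec` fails, or either directory is not the
MPICH2 install location, the two tools are probably coming from different
installations, and invoking them by full path avoids the mix-up.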

Rajeev


> -----Original Message-----
> From: Bala [mailto:cppbala at yahoo.com] 
> Sent: Monday, March 19, 2007 1:35 AM
> To: Rajeev Thakur; mpich-discuss at mcs.anl.gov
> Subject: RE: [MPICH] mpirun timeout and killed by signal-2 
> error for 64 processor option
> 
> Thanks for the reply, Rajeev. We are using Rocks cluster 4.2.1, which
> comes with MPICH2 by default.
> 
> But we are still getting this error. We are using HP blade servers
> (BL460C). Are there any known issues with blades?
> 
> thanks,
> -bala-
> 
> 
> --- Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> 
> > Can you try MPICH2 instead of MPICH-1? It is more robust. cpi should
> > run with any number of processes.
> > 
> > Rajeev 
> > 
> > > -----Original Message-----
> > > From: owner-mpich-discuss at mcs.anl.gov 
> > > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Bala
> > > Sent: Saturday, March 17, 2007 8:47 AM
> > > To: mpich-discuss at mcs.anl.gov
> > > Subject: [MPICH] mpirun timeout and killed by signal error 
> > > for 64 processor option
> > > 
> > > Hi All,
> > > 
> > > We have installed MPICH on a 16-node Intel x86_64 cluster of blade
> > > servers (dual CPU, dual core).
> > > 
> > > When we run the cpi sample with mpirun -np 32, it runs fine and
> > > prints its output, but after a while the message shown below
> > > appears:
> > > 
> > > -----------------------------
> > > pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> > > wall clock time = 0.003906
> > > Timeout in waiting for processes to exit, 2 left. 
> > > This may be due to a defective rsh program (Some versions of
> > > Kerberos rsh have been observed to have this problem).
> > > This is not a problem with P4 or MPICH but a problem with the
> > > operating environment.  For many applications, this problem will
> > > only slow down process termination.
> > > -----------------------------
> > > 
> > > But when we run with -np 64 or more:
> > > 
> > > $ mpirun -np 64 -machinefile machines ./cpi
> > > 
> > > the run fails with a "killed by signal 2" error. On our other
> > > cluster we can run with -np 64.
> > > 
> > > Please let us know how to avoid these errors.
> > > 
> > > Is cpi too small to run with -np 64?
> > > 
> > > thanks in advance,
> > > -bala-