[MPICH] unmanaged disconnection from mpd ring

jliang at arb.ca.gov jliang at arb.ca.gov
Wed May 16 10:50:23 CDT 2007


The problem was not reproducible, and the disconnected nodes varied.  My job ran fine last night without any problems. :)

Thank you for your help on the analysis, anyway.
With best regards,
Paul

----- Original Message -----
From: Matthew Chambers <matthew.chambers at vanderbilt.edu>
Date: Wednesday, May 16, 2007 8:15 am
Subject: RE: RE: [MPICH] unmanaged disconnection from mpd ring

> It won't help prevent the problem; it will determine whether it is
> reproducible, though.  If the problem isn't reproducible, I have no
> idea what happened, to be honest.
> 
> > -----Original Message-----
> > From: jliang at arb.ca.gov [jliang at arb.ca.gov]
> > Sent: Wednesday, May 16, 2007 10:09 AM
> > To: Matthew Chambers
> > Cc: mpich-discuss at mcs.anl.gov
> > Subject: Re: RE: [MPICH] unmanaged disconnection from mpd ring
> > 
> > So you believe the problem was caused by the disconnected nodes and/or
> > busy network. Could you explain how the burn-in process will help
> > prevent the problem from happening in the future?  Thanks.
> > 
> > With best regards,
> > Paul
> > 
> > ----- Original Message -----
> > From: Matthew Chambers <matthew.chambers at vanderbilt.edu>
> > Date: Tuesday, May 15, 2007 1:07 pm
> > Subject: RE: [MPICH] unmanaged disconnection from mpd ring
> > 
> > > I suggest doing an mpdringtest for an absurdly large number of
> > > loops as a kind of burn-in for your nodes and/or network.
> 
> 
> 
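
The burn-in suggested earlier in the thread could be scripted roughly as below. This is only a sketch under stated assumptions: that `mpdringtest` is on the PATH of a host already in the mpd ring, that it accepts the loop count as its single argument, and that it exits non-zero when the ring is broken. The function names and the `command` parameter are hypothetical, introduced only so the runner can be pointed at a stand-in command.

```python
import subprocess
import time


def run_ring_test(loops, command=("mpdringtest",)):
    """Run one ring test with the given loop count.

    Returns (ok, seconds, output). `command` defaults to mpdringtest and is
    assumed to take the loop count as its only argument.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            [*command, str(loops)],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The command is not installed or not on PATH.
        return False, 0.0, "command not found"
    elapsed = time.monotonic() - start
    # Assumption: a broken ring shows up as a non-zero exit status.
    return result.returncode == 0, elapsed, result.stdout.strip()


def burn_in(rounds, loops, command=("mpdringtest",)):
    """Repeat the ring test; return the 1-based indices of failed rounds."""
    failed = []
    for i in range(1, rounds + 1):
        ok, elapsed, output = run_ring_test(loops, command)
        print(f"round {i}: ok={ok} elapsed={elapsed:.2f}s {output}")
        if not ok:
            failed.append(i)
    return failed
```

Calling something like `burn_in(rounds=10, loops=100000)` (illustrative values) would report which rounds failed. As noted in the thread, this only shows whether the disconnection is reproducible; it does not prevent it.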
