[MPICH] unmanaged disconnection from mpd ring

Matthew Chambers matthew.chambers at vanderbilt.edu
Wed May 16 10:15:09 CDT 2007


It won't help prevent the problem, but it will determine whether the problem
is reproducible.  If it isn't reproducible, then to be honest I have no idea
what happened.

> -----Original Message-----
> From: jliang at arb.ca.gov [mailto:jliang at arb.ca.gov]
> Sent: Wednesday, May 16, 2007 10:09 AM
> To: Matthew Chambers
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: RE: [MPICH] unmanaged disconnection from mpd ring
> 
> So you believe the problem was caused by the disconnected nodes and/or
> busy network. Could you explain how the burn-in process will help prevent
> the problem from happening in the future?  Thanks.
> 
> With best regards,
> Paul
> 
> ----- Original Message -----
> From: Matthew Chambers <matthew.chambers at vanderbilt.edu>
> Date: Tuesday, May 15, 2007 1:07 pm
> Subject: RE: [MPICH] unmanaged disconnection from mpd ring
> 
> > I suggest doing an mpdringtest for an absurdly large number of loops as a
> > kind of burn-in for your nodes and/or network.
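
A rough sketch of what such a burn-in might look like, assuming mpdringtest
is on the PATH, that the mpd ring is already up, and that it reports a line
of the form "time for N loops = T seconds" (the output format is an
assumption here, so adjust the pattern to what your install prints).  The
helper names, loop count, and round count are made up for illustration.  The
script repeats the ring test and stops at the first failure, which is the
point at which the disconnection would have been reproduced.

```python
import re
import subprocess


def parse_ring_time(output):
    """Return the reported seconds as a float, or None if no match.

    Assumes a line like 'time for 1000 loops = 0.52 seconds'.
    """
    match = re.search(r"=\s*([0-9.]+(?:[eE][-+]?\d+)?)\s*seconds", output)
    return float(match.group(1)) if match else None


def burn_in(loops=100000, rounds=10, run=subprocess.run):
    """Run mpdringtest `rounds` times with `loops` loops each.

    Returns a list of (round_index, seconds) tuples. A round that exits
    with a nonzero status is recorded as (round_index, None) and ends
    the burn-in early.
    """
    results = []
    for i in range(rounds):
        proc = run(["mpdringtest", str(loops)], capture_output=True, text=True)
        if proc.returncode != 0:
            results.append((i, None))  # ring test failed in this round
            break
        results.append((i, parse_ring_time(proc.stdout)))
    return results


if __name__ == "__main__":
    for index, seconds in burn_in():
        status = "FAILED" if seconds is None else f"{seconds:.3f} s"
        print(f"round {index}: {status}")
```

If every round completes, the original failure was probably not caused by
something the ring test exercises.  A failed round, or a round whose time is
far out of line with the others, would point at the nodes or the network.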




