[MPICH] unmanaged disconnection from mpd ring
jliang at arb.ca.gov
Wed May 16 10:50:23 CDT 2007
The problem was not reproducible, and the disconnected nodes varied. My job went on okay last night without any problem. :)
Thank you for your help on the analysis, anyway.
With best regards,
Paul
----- Original Message -----
From: Matthew Chambers <matthew.chambers at vanderbilt.edu>
Date: Wednesday, May 16, 2007 8:15 am
Subject: RE: RE: [MPICH] unmanaged disconnection from mpd ring
> It won't help prevent the problem; it will determine whether it is
> reproducible, though. If the problem isn't reproducible, I have no
> idea what happened, to be honest.
>
> > -----Original Message-----
> > From: jliang at arb.ca.gov [jliang at arb.ca.gov]
> > Sent: Wednesday, May 16, 2007 10:09 AM
> > To: Matthew Chambers
> > Cc: mpich-discuss at mcs.anl.gov
> > Subject: Re: RE: [MPICH] unmanaged disconnection from mpd ring
> >
> > So you believe the problem was caused by the disconnected nodes
> > and/or busy network. Could you explain how the burn-in process
> > will help prevent the problem from happening in the future? Thanks.
> >
> > With best regards,
> > Paul
> >
> > ----- Original Message -----
> > From: Matthew Chambers <matthew.chambers at vanderbilt.edu>
> > Date: Tuesday, May 15, 2007 1:07 pm
> > Subject: RE: [MPICH] unmanaged disconnection from mpd ring
> >
> > > I suggest doing an mpdringtest for an absurdly large number of
> > > loops as a kind of burn-in for your nodes and/or network.