[mpich-discuss] Hydra issues
Rajeev Thakur
thakur at mcs.anl.gov
Wed Aug 26 15:29:01 CDT 2009
Which version of MPICH2 are you using? There was a scalability bug in
MPI_Init with Nemesis and MPD in 1.1 that has been fixed in 1.1.1.
Rajeev
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Scott Atchley
> Sent: Wednesday, August 26, 2009 3:24 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Hydra issues
>
> Rusty,
>
> 1024 (128 nodes, 8 ppn).
>
> Scott
>
> On Aug 26, 2009, at 4:17 PM, Rusty Lusk wrote:
>
> > Ah. I assume that we still use a lazy connect, so MPI_Init should
> > be fast, but then allgatherv would indeed cause a lot of
> > connections, depending on choice of allgatherv implementation (how
> > many?)
> >
> > On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:
> >
> >> No, the ring starts fast enough. It is connecting 1024 processes
> >> that is slow (allgatherv).
> >>
> >> By contrast, Intel MPI launched in < 10 seconds.
> >>
> >> Scott
> >>
> >> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
> >>
> >>> I assume that you mean launching the MPD ring is slow. Once the
> >>> MPD ring is up, launching should be quick. The original idea was
> >>> that the MPD ring would be persistent across jobs, even from
> >>> different people, as long as the jobs used the same nodes.
> >>>
> >>> Rusty
> >>>
> >>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
> >>>
> >>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
> >>>>
> >>>>>>> However you could use one of the various workarounds for this,
> >>>>>>> such as an LD_PRELOADed setvbuf call:
> >>>>>>> http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
> >>>>>> This does not change the behavior.
> >>>>>> I am still stumped as to why there is no delay when using
> >>>>>> persistent (launch-mode=2) versus a delay with no proxies
> >>>>>> (launch-mode=1).
> >>>>>
> >>>>> This works for me. We need to figure out how to make this
> >>>>> portable now.
> >>>>>
> >>>>> -- Pavan
> >>>>
> >>>> Thanks for your persistence (no pun intended).
> >>>>
> >>>> When running with 1,024 ranks, launching via MPD can take several
> >>>> minutes. I am assuming that hydra will launch in seconds.
> >>>>
> >>>> Scott
> >>>
> >>
> >
>
>
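
As a rough illustration of Rusty's "(how many?)" question: with lazy connect, the number of connections opened by the first allgatherv depends on the communication pattern of the algorithm. The thread does not say which algorithm MPICH2 selected here, so the sketch below only compares three textbook patterns (ring, recursive doubling, and direct all-to-all) for the 1024-process job described above. The function names are hypothetical and are not part of MPICH2.

```python
# Sketch only: counts the peers each rank would have to connect to under
# three textbook allgather patterns. Which pattern MPICH2 actually used in
# this thread is NOT stated; these are assumptions for illustration.

def peers_ring(p):
    """Ring: each rank talks only to its left and right neighbours."""
    return min(2, p - 1)

def peers_recursive_doubling(p):
    """Recursive doubling: one partner per round, log2(p) rounds.
    Assumes p is a power of two."""
    return (p - 1).bit_length()

def peers_direct(p):
    """Direct exchange: every rank connects to every other rank."""
    return p - 1

def total_connections(p, peers_per_rank):
    """Each connection is shared by two ranks, so halve the per-rank sum."""
    return p * peers_per_rank // 2

if __name__ == "__main__":
    p = 1024  # 128 nodes, 8 processes per node, as reported in the thread
    for name, fn in [("ring", peers_ring),
                     ("recursive doubling", peers_recursive_doubling),
                     ("direct", peers_direct)]:
        per_rank = fn(p)
        print(f"{name}: {per_rank} peers per rank, "
              f"{total_connections(p, per_rank)} connections in total")
```

For 1024 processes this gives 2, 10, and 1023 peers per rank respectively, which is why the choice of allgatherv implementation matters so much for connection setup time.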