[mpich-discuss] Hydra issues

Rajeev Thakur thakur at mcs.anl.gov
Wed Aug 26 15:29:01 CDT 2009


Which version of MPICH2 are you using? There was a scalability bug in
MPI_Init with Nemesis and MPD in 1.1 that has been fixed in 1.1.1. 

Rajeev


> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Scott Atchley
> Sent: Wednesday, August 26, 2009 3:24 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Hydra issues
> 
> Rusty,
> 
> 1024 (128 nodes, 8 ppn).
> 
> Scott
> 
> On Aug 26, 2009, at 4:17 PM, Rusty Lusk wrote:
> 
> > Ah.   I assume that we still use a lazy connect, so MPI_Init should
> > be fast, but then allgatherv would indeed cause a lot of
> > connections, depending on choice of allgatherv implementation (how
> > many?)
> >
> > On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:
> >
> >> No, the ring starts fast enough. It is connecting 1024 processes  
> >> that is slow (allgatherv).
> >>
> >> By contrast, Intel MPI launched in < 10 seconds.
> >>
> >> Scott
> >>
> >> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
> >>
> >>> I assume that you mean launching the MPD ring is slow.  Once the
> >>> MPD ring is up, launching should be quick.   The original idea was
> >>> that the MPD ring would be persistent across jobs, even from
> >>> different people, as long as the jobs used the same nodes.
> >>>
> >>> Rusty
> >>>
> >>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
> >>>
> >>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
> >>>>
> >>>>>>> However you could use one of the various workarounds for this,
> >>>>>>> such as an LD_PRELOADed setvbuf call:
> >>>>>>> http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
> >>>>>> This does not change the behavior.
> >>>>>> I am still stumped as to why there is no delay when using  
> >>>>>> persistent (launch-mode=2) versus a delay with no proxies  
> >>>>>> (launch-mode=1).
> >>>>>
> >>>>> This works for me. We need to figure out how to make this  
> >>>>> portable now.
> >>>>>
> >>>>> -- Pavan
> >>>>
> >>>> Thanks for your persistence (no pun intended).
> >>>>
> >>>> When running with 1,024 ranks, launching via MPD can take several
> >>>> minutes. I am assuming that hydra will launch in seconds.
> >>>>
> >>>> Scott
> >>>
> >>
> >
> 
> 
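
For reference on the "how many?" question in the quoted thread, here is a
rough sketch (plain Python, not MPICH2 code) that counts the distinct peers
each rank would have to connect to under three textbook allgather
communication patterns: a ring, recursive doubling, and a naive full
exchange. The function names are made up for illustration, and the thread
does not say which pattern MPICH2's allgatherv actually uses, so the numbers
are only an indication of how much work a lazy connect might have to do.

```python
def ring_peers(rank, nprocs):
    """Peers of `rank` in a ring pattern: its left and right neighbours."""
    if nprocs < 2:
        return set()
    return {(rank - 1) % nprocs, (rank + 1) % nprocs}


def recursive_doubling_peers(rank, nprocs):
    """Peers of `rank` in recursive doubling (assumes a power-of-two nprocs)."""
    if nprocs & (nprocs - 1):
        raise ValueError("this sketch only handles power-of-two process counts")
    peers = set()
    dist = 1
    while dist < nprocs:
        peers.add(rank ^ dist)  # partner at distance 1, 2, 4, ...
        dist <<= 1
    return peers


def full_exchange_peers(rank, nprocs):
    """Peers of `rank` when every rank talks directly to every other rank."""
    return {r for r in range(nprocs) if r != rank}


def total_connections(peer_fn, nprocs):
    """Count distinct undirected rank pairs that need a connection."""
    pairs = set()
    for rank in range(nprocs):
        for peer in peer_fn(rank, nprocs):
            pairs.add((min(rank, peer), max(rank, peer)))
    return len(pairs)


if __name__ == "__main__":
    nprocs = 1024  # 128 nodes, 8 ppn, as in the thread
    patterns = [
        ("ring", ring_peers),
        ("recursive doubling", recursive_doubling_peers),
        ("full exchange", full_exchange_peers),
    ]
    for name, fn in patterns:
        per_rank = len(fn(0, nprocs))
        print(name, per_rank, total_connections(fn, nprocs))
```

For 1024 ranks this gives 2, 10, and 1023 peers per rank, or 1024, 5120,
and 523776 connections in total, so the choice of pattern changes the
connection count by more than two orders of magnitude.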


