[mpich-discuss] Hydra issues

Wed Aug 26 15:17:08 CDT 2009

Ah. I assume that we still use a lazy connect, so MPI_Init should be
fast, but then allgatherv would indeed cause a lot of connections,
depending on the choice of allgatherv implementation (how many?).
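
As a rough way to put numbers on the "how many?" question, here is a minimal sketch that counts the distinct peers each rank would have to connect to under two textbook allgather patterns, a ring and recursive doubling. It is a back-of-the-envelope model only: the thread does not say which algorithm is actually selected for this allgatherv, the function names are made up for illustration, and it assumes one lazily established connection per distinct pair of ranks.

```python
def ring_partners(rank, n):
    """Peers a rank exchanges data with in a ring allgather (left and right neighbours)."""
    return {(rank - 1) % n, (rank + 1) % n} - {rank}


def recursive_doubling_partners(rank, n):
    """Peers a rank exchanges data with in recursive doubling.

    Assumption: n is a power of two, so the partner at each step is rank XOR distance.
    """
    assert n > 0 and n & (n - 1) == 0, "sketch only handles power-of-two sizes"
    partners = set()
    distance = 1
    while distance < n:
        partners.add(rank ^ distance)
        distance *= 2
    return partners


def total_connections(n, partners):
    """Count distinct rank pairs, assuming one lazy connection per pair."""
    pairs = set()
    for rank in range(n):
        for peer in partners(rank, n):
            pairs.add((min(rank, peer), max(rank, peer)))
    return len(pairs)


if __name__ == "__main__":
    n = 1024  # job size mentioned in this thread
    for name, fn in (("ring", ring_partners),
                     ("recursive doubling", recursive_doubling_partners)):
        print(name, "peers per rank:", len(fn(0, n)),
              "total connections:", total_connections(n, fn))
```

Under these assumptions, 1,024 ranks give 2 peers per rank and 1,024 connections in total for the ring, versus 10 peers per rank and 5,120 connections for recursive doubling. Either way the per-rank count is small, so if connection setup dominates, the cost per connection would matter more than the count.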

On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:

> No, the ring starts fast enough. It is connecting 1024 processes  
> that is slow (allgatherv).
>
> By contrast, Intel MPI launched in < 10 seconds.
>
> Scott
>
> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
>
>> I assume that you mean launching the MPD ring is slow.  Once the  
>> MPD ring is up, launching should be quick.   The original idea was  
>> that the MPD ring would be persistent across jobs, even from  
>> different people, as long as the jobs used the same nodes.
>>
>> Rusty
>>
>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
>>
>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
>>>
>>>>>> However you could use one of the various workarounds for this,  
>>>>>> such as an LD_PRELOADed setvbuf call: http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
>>>>> This does not change the behavior.
>>>>> I am still stumped as to why there is no delay when using  
>>>>> persistent (launch-mode=2) versus a delay with no proxies  
>>>>> (launch-mode=1).
>>>>
>>>> This works for me. We need to figure out how to make this  
>>>> portable now.
>>>>
>>>> -- Pavan
>>>
>>> Thanks for your persistence (no pun intended).
>>>
>>> When running with 1,024 ranks, launching via MPD can take several  
>>> minutes. I am assuming that hydra will launch in seconds.
>>>
>>> Scott
>>
>