[mpich-discuss] Hydra issues

Scott Atchley atchley at myri.com
Wed Aug 26 15:23:47 CDT 2009


Rusty,

1024 (128 nodes, 8 ppn).

Scott

On Aug 26, 2009, at 4:17 PM, Rusty Lusk wrote:

> Ah. I assume that we still use a lazy connect, so MPI_Init should
> be fast, but then allgatherv would indeed cause a lot of  
> connections, depending on choice of allgatherv implementation (how  
> many?)
>
> On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:
>
>> No, the ring starts fast enough. It is connecting 1024 processes  
>> that is slow (allgatherv).
>>
>> By contrast, Intel MPI launched in < 10 seconds.
>>
>> Scott
>>
>> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
>>
>>> I assume that you mean launching the MPD ring is slow.  Once the  
>>> MPD ring is up, launching should be quick. The original idea was
>>> that the MPD ring would be persistent across jobs, even from  
>>> different people, as long as the jobs used the same nodes.
>>>
>>> Rusty
>>>
>>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
>>>
>>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
>>>>
>>>>>>> However you could use one of the various workarounds for this,  
>>>>>>> such as an LD_PRELOADed setvbuf call: http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
>>>>>> This does not change the behavior.
>>>>>> I am still stumped as to why there is no delay when using  
>>>>>> persistent (launch-mode=2) versus a delay with no proxies  
>>>>>> (launch-mode=1).
>>>>>
>>>>> This works for me. We need to figure out how to make this  
>>>>> portable now.
>>>>>
>>>>> -- Pavan
>>>>
>>>> Thanks for your persistence (no pun intended).
>>>>
>>>> When running with 1,024 ranks, launching via MPD can take several  
>>>> minutes. I am assuming that hydra will launch in seconds.
>>>>
>>>> Scott
>>>
>>
>
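
For reference, the number of connections that a lazy connect opens during an allgatherv depends on which peers each rank talks to. The sketch below counts distinct rank pairs for two textbook allgather schedules (ring and recursive doubling) at 1024 ranks. The function names are made up for illustration, and both schedules are assumptions: nothing in this thread says which algorithm MPICH chose for this run.

```python
# Rough estimate of how many point-to-point connections a lazily
# connecting MPI would open during one allgather, for two textbook
# schedules. Illustrative only; not taken from the MPICH source.

def ring_partners(rank, size):
    """Peers in a ring allgather: the left and right neighbours."""
    return {(rank - 1) % size, (rank + 1) % size} - {rank}

def recursive_doubling_partners(rank, size):
    """Peers in recursive doubling: rank XOR 2^k for each step k.

    Assumes size is a power of two (true for 1024).
    """
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    partners = set()
    mask = 1
    while mask < size:
        partners.add(rank ^ mask)
        mask <<= 1
    return partners

def total_connections(partners_fn, size):
    """Count distinct unordered rank pairs that exchange data."""
    pairs = set()
    for rank in range(size):
        for peer in partners_fn(rank, size):
            pairs.add((min(rank, peer), max(rank, peer)))
    return len(pairs)

size = 1024  # 128 nodes x 8 processes per node, as reported above
ring_total = total_connections(ring_partners, size)
rd_total = total_connections(recursive_doubling_partners, size)
full_mesh = size * (size - 1) // 2

print("ring:", ring_total)
print("recursive doubling:", rd_total)
print("full mesh (upper bound):", full_mesh)
```

Under these assumptions the ring opens 1024 connections in total (2 per rank), recursive doubling opens 5120 (10 per rank), and a fully connected job would need 523776. Connections between ranks on the same node would typically not go over the network, so the network-level counts would be lower.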