[mpich-discuss] Hydra issues
Scott Atchley
atchley at myri.com
Wed Aug 26 15:23:47 CDT 2009
Rusty,
1024 (128 nodes, 8 ppn).
Scott
On Aug 26, 2009, at 4:17 PM, Rusty Lusk wrote:
> Ah. I assume that we still use a lazy connect, so MPI_Init should
> be fast, but then allgatherv would indeed cause a lot of
> connections, depending on choice of allgatherv implementation (how
> many?)
>
> On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:
>
>> No, the ring starts fast enough. It is connecting 1024 processes
>> that is slow (allgatherv).
>>
>> By contrast, Intel MPI launched in < 10 seconds.
>>
>> Scott
>>
>> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
>>
>>> I assume that you mean launching the MPD ring is slow. Once the
>>> MPD ring is up, launching should be quick. The original idea was
>>> that the MPD ring would be persistent across jobs, even from
>>> different people, as long as the jobs used the same nodes.
>>>
>>> Rusty
>>>
>>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
>>>
>>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
>>>>
>>>>>>> However you could use one of the various workarounds for this,
>>>>>>> such as an LD_PRELOADed setvbuf call: http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
>>>>>> This does not change the behavior.
>>>>>> I am still stumped as to why there is no delay when using
>>>>>> persistent (launch-mode=2) versus a delay with no proxies
>>>>>> (launch-mode=1).
>>>>>
>>>>> This works for me. We need to figure out how to make this
>>>>> portable now.
>>>>>
>>>>> -- Pavan
>>>>
>>>> Thanks for your persistence (no pun intended).
>>>>
>>>> When running with 1,024 ranks, launching via MPD can take several
>>>> minutes. I am assuming that hydra will launch in seconds.
>>>>
>>>> Scott
>>>
>>
>
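
A rough way to put numbers on the "how many?" question in the thread, assuming a lazy-connect model in which a connection is opened the first time two ranks exchange data. The thread does not say which allgatherv algorithm is in use, so the sketch below compares two textbook schedules (ring and recursive doubling) against a full mesh. The function names are hypothetical and only for illustration, and the counts make no distinction between intra-node and inter-node pairs.

```python
# Sketch only: counts distinct communicating pairs under two textbook
# allgather schedules. Not taken from any MPI implementation's source.

def ring_peers(rank, n):
    """Peers of `rank` in a ring schedule: left and right neighbours."""
    if n < 2:
        return set()
    return {(rank - 1) % n, (rank + 1) % n} - {rank}


def recursive_doubling_peers(rank, n):
    """Peers of `rank` in recursive doubling; n must be a power of two."""
    if n & (n - 1):
        raise ValueError("recursive doubling sketch assumes a power-of-two n")
    peers = set()
    mask = 1
    while mask < n:
        peers.add(rank ^ mask)  # partner at distance `mask` in this round
        mask <<= 1
    return peers


def total_connections(peers_fn, n):
    """Number of distinct unordered rank pairs that ever communicate."""
    pairs = set()
    for rank in range(n):
        for peer in peers_fn(rank, n):
            pairs.add((min(rank, peer), max(rank, peer)))
    return len(pairs)


nodes, ppn = 128, 8          # figures from the thread
n = nodes * ppn              # 1024 processes

ring_total = total_connections(ring_peers, n)
rd_total = total_connections(recursive_doubling_peers, n)
full_mesh = n * (n - 1) // 2  # every pair connected up front

print("processes:          ", n)
print("ring:               ", ring_total)
print("recursive doubling: ", rd_total)
print("full mesh:          ", full_mesh)
```

Under these assumptions each rank opens 2 connections in the ring case and log2(1024) = 10 in the recursive doubling case, so either schedule needs far fewer connections than a full mesh. The real count depends on which algorithm the library selects and on how it handles ranks on the same node.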