[mpich-discuss] Hydra issues
Rusty Lusk
lusk at mcs.anl.gov
Wed Aug 26 15:17:08 CDT 2009
Ah. I assume that we still use lazy connect, so MPI_Init should be
fast, but then allgatherv would indeed cause a lot of connections,
depending on the choice of allgatherv implementation. (How many?)
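
As a rough way to put numbers on "how many", here is a minimal sketch
that counts the distinct point-to-point links each textbook allgather
pattern would open under lazy connect, assuming a link is created the
first time two ranks talk to each other. The function names are
hypothetical, and the three patterns (ring, recursive doubling, every
rank to every rank) are generic illustrations; which one MPICH actually
selects for a given message size is not established in this thread.

```python
# Sketch only: counts distinct rank pairs that communicate, assuming
# one lazily opened connection per pair. Not MPICH's actual code.

def ring_peers(rank, nprocs):
    """Ring pattern: each rank talks only to its two neighbors."""
    return {(rank - 1) % nprocs, (rank + 1) % nprocs} - {rank}

def recursive_doubling_peers(rank, nprocs):
    """Recursive doubling: partner is rank XOR 2^k at step k.
    Assumes nprocs is a power of two."""
    peers = set()
    mask = 1
    while mask < nprocs:
        peers.add(rank ^ mask)
        mask <<= 1
    return peers

def all_pairs_peers(rank, nprocs):
    """Naive pattern: every rank talks to every other rank."""
    return set(range(nprocs)) - {rank}

def count_links(peers_fn, nprocs):
    """Number of distinct unordered rank pairs that would connect."""
    links = set()
    for rank in range(nprocs):
        for peer in peers_fn(rank, nprocs):
            links.add((min(rank, peer), max(rank, peer)))
    return len(links)

if __name__ == "__main__":
    n = 1024
    for name, fn in [("ring", ring_peers),
                     ("recursive doubling", recursive_doubling_peers),
                     ("all pairs", all_pairs_peers)]:
        print(name, count_links(fn, n), "links,",
              len(fn(0, n)), "per rank")
```

Under these assumptions, 1024 ranks give 2 peers per rank for the ring,
10 for recursive doubling, and 1023 for the all-pairs pattern, so the
connection cost varies by orders of magnitude with the algorithm.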
On Wednesday, Aug 26, 2009, at 3:09 PM, Scott Atchley wrote:
> No, the ring starts fast enough. It is connecting 1024 processes
> that is slow (allgatherv).
>
> By contrast, Intel MPI launched in < 10 seconds.
>
> Scott
>
> On Aug 26, 2009, at 4:07 PM, Rusty Lusk wrote:
>
>> I assume that you mean launching the MPD ring is slow. Once the
>> MPD ring is up, launching should be quick. The original idea was
>> that the MPD ring would be persistent across jobs, even from
>> different people, as long as the jobs used the same nodes.
>>
>> Rusty
>>
>> On Wednesday, Aug 26, 2009, at 2:45 PM, Scott Atchley wrote:
>>
>>> On Aug 26, 2009, at 3:39 PM, Pavan Balaji wrote:
>>>
>>>>>> However you could use one of the various workarounds for this,
>>>>>> such as an LD_PRELOADed setvbuf call: http://lists.gnu.org/archive/html/bug-coreutils/2008-11/msg00164.html
>>>>> This does not change the behavior.
>>>>> I am still stumped as to why there is no delay when using
>>>>> persistent (launch-mode=2) versus a delay with no proxies
>>>>> (launch-mode=1).
>>>>
>>>> This works for me. We need to figure out how to make this
>>>> portable now.
>>>>
>>>> -- Pavan
>>>
>>> Thanks for your persistence (no pun intended).
>>>
>>> When running with 1,024 ranks, launching via MPD can take several
>>> minutes. I am assuming that Hydra will launch in seconds.
>>>
>>> Scott
>>
>