[mpich-discuss] Problem with spawning child with same executable name

Pavan Balaji balaji at mcs.anl.gov
Sat Feb 12 10:05:01 CST 2011


Hi,

I dug into the code and looked into this. I agree this is a problem and 
needs to be fixed, but fixing it requires ripping out a lot of the code 
and significantly modifying the internal algorithm.

This is not a quick fix, so I created a ticket for it: 
https://trac.mcs.anl.gov/projects/mpich2/ticket/1434

Please add yourself to the CC list if you are interested.

Thanks for reporting the issue.

  -- Pavan

On 02/10/2011 08:41 PM, Yauheni Zelenko wrote:
> Hi, Pavan!
>
> I added some debugging output with timestamps.
>
> The new set of children was spawned after the previous set of children called MPI_Finalize. However, all processes exited only after the master terminated.
>
> This could definitely lead to higher resource usage in the intended use of the program, since the children still hold some amount of system resources.
>
> Also, I'm not sure that at this stage Hydra will have enough information to launch new child processes on the freed hosts.
>
> Eugene.
> ________________________________________
> From: Pavan Balaji [balaji at mcs.anl.gov]
> Sent: Wednesday, February 09, 2011 2:31 PM
> To: Yauheni Zelenko
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problem with spawning child with same executable name
>
> So the first set of spawned processes has terminated before the next set is started, is that right?
>
> Pavan Balaji @ iPhone
> (Big fingers. Small email.)
>
> On Feb 9, 2011, at 2:20 PM, Yauheni Zelenko<zelenko at cadence.com>  wrote:
>
>> But there is still a living process on host1. I think taking this fact into account would be more correct Hydra behaviour.
>>
>> Eugene.
>> ________________________________________
>> From: Pavan Balaji [balaji at mcs.anl.gov]
>> Sent: Wednesday, February 09, 2011 2:11 PM
>> To: mpich-discuss at mcs.anl.gov
>> Cc: Yauheni Zelenko
>> Subject: Re: [mpich-discuss] Problem with spawning child with same executable name
>>
>> On 02/09/2011 04:06 PM, Yauheni Zelenko wrote:
>>> Then I run the program with Hydra: mpiexec -host "host1:2,host2:2"
>>>
>>> The master process runs on host1. On the first spawn, 1 child ran on
>>> host1 and 2 on host2, but on subsequent spawns, 2 children were on
>>> host1 and 1 on host2.
>>>
>>> I think such resource allocation may create load-balancing problems,
>>> and Hydra should not spawn child processes on hosts that are still in use.
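A minimal sketch of the placement being requested here, assuming the launcher knows how many processes are still alive on each host. This is not Hydra's algorithm: the function `place_least_loaded` and its `capacity`/`live` inputs are hypothetical. Each new child goes to the host with the most free slots, with ties broken by host-list order.

```python
def place_least_loaded(capacity: dict[str, int], live: dict[str, int], n: int) -> list[str]:
    """Hypothetical load-aware placement: pick the host with the most free slots.

    capacity: slots per host, in host-list order (e.g. from "host1:2,host2:2").
    live:     processes currently alive on each host (assumed known to the launcher).
    """
    load = dict(live)
    placed = []
    for _ in range(n):
        # max() returns the first maximal key, so ties go to the earlier host in the list.
        host = max(capacity, key=lambda h: capacity[h] - load.get(h, 0))
        load[host] = load.get(host, 0) + 1
        placed.append(host)
    return placed


# Only the master is alive on host1; spawn three children.
children = place_least_loaded({"host1": 2, "host2": 2}, {"host1": 1}, 3)
print(children)  # ['host2', 'host1', 'host2']
```

As long as the previous children have really exited, every spawn in this model sees the same load and produces the same split (1 child on host1, 2 on host2).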
>>
>> That sounds correct to me. Hydra looks at the host list as:
>>
>> host1, host1, host2, host2, ..., [wrap around].
>>
>> The master process is launched on the first "host1". When you spawn
>> three processes the first time, it launches them on "host1", "host2",
>> and "host2". When you spawn three processes the second time, it launches
>> them on "host1", "host1", "host2". The next spawn of three processes
>> will be "host2", "host1", "host1", etc.
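To make the wrap-around concrete, this sketch replays the allocation described above for `mpiexec -host "host1:2,host2:2"`. It is a simplified model, not Hydra's actual code: `expand_hosts` and `RoundRobinPlacer` are hypothetical names, and the model only advances a cursor through the expanded slot list, ignoring whether earlier processes are still running.

```python
def expand_hosts(spec: str) -> list[str]:
    """Expand a spec like "host1:2,host2:2" into one entry per slot."""
    slots = []
    for item in spec.split(","):
        name, _, count = item.partition(":")
        slots.extend([name] * int(count or 1))
    return slots


class RoundRobinPlacer:
    """Hands out slots from the host list in order, wrapping around at the end."""

    def __init__(self, spec: str):
        self.slots = expand_hosts(spec)
        self.cursor = 0

    def place(self, n: int) -> list[str]:
        hosts = []
        for _ in range(n):
            hosts.append(self.slots[self.cursor])
            self.cursor = (self.cursor + 1) % len(self.slots)
        return hosts


placer = RoundRobinPlacer("host1:2,host2:2")
master = placer.place(1)                        # ['host1']
spawns = [placer.place(3) for _ in range(3)]
for s in spawns:
    print(s)
# ['host1', 'host2', 'host2']
# ['host1', 'host1', 'host2']
# ['host2', 'host1', 'host1']
```

Because the cursor keeps advancing regardless of which processes are still alive, successive spawns are distributed differently across the hosts, which is the imbalance reported earlier in the thread.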
>>
>>   -- Pavan
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

