[mpich-discuss] mpdboot behaviour

Reuti reuti at Staff.Uni-Marburg.DE
Fri Jan 15 11:44:09 CST 2010


On 15.01.2010 at 18:31, Dave Goodell wrote:

> On Jan 15, 2010, at 10:57 AM, Cezary Śliwa wrote:
>
>> Regarding mpdboot in mpich2-1.2.1. The default is to use --ncpus=1  
>> rather than the value from  mpd.hosts. Does it make sense? Why not  
>> use the value from mpd.hosts as the default?
>
> This is a longstanding known user interface problem.  It trips  
> people up all the time, including me on occasion.  Unfortunately,  
> we are unlikely to change the behavior for two reasons: (1) it will  
> break compatibility with the thousands of scripts out there that  
> invoke mpdboot assuming the old

A new option to mpdboot could be introduced to select "version 2" of  
mpdboot with its new behavior.


> behavior and (2) mpd is receiving a bare minimum of maintenance and  
> development at this point because it is being replaced by hydra.
>
>> In case of running a job under PBS or SGE, the correct number is  
>> in the file. At present, one has to extract this information from  
>> the file and put it on the mpdboot command line. This is cumbersome.
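
The extraction step described above could be scripted roughly as follows. This is only a sketch: it assumes mpd.hosts holds one "host:ncpus" entry per line (a missing count is taken as 1), that the local host's name appears in the file exactly as passed in, and that --totalnum should count the local mpd as well. The helper names parse_mpd_hosts and mpdboot_command are made up for illustration.

```python
def parse_mpd_hosts(text):
    """Map each host name to its CPU count; a missing count defaults to 1."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        entry = line.split()[0]               # ignore extra fields such as ifhn=...
        name, _, count = entry.partition(":")
        hosts[name] = int(count) if count else 1
    return hosts


def mpdboot_command(hosts, local_host, hosts_file="mpd.hosts"):
    """Build an mpdboot argument list with --ncpus taken from the hosts file."""
    # Assumption: a local host missing from the file still runs one mpd
    # and is treated as having a single CPU.
    ncpus = hosts.get(local_host, 1)
    total = len(hosts) + (0 if local_host in hosts else 1)
    return [
        "mpdboot",
        "--totalnum=%d" % total,
        "--file=%s" % hosts_file,
        "--ncpus=%d" % ncpus,
    ]


if __name__ == "__main__":
    import socket

    with open("mpd.hosts") as fh:
        parsed = parse_mpd_hosts(fh.read())
    # gethostname() may return a short or fully qualified name; it has to
    # match the spelling used in mpd.hosts for the lookup to succeed.
    print(" ".join(mpdboot_command(parsed, socket.gethostname())))
```

The printed line can then be run by hand or from the job script. Under PBS or SGE the scheduler's own host file uses a different layout, so the parsing step would have to be adapted.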

For SGE it's best to avoid mpdboot and instead start daemons dedicated
to each job in a PE's start_proc_args, i.e. each job gets its own ring:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

This way the daemons can also be started w/o any rsh or ssh between  
the nodes.

-- Reuti


> I agree wholeheartedly.
>
>> Moreover, if the host running mpdboot is not in mpd.hosts, it  
>> makes sense to use --ncpus=0 as the default.
>
> Just FYI, this doesn't actually work the way you would expect.   
> mpdboot still basically sets up an mpd as though you had specified  
> --ncpus=1.  You can't use mpd to have a "head node" in a  
> straightforward fashion.  The best you can do is use the "-1"  
> option to mpiexec to avoid placing processes locally first, but  
> that is a pretty weak option too.
>
> Hydra supports running mpiexec on a node that isn't in the hostfile  
> (at least with all bootstrap servers that support remote process  
> creation).
>
> -Dave