[mpich-discuss] MPICH2 Hydra integration with SGE and PBS

Pavan Balaji balaji at mcs.anl.gov
Sat Aug 7 13:32:20 CDT 2010


On 08/07/2010 12:58 PM, Reuti wrote:
>>> job_is_first_task  FALSE
>>
>> I'm not sure I follow this. The script should already only launch
>> one process (which will be mpiexec) on the first node. mpiexec
>> will then launch the remaining processes.
>
> SGE will control the number of started slave processes. In the old
> MPICH(1) it was indeed the case that the started `mpirun` did some
> work in one of its forks and started only (n-1) slaves. What I
> observe in MPICH2 with Hydra is the following for a `qsub -pe mpich 2
> test_mpich.sh`:

Ah, I see the confusion here. This has been fixed in Hydra recently: for
launches on the local node, Hydra now just forks instead of trying to
ssh/rsh/qrsh. The fix probably went in after 1.3a2. We are trying to get
1.3b1 out in the next few days, and it will include this fix. In the
meantime, can you try out the nightly snapshot:
http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra
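The local-versus-remote launch decision described above can be sketched roughly as follows. This is a minimal Python illustration, not Hydra's actual C implementation: the helper name `build_launch_argv`, the short-hostname comparison, and the plain `[bootstrap_exec, host, ...]` argument layout are all hypothetical simplifications (the exact arguments Hydra passes to ssh/rsh/qrsh are not shown here).

```python
import socket
from typing import List, Optional


def build_launch_argv(host: str, cmd: List[str], bootstrap_exec: str = "ssh",
                      local_host: Optional[str] = None) -> List[str]:
    """Return the argv used to start `cmd` on `host` (hypothetical helper).

    For the local node the command is run directly (fork/exec), so no
    ssh/rsh/qrsh hop is needed. For any other node the bootstrap
    executable and the target host are prepended.
    """
    if local_host is None:
        local_host = socket.gethostname()
    # Compare short hostnames so "node01" and "node01.cluster" match.
    if host.split(".")[0] == local_host.split(".")[0]:
        return list(cmd)
    return [bootstrap_exec, host] + list(cmd)


if __name__ == "__main__":
    # Local node: no bootstrap hop, the command is started directly.
    print(build_launch_argv("node01", ["./a.out"], local_host="node01"))
    # Remote node: the bootstrap executable is used.
    print(build_launch_argv("node02", ["./a.out"], local_host="node01"))
```

Skipping the bootstrap hop for the local node is what avoids starting an extra slave task through SGE on the first host.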

> a) Use of "SGE's internal launchers" (i.e. `qrsh ...` instead of a
> plain "ssh/rsh")
>
> This looks like what is shown above: all started master and slave
> processes are children of `sge_execd`. The advantage is that all
> processes of a job can be removed by a single `qdel`. You also get
> correct accounting of the memory and time consumed by a job, as SGE
> can track each of sge_execd's children (this is called a tight
> integration).

I see, good point. It sounds like the only change required, apart from
using a newer version of Hydra as described above, is to pass qrsh as
the bootstrap executable (-bootstrap-exec qrsh). Let me know if that
works, and I'll add sge as a bootstrap server that does this automatically.

> b) Use of "builtin" starter in an already tight integration
>
> Originally, slave tasks were started by `rsh`. If you need X11
> forwarding or a large number of slave tasks (rsh has a certain limit
> on file descriptors), `ssh` can be used instead. This of course means
> setting up host-based or passphraseless authentication for the slave
> tasks. Both methods (rsh/ssh) use a random port per job and per node
> to start the slave processes (a dedicated rshd/sshd is started for
> each slave process, so the system-wide daemons don't need to run all
> the time; e.g. rshd can be disabled in /etc/xinetd.d/rshd and SGE can
> still use rsh). Whether the started slaves need any port of their own
> is a different matter.
>
> The "builtin" method does not need a random port, also allows a
> larger number of file descriptors, and needs no authorization setup.
> X11 forwarding should be added later.

By "builtin" here, do you mean using "qrsh" or something else? I thought
qrsh does X forwarding by default (or do we need to pass an extra
argument?).

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

