[mpich-discuss] MPICH2 Hydra integration with SGE and PBS

Ivan Pulido mefistofeles87 at gmail.com
Fri Aug 6 14:44:20 CDT 2010


On Fri, Aug 6, 2010 at 10:16 AM, Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> I just looked into the Hydra startup in MPICH2. For a tight integration
> into SGE it looks like the default MPICH integration can be reused, with the
> small change to have:
>
> job_is_first_task  FALSE
>
> as MPICH2 with Hydra will also make a local "ssh/rsh" call to start the
> first daemon on the master node of the submitted parallel job. Because by
> default the absolute path "/usr/bin/ssh" is compiled in (same for "rsh"),
> it's necessary to have the following in the jobscript:
>
> mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines
> ./mpihello
>
> to have a call to a plain "rsh" *), so that SGE's "-catch_rsh" will do all
> the rest automatically.
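>
> As a minimal sketch, the reused PE could look like this (the start/stop
> scripts under $SGE_ROOT/mpi are the stock MPICH templates; the paths, PE
> name and allocation_rule are assumptions and will differ per installation):
>
> pe_name            mpich2_hydra
> slots              999
> start_proc_args    $SGE_ROOT/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     $SGE_ROOT/mpi/stopmpi.sh
> allocation_rule    $round_robin
> control_slaves     TRUE
> job_is_first_task  FALSE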
>
> *) Note: the final communication method is set up solely in SGE, and can
> be "builtin", "classic rsh" or also "ssh" (according to the Howto at
> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html). From the
> point of view of the application, it's also possible to instruct it to call
> "fubar" to reach another node. In SGE it would then be necessary in
> start_proc_args to create a link in $TMPDIR which is named "fubar" and
> points to SGE's rsh-wrapper; see the sketch below. Only inside the
> rsh-wrapper will the `qrsh -inherit ...` use the method which is set up in
> SGE to reach another node in the end.
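>
> For illustration, a hedged fragment of what start_proc_args could do for
> the hypothetical "fubar" name (the rsh-wrapper path is an assumption and
> varies per installation):
>
> # inside startmpi.sh, or a custom start_proc_args script:
> # "fubar" now resolves to SGE's rsh-wrapper; $TMPDIR is prepended to the
> # job's PATH, so the application's call to "fubar" ends up in
> # `qrsh -inherit ...`
> ln -s $SGE_ROOT/mpi/rsh "$TMPDIR/fubar"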
>
> ==
>
> For PBS I found in the documentation the hint to use:
>
> mpiexec -rmk pbs ...
>
> but it looks like it will just get the list of nodes from PBS. Is it still
> necessary to use the external `mpiexec` from OSC to make use of the task
> manager? Are there any plans to have it directly built into MPICH2?
>
>
I've been using this command (mpiexec -rmk pbs) inside Torque scripts
without any problems, so it's not necessary to use OSC's mpiexec to
accomplish this. You do, however, need to specify the allocated nodes and
processors per node inside the qsub script for it to work properly. Example:

#!/bin/sh
# Request 1 node with 4 processors per node for a 1-hour walltime:
#PBS -N Si1-1
#PBS -l nodes=1:ppn=4,walltime=1:00:00
# Hydra reads the node list from the PBS resource manager via -rmk pbs:
mpiexec -rmk pbs ./parallel_app
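The script (saved here as a hypothetical run.sh) is then submitted with a
plain `qsub run.sh`; no -machinefile is needed, since -rmk pbs picks up the
node list itself.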

Hope this helps.



> -- Reuti
>
>
> PS: As MPICH2 tries to find the full path of "ssh/rsh" in the source file
> "src/pm/hydra/tools/bootstrap/ssh/ssh_launch.c" before the hardcoded
> "/usr/bin/ssh" is used, I wonder whether it would be a shortcut to have an
> option like "-rsh" to `mpiexec`, which would use "rsh" as the bootstrap and
> also issue just a plain call to `rsh`. If the user adjusted this in the
> source for the SGE integration, several lines would need to be changed each
> time a new release appears. Another option would be to check whether some
> of the $SGE... environment variables are set, and do the correct thing
> automatically.
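>
> To illustrate the idea at the jobscript level (a sketch only; $SGE_ROOT and
> $JOB_ID are standard SGE job variables, while the selection logic itself is
> hypothetical):
>
> # pick the bootstrap automatically when running under SGE
> BOOTSTRAP=""
> if [ -n "$SGE_ROOT" ] && [ -n "$JOB_ID" ]; then
>     BOOTSTRAP="-bootstrap rsh -bootstrap-exec rsh"
> fi
> mpiexec $BOOTSTRAP -machinefile $TMPDIR/machines ./mpihello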



-- 
Ivan Pulido
Physics student
Universidad Nacional de Colombia

