[mpich-discuss] MPICH2 Hydra integration with SGE and PBS

Reuti reuti at staff.uni-marburg.de
Fri Aug 6 10:16:08 CDT 2010


Hi,

I just looked into the Hydra startup in MPICH2. For a tight integration into SGE, it looks like the default MPICH integration can be reused, with one small change in the PE definition:

job_is_first_task  FALSE

as MPICH2 with Hydra will also make a local "ssh"/"rsh" call to start the first daemon on the master node of the submitted parallel job. Since the absolute path "/usr/bin/ssh" to "ssh" is compiled in by default (same for "rsh"), it's necessary to have in the jobscript:

mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines ./mpihello

so that a plain "rsh" is called *), and SGE's "-catch_rsh" will do all the rest automatically.

*) Note: the final communication method is set up solely in SGE, and can be "builtin", "classic rsh" or also "ssh" (according to the Howto at http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html). From the point of view of the application, it's also possible to instruct it to call "fubar" to reach another node. In SGE it would then be necessary to create, in start_proc_args, a link in $TMPDIR which is named "fubar" and points to SGE's rsh-wrapper. Only inside the rsh-wrapper will the `qrsh -inherit ...` use the method which is set up in SGE to finally reach the other node.
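
To illustrate the "fubar" case, here is a minimal sketch of what the start_proc_args procedure might do. The function name link_launcher is made up for this example, and the location of the rsh-wrapper is passed in as an argument because it depends on the installation. As far as I know, the stock startmpi.sh does much the same for "rsh" when it is called with -catch_rsh.

```shell
#!/bin/sh
# Sketch only: link_launcher is a hypothetical helper, not part of SGE.
#   $1 = name the application will call (e.g. "fubar")
#   $2 = path to SGE's rsh-wrapper (installation dependent, an assumption)
#   $3 = job-private directory, normally $TMPDIR
link_launcher() {
    name=$1
    wrapper=$2
    dir=$3
    if [ ! -x "$wrapper" ]; then
        echo "rsh-wrapper not found or not executable: $wrapper" >&2
        return 1
    fi
    # Remove any stale entry, then point <dir>/<name> at the wrapper.
    rm -f "$dir/$name"
    ln -s "$wrapper" "$dir/$name"
}

# Possible call inside the start procedure (the wrapper path is an assumption):
#   link_launcher fubar "$SGE_ROOT/mpi/rsh" "$TMPDIR"
```

The application only ever sees the name "fubar"; which transport is really used is still decided inside the wrapper by `qrsh -inherit ...`.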

==

For PBS I found in the documentation the hint to use:

mpiexec -rmk pbs ...

but it looks like this will just get the list of nodes from PBS. Is it still necessary to use an external `mpiexec` from OSC in order to use the task manager? Are there any plans to have this built directly into MPICH2?

-- Reuti


PS: MPICH2 tries to find the full path of "ssh"/"rsh" (in the source "src/pm/hydra/tools/bootstrap/ssh/ssh_launch.c") before the hardcoded "/usr/bin/ssh" is used. I therefore wonder whether it would be a useful shortcut to have an option like "-rsh" to `mpiexec`, which would select "rsh" as bootstrap and also just call a plain `rsh`. If users had to adjust the source themselves for the SGE integration, they would need to change several lines each time a new release appears. Another option would be to check whether some of the $SGE... environment variables are set, and do the correct thing automatically.
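
As a rough illustration of the environment check, here is a sketch done as a shell helper in the jobscript rather than as a change inside Hydra. The function name pick_bootstrap is made up. I assume $SGE_ROOT together with $PE_HOSTFILE as the indicator, since SGE sets the latter for jobs running in a parallel environment; other variables could be checked instead.

```shell
#!/bin/sh
# Sketch only: pick_bootstrap is a hypothetical helper, not an mpiexec feature.
# Prints "rsh" when the environment looks like an SGE parallel job,
# otherwise "ssh".
pick_bootstrap() {
    if [ -n "${SGE_ROOT:-}" ] && [ -n "${PE_HOSTFILE:-}" ]; then
        echo rsh
    else
        echo ssh
    fi
}

# Possible use in a jobscript (same options as in the command above):
#   launcher=$(pick_bootstrap)
#   mpiexec -bootstrap "$launcher" -bootstrap-exec "$launcher" \
#       -machinefile "$TMPDIR/machines" ./mpihello
```

This only selects the name to call; a check built into `mpiexec` itself would make such a helper unnecessary.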

