[mpich-discuss] MPICH2 Hydra integration with SGE and PBS

Pavan Balaji balaji at mcs.anl.gov
Sat Aug 7 10:57:06 CDT 2010


Reuti,

Thanks. We do have some support for PBS and SGE, but I'll be happy to 
work with you to improve it.

On 08/06/2010 10:16 AM, Reuti wrote:
> job_is_first_task  FALSE

I'm not sure I follow this. The job script should already launch only 
one process (mpiexec) on the first node. mpiexec will then launch the 
remaining processes.

> *) Note: the final communication method is setup solely in SGE,
> which can be "builtin", "classic rsh" or also "ssh" (according to the
> Howto at http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html).
> From the point of view of the application, it's also possible to
> instruct it to call "fubar" to reach another node. In SGE it would be
>  necessary in start_proc_args to create a link in $TMPDIR which is
> named "fubar" and point to SGE's rsh-wrapper. Only inside the
> rsh-wrapper, the `qrsh -inherit ...` will use the method which is
> setup in SGE to reach another node in the end.

The howto seems to allow using ssh as well. So, why not use ssh? I'm 
not sure SGE's internal launchers offer any benefit over ssh (or rsh).
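
For reference, the link step described in the quoted note could be 
sketched roughly as below. This is only an illustration, not SGE's own 
start_proc_args script: the function name link_launcher is made up, and 
the wrapper location is passed in by the caller because the actual path 
of SGE's rsh-wrapper depends on the installation. It also assumes that 
$TMPDIR is on the PATH seen by the application, so that a call to 
"fubar" resolves to the link.

```python
import os


def link_launcher(tmpdir, launcher_name, wrapper_path):
    """Create tmpdir/launcher_name as a symlink to wrapper_path.

    Hypothetical helper: launcher_name is whatever command the
    application is told to call (e.g. "fubar"), and wrapper_path is
    the site-specific location of the rsh-wrapper.
    """
    link_path = os.path.join(tmpdir, launcher_name)
    # Replace a stale link left over from an earlier run, if any.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(wrapper_path, link_path)
    return link_path


if __name__ == "__main__":
    tmpdir = os.environ.get("TMPDIR")
    # SGE_RSH_WRAPPER is a made-up variable name used only for this sketch.
    wrapper = os.environ.get("SGE_RSH_WRAPPER")
    if tmpdir and wrapper:
        print(link_launcher(tmpdir, "fubar", wrapper))
```

The application only ever sees the name "fubar"; which transport is 
used in the end is decided inside the wrapper.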

> but it looks like it will just get the list of nodes from PBS. For
> the use of the task manager it's still necessary to use an external
> `mpiexec` from OSC? Are there any plans to have it directly built
> into MPICH2?

Correct. PBS support is only available as a resource management kernel 
(meaning that Hydra will only query it for the available nodes, but 
will not use it to launch processes). Yes, supporting PBS as a 
bootstrap server is in our plans. See 
https://trac.mcs.anl.gov/projects/mpich2/ticket/443

Please feel free to add yourself to the ticket to track progress on it.
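
To make the "query only" role concrete, here is a rough sketch of what 
getting the node list from PBS amounts to. It is not Hydra's code. It 
assumes the usual PBS convention that $PBS_NODEFILE names a file with 
one host name per allocated slot, and it prints the result in the 
"host:count" form that a Hydra host file accepts. The function names 
are made up for the example.

```python
import os
from collections import OrderedDict


def read_pbs_nodes(nodefile_path):
    """Return an ordered mapping of host name -> number of slots.

    Assumes one host name per line, repeated once per slot.
    """
    counts = OrderedDict()
    with open(nodefile_path) as fh:
        for line in fh:
            host = line.strip()
            if not host:
                continue  # skip blank lines
            counts[host] = counts.get(host, 0) + 1
    return counts


def to_hostfile_lines(counts):
    """Format the mapping as "host:count" lines, in first-seen order."""
    return ["%s:%d" % (host, n) for host, n in counts.items()]


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path:
        for line in to_hostfile_lines(read_pbs_nodes(path)):
            print(line)
```

Nothing here launches anything; the processes would still be started 
through ssh or rsh until PBS is supported as a bootstrap server.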

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

