On Fri, Aug 6, 2010 at 10:16 AM, Reuti <reuti@staff.uni-marburg.de> wrote:
> Hi,
>
> I just looked into the Hydra startup in MPICH2. For a tight integration into SGE it looks like the default MPICH integration can be reused, with the small change of setting:
>
>    job_is_first_task FALSE
>
> because MPICH2 with Hydra will also make a local "ssh/rsh" call to start the first daemon on the master node of the submitted parallel job. As the absolute path "/usr/bin/ssh" to "ssh" is compiled in by default (same for "rsh"), it's necessary to call mpiexec in the job script as:
>
>    mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines ./mpihello
>
> so that a plain "rsh" is called *) and SGE's "-catch_rsh" will do all the rest automatically.
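For anyone trying this: the parallel environment can then stay close to the stock MPICH template. Only a rough sketch (the PE name, slot count and paths are placeholders for wherever your SGE mpi templates live):

   pe_name            mpich2
   slots              999
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /usr/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
   stop_proc_args     /usr/sge/mpi/stopmpi.sh
   allocation_rule    $round_robin
   control_slaves     TRUE
   job_is_first_task  FALSE
   urgency_slots      min

control_slaves TRUE is what allows the `qrsh -inherit` calls from the rsh-wrapper in the first place.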
>
> *) Note: the final communication method is set up solely in SGE and can be "builtin", "classic rsh" or also "ssh" (according to the Howto at http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html). From the application's point of view, it's also possible to instruct it to call "fubar" to reach another node. In SGE it would then be necessary to create, in start_proc_args, a link in $TMPDIR which is named "fubar" and points to SGE's rsh-wrapper. Only inside the rsh-wrapper will the `qrsh -inherit ...` use the method which is set up in SGE to reach the other node in the end.
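The link creation Reuti mentions is just one extra line in the start_proc_args script; roughly like this (taking "fubar" from his example and assuming the rsh-wrapper shipped with the stock mpi templates):

   # in the start_proc_args script, once $TMPDIR exists:
   ln -s /usr/sge/mpi/rsh $TMPDIR/fubar

so that the application's call to "fubar" ends up in `qrsh -inherit ...` via the wrapper.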
>
> ==
>
> For PBS I found in the documentation the hint to use:
>
>    mpiexec -rmk pbs ...
>
> but it looks like this will just get the list of nodes from PBS. For use of the task manager, is it still necessary to use the external `mpiexec` from OSC? Are there any plans to have this built directly into MPICH2?

I've been using this command (mpiexec -rmk pbs) inside Torque scripts without any problems, so it's not necessary to use OSC's mpiexec for this. You do, however, need to specify the allocated nodes and processors per node inside the qsub script for it to work properly. Example:

   #!/bin/sh
   #PBS -N Si1-1
   #PBS -l nodes=1:ppn=4,walltime=1:00:00
   mpiexec -rmk pbs ./parallel_app

Hope this helps.
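(For completeness: save that as, say, si1-1.sh (any name will do) and submit it with the usual `qsub si1-1.sh`; mpiexec -rmk pbs then reads the node list Torque allocated to the job, so no extra machinefile is needed.)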
> -- Reuti
>
> PS: As MPICH2 tries to find the full path of "ssh"/"rsh" in the source file "src/pm/hydra/tools/bootstrap/ssh/ssh_launch.c" before the hardcoded "/usr/bin/ssh" is used, I wonder whether it would be a shortcut to have an option like "-rsh" for `mpiexec`, which would use "rsh" as the bootstrap and also make just a plain call to `rsh`. If the user adjusted this in the source for the SGE integration, several lines would have to be changed again each time a new release appears. Another option would be to check whether some of the $SGE... environment variables are set, and do the correct thing automatically.
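Until Hydra does something like that itself, a crude user-side workaround would be a small wrapper around mpiexec in the job script. Just a sketch of the idea (the test on $SGE_ROOT and the machinefile path are my assumptions from a standard tight-integration setup, not anything MPICH2 itself provides):

   #!/bin/sh
   # use the plain-rsh bootstrap when running under SGE, Hydra's defaults otherwise
   if [ -n "$SGE_ROOT" ] && [ -f "$TMPDIR/machines" ]; then
       exec mpiexec -bootstrap rsh -bootstrap-exec rsh -machinefile $TMPDIR/machines "$@"
   else
       exec mpiexec "$@"
   fi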
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss@mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

--
Ivan Pulido
Physics student
Universidad Nacional de Colombia