[mpich-discuss] SGE & Hydra Problem

Reuti reuti at staff.uni-marburg.de
Wed Sep 22 07:42:43 CDT 2010


On 22.09.2010 at 14:06, Pavan Balaji wrote:

> ----- "Ursula Winkler" <ursula.winkler at uni-graz.at> wrote:
> 
>>> Ok, just to confirm, if nodes X and Y are both in the
>>> $TMPDIR/machines file, you are running the qrsh command from node X to
>>> node Y, correct?
>> 
>> yes
> 
> Very surprising, given that this works when used from within Hydra. Without running qrsh independently (without Hydra), it's hard to figure out what's going wrong.
> 
> Reuti: any ideas on why this is happening?
> 
> Below is something I noticed, though that might or might not be a problem.
> 
>> The cluster on which it works:
>>    SGE_RSH_COMMAND=/installadmin/sge/utilbin/lx24-amd64/rsh
> 
> This doesn't seem to be set on the cluster where mpiexec doesn't work. Is this supposed to be the case?

This variable only reflects the communication method that is actually configured, i.e. it is an output only, meant to be used in the job script if you need a reference to the command used (or to be used) to reach other nodes. With the latest -builtin- method it will also read "builtin".
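
As a minimal sketch of reading that variable inside a job script: the function name describe_sge_rsh is made up for illustration, and the three cases are only an assumption about which values are worth telling apart. An unset variable only tells you that nothing was exported into the job's environment, not why.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report what SGE exported in
# SGE_RSH_COMMAND for the current job. The variable is read only, never set.
describe_sge_rsh() {
    case "${SGE_RSH_COMMAND:-}" in
        "")      echo "SGE_RSH_COMMAND is not set" ;;
        builtin) echo "builtin startup method" ;;
        *)       echo "external command: ${SGE_RSH_COMMAND}" ;;
    esac
}

# Print the result once, e.g. near the top of the job script.
describe_sge_rsh
```

Running this in a job on both clusters would at least show whether the two differ in the startup method they report.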

Did we already discuss SGE's configuration for "rsh_daemon" and "rsh_command"?
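
For reference, a small sketch of how those entries could be pulled out of the configuration for comparison between the two clusters. The filter name show_remote_startup is hypothetical; it assumes the configuration text arrives on stdin in the usual "name value" layout that qconf -sconf prints, and that qconf is in the PATH on a real cluster.

```shell
#!/bin/sh
# Hypothetical filter: keep only the remote-startup entries
# (qlogin/rlogin/rsh command and daemon) from SGE configuration text on stdin.
show_remote_startup() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# On a cluster with SGE installed (assumption: qconf is in PATH):
#   qconf -sconf | show_remote_startup
```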

-- Reuti


> 
> -- Pavan
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


