[mpich-discuss] SGE & Hydra Problem

Ursula Winkler ursula.winkler at uni-graz.at
Tue Sep 14 02:37:33 CDT 2010


Hi,

thanks for your answer.
>   
> which version of SGE are you running - using port 536 was used in  
> former times, and nowadays the official ports are 6444 and 6445.
>
>
>   
SGE 6.1. The port number is ok.
>>     error: getting configuration: unable to contact qmaster using  
>> port 536 on host "b00"
>>     error:
>>     Cannot get configuration from qmaster.
>>     
>
> This looks more like a network problem, and unrelated to SGE or  
> MPICH2. Do you have any firewall on the machines? Other applications  
> run across the nodes? AFAICS below, SGE is using rsh, and not the  
> default -builtin- of the newer versions of SGE (there would be no rsh/ 
> rshd any longer) - nevertheless, your setup should work.
>   
Within the cluster there is no firewall. There are no other applications
running across the nodes. The setup works for MPICH1 and for MPICH2 with
smpd; the problems occur only with Hydra. I also cannot see any network
problems. Even more mysterious: Hydra works fine on another cluster (with
the same OS and SGE).
Hmm.
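
One way to rule the network in or out is to test, from each execution host, whether a plain TCP connection to the qmaster port can be opened. The following is only a minimal sketch: the helper name `can_connect` is hypothetical, and the host "b00" and port 536 are simply the values from the error message quoted above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Any failure (name resolution, refused connection, timeout) is
    reported as False. This only shows that the port is reachable,
    not that the SGE qmaster behind it is answering correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the values from the error message:
#   can_connect("b00", 536)
```

If this returns False on a node where Hydra fails but True on the submit host, the problem is more likely in name resolution or routing on that node than in Hydra itself.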
>   
> In principle this looks nice, as all the processes are bound to the  
> sgeexecd. This is what I tried to achieve in:
>
> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-August/007678.html
>   
Yes, the tight integration works fine! (I'd be happy if it were the same
with mvapich.)
> But in the meantime, SGE is now supported out-of-the-box by MPICH2.  
> Can just issue a plain "mpiexec ./cpitest.x". With a proper request of  
> a PE in SGE (/bin/true is sufficient for the start/stop_proc_args),  
> Hydra should get the number of cores and nodes automatically (in  
> 1.3b1, which you are referring to).
>   
I tried it out on the cluster where Hydra works: it works perfectly,
thank you.
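
For reference, inside a parallel-environment job SGE lists the granted hosts and slots in the file named by $PE_HOSTFILE, one host per line with the host name first and the slot count second. The sketch below is not Hydra's code; it only illustrates the information a launcher can derive from that file. The function names and the sample host names are hypothetical.

```python
import os


def parse_pe_hostfile(text):
    """Parse pe_hostfile contents into a list of (host, slots) tuples.

    Each line is expected to start with "<host> <slots> ...";
    lines with fewer than two fields are skipped.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        hosts.append((fields[0], int(fields[1])))
    return hosts


def total_slots(hosts):
    """Sum the slot counts over all granted hosts."""
    return sum(slots for _, slots in hosts)


def read_granted_hosts():
    """Read the hostfile of the current SGE job, or return [] outside a job."""
    path = os.environ.get("PE_HOSTFILE")
    if not path:
        return []
    with open(path) as handle:
        return parse_pe_hostfile(handle.read())
```

Printing `read_granted_hosts()` from within a job script on both clusters would show whether SGE hands the same kind of host list to the job in each case.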

Only the Hydra problem on the other cluster remains, and I have no idea
why.

Ursula

