[mpich-discuss] SGE & Hydra Problem

Reuti reuti at staff.uni-marburg.de
Tue Sep 14 10:10:23 CDT 2010


Hi,

On 14.09.2010 at 09:37, Ursula Winkler wrote:

> thanks for your answer.
>> Which version of SGE are you running? Port 536 was used in former
>> times; nowadays the official ports are 6444 and 6445.
> SGE 6.1. The port number is ok.
>>>    error: getting configuration: unable to contact qmaster using   
>>> port 536 on host "b00"
>>>    error:
>>>    Cannot get configuration from qmaster.
>>>
>>
>> This looks more like a network problem, unrelated to SGE or MPICH2.
>> Do you have any firewall on the machines? Do other applications run
>> across the nodes? AFAICS below, SGE is using rsh and not the default
>> -builtin- of the newer versions of SGE (there would be no rsh/rshd
>> any longer). Nevertheless, your setup should work.
>>
> Within the cluster there is no firewall. There are no other
> applications running across the nodes. The setup works for MPICH1
> and for MPICH2 with smpd; the problems occur only with Hydra. I also
> cannot see any network problems. Even more mysterious: Hydra works
> fine on another cluster (with the same OS and SGE).
> Hmm.

Did you also recompile all applications with the latest version of
MPICH2, so that the binaries and mpiexec are from the same version?
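
To rule out the network side independently of SGE, one could try a
plain TCP connect from the affected node to the qmaster port. A minimal
sketch in Python, assuming the host "b00" and port 536 from the error
message quoted above; the function name can_connect is only for
illustration and is not part of SGE or MPICH2:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests reachability of the port; it does not speak the
    SGE protocol, so True does not prove that qmaster itself is healthy.
    """
    try:
        # create_connection resolves the name and connects in one step
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution errors
        return False
```

Calling can_connect("b00", 536) on the node where the error shows up
should return True if the port is reachable; False would point to a
network or name resolution problem rather than to Hydra.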

-- Reuti


>> In principle this looks nice, as all the processes are bound to the
>> sgeexecd. This is what I tried to achieve in:
>>
>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-August/007678.html
>>
> yes, the tight integration works fine! (I'd be happy if it were the
> same with mvapich.)
>> But in the meantime, SGE is supported out-of-the-box by MPICH2. You
>> can just issue a plain "mpiexec ./cpitest.x". With a proper request
>> of a PE in SGE (/bin/true is sufficient for the
>> start/stop_proc_args), Hydra should get the number of cores and
>> nodes automatically (in 1.3b1, which you are referring to).
>>
> I tried it out on the cluster where Hydra works: it works
> perfectly, thank you.
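
For illustration: inside a job that requested a PE, SGE writes the
granted hosts and slots to the file named by $PE_HOSTFILE, one line per
host, with the host name and the slot count as the first two fields. As
far as I know, this is the information Hydra picks up. A small sketch in
Python to see what was granted; the function name and the host names in
the note below are made up for the example:

```python
def parse_pe_hostfile(text):
    """Return a list of (host, slots) pairs from PE hostfile contents.

    Only the first two whitespace-separated fields of each line are
    used; any further fields (queue, processor range) are ignored.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        hosts.append((fields[0], int(fields[1])))
    return hosts
```

Inside a job, one would pass the contents of the file named by the
PE_HOSTFILE environment variable. A file with the two lines
"b01 4 all.q@b01 UNDEFINED" and "b02 2 all.q@b02 UNDEFINED" gives two
hosts with 4 and 2 slots, i.e. 6 processes for a plain mpiexec.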
>
> Just the Hydra problem on the other cluster remains, and I don't have
> any idea why.
>
> Ursula
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


