[mpich-discuss] SGE & Hydra Problem

Pavan Balaji balaji at mcs.anl.gov
Tue Sep 14 03:46:27 CDT 2010


On 09/14/2010 02:37 AM, Ursula Winkler wrote:
>>>      error: getting configuration: unable to contact qmaster using
>>> port 536 on host "b00"
>>>      error:
>>>      Cannot get configuration from qmaster.
>>>
>>
>> This looks more like a network problem, and unrelated to SGE or
>> MPICH2. Do you have any firewall on the machines? Other applications
>> run across the nodes? AFAICS below, SGE is using rsh, and not the
>> default -builtin- of the newer versions of SGE (there would be no rsh/
>> rshd any longer) - nevertheless, your setup should work.
>>
> Within the cluster there is no firewall. There are no other applications
> running across the nodes. The setup works for MPICH1 and for MPICH2 with
> smpd; the problems occur only with Hydra. I also cannot see any network
> problems. Even more mysterious: Hydra works fine on another cluster (with
> the same OS and SGE).
> Hmm.

Can you run this by passing the -verbose option to mpiexec? It'll give 
some more output to help us debug it.
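
If it helps, something along these lines would capture the verbose output
in a file you can attach to your reply. This is only a rough sketch: the
helper names, the log file name, and the use of "hostname" as the test
program are placeholders of mine, not anything Hydra provides. The only
Hydra-specific part is the -verbose flag mentioned above.

```python
import shlex
import subprocess


def build_command(nprocs, program, launcher="mpiexec"):
    """Return the argv list for a verbose launch (launcher path is an assumption)."""
    return [launcher, "-verbose", "-n", str(nprocs), program]


def run_verbose(nprocs, program, log_path="hydra-verbose.log"):
    """Run the job, save the combined stdout/stderr to log_path, return the exit code."""
    cmd = build_command(nprocs, program)
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages in order with normal output
        text=True,
    )
    with open(log_path, "w") as fh:
        fh.write("$ " + shlex.join(cmd) + "\n")  # record the exact command line
        fh.write(result.stdout)
    return result.returncode


# Example, to be called from inside the SGE job script:
# run_verbose(2, "hostname")
```

Running it from within the SGE job (rather than interactively) matters here,
since the failure only shows up under the scheduler.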

>> In principle this looks nice, as all the processes are bound to the
>> sgeexecd. This is what I tried to achieve in:
>>
>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-August/007678.html
>>
> yes, the tight integration works fine! (I'd be happy if it were the same
> with mvapich.)

Hydra will work out-of-the-box with MVAPICH2 (or any other derivative of 
MPICH2). I believe the latest version of MVAPICH-1 also supports the PMI 
interface, and hence Hydra and all other MPICH2 process managers.

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

