[mpich-discuss] SGE & Hydra Problem
Ursula Winkler
ursula.winkler at uni-graz.at
Tue Sep 14 02:37:33 CDT 2010
Hi,
thanks for your answer.
>
> Which version of SGE are you running? Port 536 was used in former
> times; nowadays the official ports are 6444 and 6445.
>
We run SGE 6.1. The port number is OK.
>> error: getting configuration: unable to contact qmaster using
>> port 536 on host "b00"
>> error:
>> Cannot get configuration from qmaster.
>>
>
> This looks more like a network problem, unrelated to SGE or
> MPICH2. Do you have any firewall on the machines? Do other applications
> run across the nodes? AFAICS below, SGE is using rsh, and not the
> -builtin- default of newer SGE versions (which no longer use rsh/rshd).
> Nevertheless, your setup should work.
>
There is no firewall within the cluster, and no other applications run
across the nodes. The setup works for MPICH1 and for MPICH2 with smpd;
only Hydra has problems. I also cannot see any network problems.
Even more mysterious: Hydra works fine on another cluster (with the same
OS and SGE).
Hmm.
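One quick way to separate a network problem from an MPI problem is to check whether a compute node can open a plain TCP connection to the qmaster port at all. Below is a minimal sketch, assuming Python 3 is available on the nodes. `can_reach` is a hypothetical helper name, and host `b00` / port `536` are simply taken from the error message above. A successful connection only shows the port is reachable; it does not prove qmaster answers correctly.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical diagnostic helper: it tests plain TCP reachability only,
    not whether the service behind the port speaks the SGE protocol.
    """
    try:
        # create_connection resolves the host name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False
```

Run on each compute node, e.g. `can_reach("b00", 536)`; a `False` on only some nodes would point toward the network rather than Hydra.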
>
> In principle this looks nice, as all the processes are bound to the
> sgeexecd. This is what I tried to achieve in:
>
> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-August/007678.html
>
Yes, the tight integration works fine! (I'd be happy if it worked the same
way with MVAPICH.)
> But in the meantime, SGE is supported out of the box by MPICH2. You
> can just issue a plain "mpiexec ./cpitest.x". With a proper request of
> a PE in SGE (/bin/true is sufficient for the start/stop_proc_args),
> Hydra should detect the number of cores and nodes automatically (in
> 1.3b1, which you are referring to).
>
I tried it out on the cluster where Hydra runs: it works perfectly.
Thank you.
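The automatic detection works because SGE, inside a parallel job, exports `$PE_HOSTFILE`, a file listing each granted host with its slot count, which a resource-manager integration can read. The Python sketch below only illustrates deriving node and core counts from that file; it is not Hydra's actual code, the helper names are hypothetical, and it assumes the usual four-column SGE format (host, slots, queue, processor range).

```python
from __future__ import annotations

import os


def parse_pe_hostfile(text: str) -> dict[str, int]:
    """Map each host to its slot count from SGE PE_HOSTFILE contents.

    Each non-empty line is assumed to look like:
        <hostname> <slots> <queue> <processor range>
    Only the first two fields are used here.
    """
    slots: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        # A host may appear more than once (e.g. slots from several queues); sum them
        slots[host] = slots.get(host, 0) + count
    return slots


def summarize_allocation() -> tuple[int, int]:
    """Return (number of nodes, total cores) for the current SGE job."""
    path = os.environ["PE_HOSTFILE"]  # set by SGE inside a parallel job
    with open(path) as fh:
        slots = parse_pe_hostfile(fh.read())
    return len(slots), sum(slots.values())
```

Summing slots per host, rather than overwriting them, keeps the count correct when one host contributes slots from more than one queue.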
Only the Hydra problem on the other cluster remains, and I have no idea
why.
Ursula