[mpich-discuss] SGE & Hydra Problem
Reuti
reuti at staff.uni-marburg.de
Tue Sep 14 10:10:23 CDT 2010
Hi,
On 14.09.2010 at 09:37, Ursula Winkler wrote:
> thanks for your answer.
>> which version of SGE are you running? Port 536 was used in
>> former times; nowadays the official ports are 6444 and 6445.
> SGE 6.1. The port number is ok.
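(Just for reference, how a node resolves the qmaster port can be checked
quickly; the paths below assume a standard installation with the cell
name "default":

    # port the SGE client commands and the execd use to reach the qmaster
    echo $SGE_QMASTER_PORT
    grep sge_qmaster /etc/services
    # usually both come from sourcing $SGE_ROOT/default/common/settings.sh

If a node resolves a different port than the one the qmaster listens on,
you get exactly this kind of "unable to contact qmaster" error.)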
>>> error: getting configuration: unable to contact qmaster using
>>> port 536 on host "b00"
>>> error:
>>> Cannot get configuration from qmaster.
>>>
>>
>> This looks more like a network problem, and unrelated to SGE or
>> MPICH2. Do you have any firewall on the machines? Do other
>> applications run across the nodes? AFAICS below, SGE is using rsh,
>> and not the default -builtin- of the newer versions of SGE (there
>> would be no rsh/rshd any longer) - nevertheless, your setup should
>> work.
>>
> Within the cluster there is no firewall. There are no other
> applications running across the nodes. The setup works for MPICH1
> and MPICH2 smpd; only with Hydra are there problems. I also cannot
> see any network problems.
> Even more mysterious: Hydra works fine on another cluster (with the
> same OS and SGE).
> Hmm.
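Just to rule the network out completely, you could check from one of the
nodes whether the qmaster answers on that port at all (qping ships with
SGE 6.x; host and port are taken from your error message):

    qping b00 536 qmaster 1
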
Did you also recompile all applications with the latest version of
MPICH2, so that the binaries and mpiexec are from the same version?
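A quick way to compare on both clusters (assuming a default MPICH2
install; the exact output differs between versions):

    # version and build details of the Hydra launcher
    mpiexec -info | head
    # version of the MPICH2 installation in your PATH
    mpich2version
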
-- Reuti
>> In principle this looks nice, as all the processes are bound to
>> the sgeexecd. This is what I tried to achieve in:
>>
>> http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-August/007678.html
>>
> Yes, the tight integration works fine! (I'd be happy if it were the
> same with MVAPICH.)
>> But in the meantime, SGE is now supported out-of-the-box by
>> MPICH2. You can just issue a plain "mpiexec ./cpitest.x". With a
>> proper request of a PE in SGE (/bin/true is sufficient for the
>> start/stop_proc_args), Hydra should get the number of cores and
>> nodes automatically (in 1.3b1, which you are referring to).
>>
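(For the archives: such a PE needs nothing special. Roughly what I use -
only a sketch, the pe name, slot count and allocation_rule are of course
site-specific:

    $ qconf -sp mpich2
    pe_name            mpich2
    slots              64
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $round_robin
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min

and in the job script just request it:

    #$ -pe mpich2 8
    mpiexec ./cpitest.x

Hydra then picks up the granted slots and nodes from SGE on its own.)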
> I tried it out on the cluster where Hydra runs fine: it works
> perfectly - thank you.
>
> Only the Hydra problem on the other cluster remains, and I have no
> idea why.
>
> Ursula
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss