[mpich-discuss] SGE & Hydra Problem

Ursula Winkler ursula.winkler at uni-graz.at
Thu Sep 16 03:11:52 CDT 2010


Ursula Winkler wrote:
> Pavan Balaji wrote:
>
>> On 09/15/2010 04:08 AM, Ursula Winkler wrote:
>>
>>>>> But the command "/installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b56 /installadmin/mpich2/test/intel/bin/hydra_pmi_proxy --control-port b73:52298 --debug --demux poll --pgid 0 --enable-stdin 1 --proxy-id 1", run in an SGE job script from b46, gives the same error: "error: executing task of job 158279 failed:"
>>
>> Ok, let's try something simpler (again inside an SGE job script):
>>
>> % /installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b56 /bin/hostname
>>
>>   -- Pavan
>>
>
> Same error. But it's the same on the other cluster, where I don't have problems with Hydra.
>
> Ursula


Well, it won't work unless all participating hosts are listed in the $TMPDIR/machines file.
Once they are, the command doesn't hang, and instead I get this error (again on both clusters):

[proxy:0:1@b46] HYDU_sock_connect (./utils/sock/sock.c:151): connect error (Connection refused)
[proxy:0:1@b46] main (./pm/pmiserv/pmip.c:202): unable to connect to server b45 at port 52298 (check for firewalls!)
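To rule out the first condition from inside the job script, one could check that a target host really is listed in $TMPDIR/machines before calling qrsh -inherit on it. This is only a minimal sketch, not part of SGE or MPICH2: the function name host_in_machines is hypothetical, and it assumes the machines file holds one hostname per line (possibly repeated once per slot), spelled exactly as it is passed to qrsh.

```sh
#!/bin/sh
# Hypothetical helper (not an SGE or MPICH2 command): succeed only if the
# host named in $1 appears as a whole line in the machines file given in $2.
# Assumes one hostname per line, spelled exactly as qrsh will be called with it.
host_in_machines() {
    # -q: print nothing, report via exit status only
    # -x: the whole line must match, so "b5" does not match "b56"
    # -F: treat the hostname literally (dots are not regex wildcards)
    # "--": guard against a hostname that starts with "-"
    grep -qxF -- "$1" "$2"
}
```

In the job script this could be used as `host_in_machines b56 "$TMPDIR/machines" || echo "b56 is not in $TMPDIR/machines" >&2`. If the file lists fully qualified names, the argument has to be qualified the same way, since the match is exact.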
