[mpich-discuss] SGE & Hydra Problem

Pavan Balaji balaji at mcs.anl.gov
Wed Sep 22 06:24:08 CDT 2010


----- "Ursula Winkler" <ursula.winkler at uni-graz.at> wrote:

> No, when mpiexec is placed within the SGE job script, it works fine on
> the second cluster. I meant just the command "qrsh -inherit -V ...
> hydra_pmi_proxy ..." placed within the SGE script that results in the
> mentioned error message (on both clusters).

Ok, just to confirm, if nodes X and Y are both in the $TMPDIR/machines file, you are running the qrsh command from node X to node Y, correct?

I'm surprised that this is not working on the second cluster, as this is exactly what Hydra does internally.
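
To make the X-to-Y check above concrete, here is a minimal sketch of what could go into the SGE job script. It assumes $TMPDIR/machines has one host name per line (in the first field) and that qrsh is on the PATH inside the job. The helper name pick_remote_host is made up for this illustration, and /bin/hostname stands in for the real proxy command, whose arguments are not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Hydra): print the first host in a
# machines file that differs from the given local host.
pick_remote_host() {
    machines_file=$1
    local_host=$2
    # Compare short names so "nodeX" and "nodeX.example.org" count as the same host.
    awk -v me="${local_host%%.*}" '
        NF > 0 {
            h = $1
            sub(/\..*/, "", h)
            if (h != me) { print $1; exit }
        }' "$machines_file"
}

# Only try the remote start when a machines file exists and qrsh is available,
# i.e. when this is actually running inside an SGE parallel job.
if [ -n "${TMPDIR:-}" ] && [ -f "$TMPDIR/machines" ] && command -v qrsh >/dev/null 2>&1; then
    remote=$(pick_remote_host "$TMPDIR/machines" "$(hostname)")
    if [ -n "$remote" ]; then
        echo "local host:  $(hostname)"
        echo "remote host: $remote"
        # Same launch style as the command under discussion, with a trivial payload.
        qrsh -inherit -V "$remote" /bin/hostname
    else
        echo "no second host found in $TMPDIR/machines" >&2
    fi
fi
```

If this prints the name of node Y, the qrsh path from X to Y works for a trivial command, and the problem is more likely in how the proxy command itself is being invoked.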

Can you run mpiexec (from within an SGE script) on both clusters with the -verbose option and send me the outputs?

% mpiexec -verbose /bin/hostname

 -- Pavan

