[mpich-discuss] SGE & Hydra Problem

Pavan Balaji balaji at mcs.anl.gov
Wed Sep 15 03:11:38 CDT 2010


----- "Ursula Winkler" <ursula.winkler at uni-graz.at> wrote:
> [mpiexec@b79] Launch arguments: /installadmin/mpich2/test/intel/bin/hydra_pmi_proxy --control-port b79:45593 --debug --demux poll --pgid 0 --enable-stdin 1 --proxy-id 0
> [mpiexec@b79] Launch arguments: /installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b51 /installadmin/mpich2/test/intel/bin/hydra_pmi_proxy --control-port b79:45593 --debug --demux poll --pgid 0 --enable-stdin 1 --proxy-id 1

I'm assuming the application still hung at this point and you had to kill it?

> > % /installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b56 /installadmin/mpich2/test/intel/bin/hydra_pmi_proxy --control-port b73:52298 --debug --demux poll --pgid 0 --enable-stdin 1 --proxy-id 1
> >
> error: "qrsh" called with option "-inherit", but "JOB_ID" not set in environment
>
> export JOB_ID=158269
> [root@b00 ~]# /installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b56 /installadmin/mpich2/test/intel/bin/hydra_pmi_proxy --control-port b73:52298 --debug --demux poll --pgid 0 --enable-stdin 1 --proxy-id 1
> error: executing task of job 158269 failed: missing "SGE_TASK_ID" in environment
>
> I do not know what value SGE_TASK_ID should be set to, so I always get
> the error "error: executing task of job 158275 failed"

Are you not running this command within an SGE job script? The qrsh command should be run from b79, not from b00.
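
Both errors quoted above come from qrsh -inherit not finding JOB_ID and SGE_TASK_ID in its environment, which is what you would expect when it is started from a plain login shell rather than from inside a job. As a rough sketch only, a guard like the one below could be put in front of the manual test to make that condition obvious. The function name check_sge_env is hypothetical, and it assumes these two variables are the only ones that matter, which is all the error messages above confirm.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or Hydra): reports which of the
# variables named in the qrsh errors above are missing from the environment.
check_sge_env() {
    missing=""
    for var in JOB_ID SGE_TASK_ID; do
        # Indirectly read the variable whose name is stored in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            missing="$missing $var"
        fi
    done
    if [ -n "$missing" ]; then
        echo "not inside an SGE job; missing:$missing" >&2
        return 1
    fi
    return 0
}
```

Inside the job script, something like "check_sge_env && /installadmin/sge/bin/lx24-amd64/qrsh -inherit -V b51 hostname" would then only attempt the launch when the job environment is present (hostname is just a placeholder command here). Setting the variables by hand, as in the export above, gets past the guard but does not create a real job context.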

 -- Pavan

