[Swift-user] Re: How to run coasters over ssh:pbs ?

Mihael Hategan hategan at mcs.anl.gov
Thu Feb 25 16:17:55 CST 2010


I think what's happening in Wenjun's case is that qsub is not in the
path. I suppose this could be tested by running a simple ssh job that
does "which qsub".
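
A minimal sketch of that check in Python, assuming password-less ssh to the PADS login host is already set up. The helper names (build_check_command, interpret, qsub_in_remote_path) are made up for illustration and are not part of Swift or CoG. It relies on ssh returning the remote command's exit status, or 255 when ssh itself fails.

```python
import shlex
import subprocess


def build_check_command(host, tool="qsub"):
    """Build the argv for a non-interactive ssh call that runs `which <tool>`."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, "which " + shlex.quote(tool)]


def interpret(returncode, stdout):
    """Turn the ssh exit status into a human-readable diagnosis."""
    if returncode == 0:
        return "found: " + stdout.strip()
    if returncode == 255:
        # ssh reserves 255 for its own errors (connection, authentication).
        return "ssh itself failed"
    return "not in PATH for non-interactive ssh sessions"


def qsub_in_remote_path(host, tool="qsub", timeout=30):
    """Run the check against a real host (needs network access and ssh keys)."""
    result = subprocess.run(
        build_check_command(host, tool),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return interpret(result.returncode, result.stdout)
```

Calling qsub_in_remote_path("login.pads.ci.uchicago.edu") from the submit host should then say whether qsub is visible to a non-interactive ssh session, which may see a different PATH than an interactive login.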

On Thu, 2010-02-25 at 16:08 -0600, Michael Wilde wrote:
> Was: Re: [PADS Support #3457] globus services for accessing PADS
> 
> --
> 
> Hi Wenjun,
> 
> Are you doing this from communicado? If so, you need to use ssh to get
> from communicado to PADS. And since you want to run coasters once you
> get there, I think you need to use the coaster provider with
> jobmanager = ssh:pbs.
> 
> I'm cc'ing swift-user and Mihael to discuss the exact sites options
> needed.
> 
> We'd want coasters to use ssh to launch one PBS job to launch all the
> workers. 
> 
> I think there are still some issues on PADS with interpreting #nodes
> and workerspernode correctly, so please bear with us while we figure
> this out.
> 
> I'll try this myself later tonight, but if you discover anything
> (either working or failing) please post to the User list.
> 
> Thanks,
> 
> Mike
> 
> 
> ----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:
> 
> > Hi Mike,
> >   I tried to run the loop model through ssh+pbs on PADS. The ssh provider
> > seems to be working, but pbs is not.
> >   Please check the sites file and error message below:
> >  
> > Wenjun
> > > Wenjun,
> > >
> > > I think having GRAM (GRAM5 especially) on PADS would be useful.
> > >
> > > But in the meantime, I think you can also use Swift's ssh provider
> > > to get to PADS.
> > > We'll need to try that.
> > >
> > > - Mike
> > >
> > >   
> > 
> > <pool handle="pads">
> >     <!-- gridftp url="gsiftp://stor.ci.uchicago.edu"/ -->
> >     <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
> >     <execution provider="pbs" url="svc.pads.ci.uchicago.edu" />
> >     <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
> >     <profile namespace="karajan" key="jobThrottle">0.03</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> >  </pool>
> > 
> > Caused by: qstat failed (exit code 255)
> > 
> >         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
> >         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
> >         at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
> >         at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
> >         at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
> >         at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
> >         at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
> >         at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
> >         at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
> >         at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
> >         at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
> >         at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
> >         at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
> >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
> >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
> >         at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
> >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
> >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
> >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
> >         ... 1 more
> > 
> > 
> > > ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
> > >
> > >   
> > >> There are currently no Globus job managers, but GridFTP is at
> > >> stor.ci.uchicago.edu.
> > >>
> > >> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
> > >>
> > >> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
> > >>> Hello,
> > >>>    Could you please tell me whether there are any Globus services for
> > >>> PADS such as GridFTP and GRAM?
> > >>>    I'm trying to launch jobs to PADS remotely but can't figure out
> > >>> what the right URLs are for the GridFTP service and GRAM.
> > >>>    Thanks!
> > >>>
> > >>> Wenjun
> > >
> > >



