[Swift-user] Re: How to run coasters over ssh:pbs ?

Michael Wilde wilde at mcs.anl.gov
Thu Feb 25 17:43:36 CST 2010


But I think, in addition, Wenjun was running this from communicado, his portal host (i.e., with no local qsub), so he needs to ssh to PADS and run qsub there (i.e., get the coaster provider to do that).
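
For reference, here is a minimal, untested sketch of what the pool entry might look like with the coaster provider. It reuses the handle, hosts, and paths from Wenjun's sites file quoted below. Two things are assumptions on my part: that login.pads.ci.uchicago.edu is the right ssh target for the execution element, and that jobmanager="ssh:pbs" is accepted as written.

```xml
<pool handle="pads">
    <!-- Assumption: ssh to the PADS login host and submit through PBS there.
         The url value is a guess based on the filesystem entry. -->
    <execution provider="coaster" url="login.pads.ci.uchicago.edu"
               jobmanager="ssh:pbs" />
    <!-- Unchanged from the original sites file. -->
    <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
    <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
    <profile namespace="karajan" key="jobThrottle">0.03</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
</pool>
```

The only change from the original entry is the execution element: the plain pbs provider runs qsub/qstat on the submit host, which cannot work from communicado, whereas the coaster provider with an ssh-based jobmanager should run them on PADS.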

- Mike

----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:

> I think what's happening in Wenjun's case is that qsub is not in the
> path. I suppose this could be tested by running a simple ssh job that
> does "which qsub".
> 
> On Thu, 2010-02-25 at 16:08 -0600, Michael Wilde wrote:
> > Was: Re: [PADS Support #3457] globus services for accessing PADS
> > 
> > --
> > 
> > Hi Wenjun,
> > 
> > Are you doing this from communicado? If so, you need to use ssh to get
> > from communicado to PADS. And since you want to run coasters once you
> > get there, I think you need to use the coaster provider and
> > jobmanager = ssh:pbs.
> > 
> > I'm cc'ing swift-user and Mihael to discuss the exact sites options
> > needed.
> > 
> > We'd want coasters to use ssh to launch one PBS job to launch all the
> > workers.
> > 
> > I think there are still some issues on PADS with interpreting #nodes
> > and workerspernode correctly, so please bear with us while we figure
> > this out.
> > 
> > I'll try this myself later tonight, but if you discover anything
> > (either working or failing) please post to the User list.
> > 
> > Thanks,
> > 
> > Mike
> > 
> > 
> > ----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:
> > 
> > > Hi Mike,
> > >   I tried to run the loop model through ssh+pbs on PADS. The ssh provider
> > > seems to be working, but pbs is not.
> > >   Please see the site file and error message below:
> > >  
> > > Wenjun
> > > > Wenjun,
> > > >
> > > > I think having GRAM (GRAM5 especially) on PADS would be useful.
> > > >
> > > > But in the meantime, I think you can also use swift's ssh provider
> > > > to get to PADS.
> > > > We'll need to try that.
> > > >
> > > > - Mike
> > > >
> > > >   
> > > 
> > > <pool handle="pads">
> > >     <!-- gridftp url="gsiftp://stor.ci.uchicago.edu"/ -->
> > >     <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
> > >     <execution provider="pbs" url="svc.pads.ci.uchicago.edu" />
> > >     <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
> > >     <profile namespace="karajan" key="jobThrottle">0.03</profile>
> > >     <profile namespace="karajan" key="initialScore">10000</profile>
> > >  </pool>
> > > 
> > > Caused by: qstat failed (exit code 255)
> > > 
> > >         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
> > >         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
> > >         at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
> > >         at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
> > >         at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
> > >         at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
> > >         at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
> > >         at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
> > >         at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
> > >         at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
> > >         at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
> > >         at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
> > >         at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
> > >         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
> > >         ... 1 more
> > > 
> > > 
> > > > ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
> > > >
> > > >   
> > > >> There are currently no Globus job managers, but GridFTP is at
> > > >> stor.ci.uchicago.edu.
> > > >>
> > > >> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
> > > >>
> > > >> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
> > > >>     
> > > > >>> Hello,
> > > > >>>    Could you please tell me whether there are any Globus services
> > > > >>> for PADS, such as GridFTP and GRAM?
> > > > >>>    I'm trying to launch jobs to PADS remotely but can't figure out
> > > > >>> what the right URLs are for the GridFTP service and GRAM.
> > > > >>>    Thanks!
> > > > >>>
> > > > >>> Wenjun
> > > >
> > > >


