[Swift-user] Re: How to run coasters over ssh:pbs ?
Mihael Hategan
hategan at mcs.anl.gov
Thu Feb 25 16:17:55 CST 2010
I think what's happening in Wenjun's case is that qsub is not in the
path. I suppose this could be tested by running a simple ssh job that
does "which qsub".
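
A minimal sketch of that check, in case it helps. It assumes the login host from the sites file quoted below and a working ssh setup from the submit host; the helper names are made up for illustration. Note that the command runs in a non-interactive ssh session, which may not pick up the same PATH as an interactive login.

```python
import shlex
import subprocess

# Assumption: the login host taken from the sites file quoted below.
HOST = "login.pads.ci.uchicago.edu"


def build_check_command(host, tool="qsub"):
    """Return the argv for an ssh call that runs `which <tool>` on the host."""
    return ["ssh", host, "which " + shlex.quote(tool)]


def tool_on_remote_path(host, tool="qsub"):
    """True if `which <tool>` exits with status 0 over ssh."""
    result = subprocess.run(
        build_check_command(host, tool),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # qstat is checked as well, since the quoted error mentions it.
    for tool in ("qsub", "qstat"):
        status = "found" if tool_on_remote_path(HOST, tool) else "NOT found"
        print(tool, status)
```

If either tool is reported as not found, the PATH seen by non-interactive ssh sessions on that host would be the first thing to look at.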
On Thu, 2010-02-25 at 16:08 -0600, Michael Wilde wrote:
> Was: Re: [PADS Support #3457] globus services for accessing PADS
>
> --
>
> Hi Wenjun,
>
> Are you doing this from communicado? If so, you need to use ssh to get
> from communicado to PADS. And since you want to run coasters once you
> get there, I think you need to use the coaster provider with
> jobmanager = ssh:pbs.
>
> I'm cc'ing swift-user and Mihael to discuss the exact sites options
> needed.
>
> We'd want coasters to use ssh to launch one PBS job to launch all the
> workers.
>
> I think there are still some issues on PADS with interpreting #nodes
> and workerspernode correctly, so please bear with us while we figure
> this out.
>
> I'll try this myself later tonight, but if you discover anything
> (either working or failing) please post to the User list.
>
> Thanks,
>
> Mike
>
>
> ----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:
>
> > Hi Mike,
> > I tried to run loop model through ssh+pbs on PADS. The ssh provider
> > seems to be working, but pbs is not.
> > Please check the site file and error message below:
> >
> > Wenjun
> > > Wenjun,
> > >
> > > I think having GRAM (GRAM5 especially) on PADS would be useful.
> > >
> > > But in the meantime, I think you can also use swift's ssh provider
> > > to get to PADS.
> > > We'll need to try that.
> > >
> > > - Mike
> > >
> > >
> >
> > <pool handle="pads">
> > <!-- gridftp url="gsiftp://stor.ci.uchicago.edu"/ -->
> > <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
> > <execution provider="pbs" url="svc.pads.ci.uchicago.edu" />
> > <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
> > <profile namespace="karajan" key="jobThrottle">0.03</profile>
> > <profile namespace="karajan" key="initialScore">10000</profile>
> > </pool>
> >
> > Caused by: qstat failed (exit code 255)
> >
> >   at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
> >   at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
> >   at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
> >   at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
> >   at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
> >   at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
> >   at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
> >   at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
> >   at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
> >   at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
> >   at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
> >   at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
> >   at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
> >   at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
> >   at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
> >   at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
> >   at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
> >   at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
> >   at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
> > ... 1 more
> >
> >
> > > ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
> > >
> > >
> > >> There are currently no Globus job managers, but GridFTP is at
> > >> stor.ci.uchicago.edu.
> > >>
> > >> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
> > >>
> > >> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
> > >>
> > >>> Hello,
> > >>> Could you please tell me whether there are any globus services for
> > >>> PADS such as GridFTP and GRAM?
> > >>> I'm trying to launch jobs to PADS remotely but can't figure out what
> > >>> the right URLs are for the GridFTP service and GRAM.
> > >>> Thanks!
> > >>>
> > >>> Wenjun
> > >>>
> > >
> > >