[Swift-user] How to run coasters over ssh:pbs ?

Michael Wilde wilde at mcs.anl.gov
Thu Feb 25 16:08:13 CST 2010


Was: Re: [PADS Support #3457] globus services for accessing PADS

--

Hi Wenjun,

Are you doing this from communicado? If so, you need to use ssh to get from communicado to PADS. And since you want to run coasters once you get there, I think you need to use the coaster provider with jobmanager = ssh:pbs.

I'm cc'ing swift-user and Mihael to discuss the exact sites options needed.

We'd want coasters to use ssh to launch a single PBS job, which in turn launches all the workers.
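
As a starting point for that discussion, here is a rough, untested sketch of what the pool entry might look like with the coaster provider and jobmanager = ssh:pbs. It reuses the hostnames, work directory and karajan settings from Wenjun's site file quoted below. Using login.pads.ci.uchicago.edu as the execution url and the workersPerNode value of 1 are assumptions, not settings verified on PADS.

```xml
<pool handle="pads">
    <!-- Sketch only: not verified on PADS -->
    <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
    <!-- Assumption: ssh to the login host, then submit the workers through PBS -->
    <execution provider="coaster" url="login.pads.ci.uchicago.edu" jobmanager="ssh:pbs" />
    <!-- Assumption: placeholder value; node and worker counts are still being sorted out on PADS -->
    <profile namespace="globus" key="workersPerNode">1</profile>
    <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
    <profile namespace="karajan" key="jobThrottle">0.03</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
</pool>
```

The idea is that the ssh part gets from communicado to PADS and the pbs part submits the worker job there, instead of running the pbs provider (and therefore qstat) on the submit host.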

I think there are still some issues on PADS with interpreting #nodes and workerspernode correctly, so please bear with us while we figure this out.

I'll try this myself later tonight, but if you discover anything (either working or failing) please post to the User list.

Thanks,

Mike


----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:

> Hi Mike,
>   I tried to run the loop model through ssh+pbs on PADS. The ssh provider
> seems to be working, but pbs is not.
>   Please check the site file and error message below:
>  
> Wenjun
> > Wenjun,
> >
> > I think having GRAM (GRAM5 especially) on PADS would be useful.
> >
> > But in the meantime, I think you can also use swift's ssh provider to get to PADS.
> > We'll need to try that.
> >
> > - Mike
> >
> >   
> 
> <pool handle="pads">
>     <!-- gridftp url="gsiftp://stor.ci.uchicago.edu"/ -->
>     <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
>     <execution provider="pbs" url="svc.pads.ci.uchicago.edu" />
>     <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
>     <profile namespace="karajan" key="jobThrottle">0.03</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>  </pool>
> 
> Caused by: qstat failed (exit code 255)
> 
>         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
>         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
>         at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
>         at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
>         at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
>         at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
>         at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
>         at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
>         at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
>         at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
>         at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
>         at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
>         at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
>         at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
>         ... 1 more
> 
> 
> > ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
> >
> >   
> >> There are currently no Globus job managers, but GridFTP is at
> >> stor.ci.uchicago.edu.
> >>
> >> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
> >>
> >> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
> >>     
> >>> Hello,
> >>>    Could you please tell me whether there are any globus services for
> >>> PADS such as GridFTP and GRAM?
> >>>    I'm trying to launch jobs to PADS remotely but can't figure out what
> >>> the right URLs are for the GridFTP
> >>>    service and GRAM.
> >>>    Thanks!
> >>>
> >>> Wenjun
> >>>       
> >
> >


