[Swift-user] Re: How to run coasters over ssh:pbs ?
Wenjun Wu
wwj at ci.uchicago.edu
Thu Feb 25 17:16:53 CST 2010
Hi Mihael,
Please find the log file and sites.xml attached.
I ran the Swift script from sidgrid to PADS, and ssh "which qsub"
returns the right path.
But coasters still didn't work for me.
Thanks!
Wenjun
> I think what's happening in Wenjun's case is that qsub is not in the
> path. I suppose this could be tested by running a simple ssh job that
> does "which qsub".
>
> On Thu, 2010-02-25 at 16:08 -0600, Michael Wilde wrote:
>
>> Was: Re: [PADS Support #3457] globus services for accessing PADS
>>
>> --
>>
>> Hi Wenjun,
>>
>> Are you doing this from communicado? If so, you need to use ssh to get
>> from communicado to PADS. And since you want to run coasters once you
>> get there, I think you need to use the coaster provider with jobmanager
>> = ssh:pbs.
>>
>> I'm cc'ing swift-user and Mihael to discuss the exact sites options
>> needed.
>>
>> We'd want coasters to use ssh to launch one PBS job to launch all the
>> workers.
>>
>> I think there are still some issues on PADS with interpreting #nodes
>> and workerspernode correctly, so please bear with us while we figure
>> this out.
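>>
>> For reference, something like this is what I have in mind (a sketch
>> only, untested; the workersPerNode value is a guess for PADS's
>> 8-core nodes):
>>
>>   <pool handle="pads">
>>     <execution provider="coaster" jobmanager="ssh:pbs"
>>                url="login.pads.ci.uchicago.edu"/>
>>     <filesystem provider="ssh" url="login.pads.ci.uchicago.edu"/>
>>     <profile namespace="globus" key="workersPerNode">8</profile>
>>     <profile namespace="karajan" key="jobThrottle">0.03</profile>
>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>     <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
>>   </pool>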
>>
>> I'll try this myself later tonight, but if you discover anything
>> (either working or failing) please post to the user list.
>>
>> Thanks,
>>
>> Mike
>>
>>
>> ----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:
>>
>>
>>> Hi Mike,
>>> I tried to run the loop model through ssh+pbs on PADS. The ssh
>>> provider seems to be working, but PBS is not.
>>> Please check the site file and error message below:
>>>
>>> Wenjun
>>>
>>>> Wenjun,
>>>>
>>>> I think having GRAM (GRAM5 especially) on PADS would be useful.
>>>>
>>>> But in the meantime, I think you can also use Swift's ssh provider
>>>> to get to PADS.
>>>>
>>>> We'll need to try that.
>>>>
>>>> - Mike
>>>>
>>>>
>>>>
>>> <pool handle="pads">
>>>   <!-- <gridftp url="gsiftp://stor.ci.uchicago.edu"/> -->
>>>   <filesystem url="login.pads.ci.uchicago.edu" provider="ssh"/>
>>>   <execution provider="pbs" url="svc.pads.ci.uchicago.edu"/>
>>>   <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
>>>   <profile namespace="karajan" key="jobThrottle">0.03</profile>
>>>   <profile namespace="karajan" key="initialScore">10000</profile>
>>> </pool>
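>>>
>>> (Note: if I understand the pbs provider correctly, it runs the qsub
>>> and qstat commands on the submit host itself; the url attribute does
>>> not route the submission through ssh, and the ssh filesystem provider
>>> only moves files. If the submit host has no local PBS, that alone
>>> would explain the qstat failure below.)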
>>>
>>> Caused by: qstat failed (exit code 255)
>>>
>>>     at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
>>>     at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
>>>     at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
>>>     at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
>>>     at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
>>>     at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
>>>     at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
>>>     at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
>>>     at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
>>>     at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
>>>     at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
>>>     at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
>>>     at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
>>>     at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
>>>     ... 1 more
>>>
>>>
>>>
>>>> ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
>>>>
>>>>
>>>>
>>>>> There are currently no Globus job managers, but GridFTP is at
>>>>> stor.ci.uchicago.edu.
>>>>>
>>>>> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
>>>>>
>>>>> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
>>>>>
>>>>>
>>>>>> Hello,
>>>>>> Could you please tell me whether there are any Globus services
>>>>>> for PADS, such as GridFTP and GRAM?
>>>>>> I'm trying to launch jobs to PADS remotely but can't figure out
>>>>>> what the right URLs are for the GridFTP service and GRAM.
>>>>>> Thanks!
>>>>>>
>>>>>> Wenjun
>>>>>>
>>>>>>
>>>>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sites.xml
Type: text/xml
Size: 2314 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100225/8f568528/attachment.xml>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: psim.loops-20100225-1712-3zrtb5i5.log
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100225/8f568528/attachment.ksh>