[Swift-user] Re: How to run coasters over ssh:pbs ?

Wenjun Wu wwj at ci.uchicago.edu
Thu Feb 25 17:16:53 CST 2010


Hi Mihael,
  Please find the attached log file and sites.xml.
  I ran the Swift script from sidgrid to PADS, and ssh "which qsub" 
returns the right path.
  But coasters still didn't work for me.

 Thanks!

Wenjun
> I think what's happening in Wenjun's case is that qsub is not in the
> path. I suppose this could be tested by running a simple ssh job that
> does "which qsub".
>
> On Thu, 2010-02-25 at 16:08 -0600, Michael Wilde wrote:
>   
>> Was: Re: [PADS Support #3457] globus services for accessing PADS
>>
>> --
>>
>> Hi Wenjun,
>>
>> Are you doing this from communicado? If so, you need to use ssh to get
>> from communicado to PADS. And since you want to run coasters once you
>> get there, I think you need to use the coaster provider with jobmanager
>> = ssh:pbs.
>>
>> I'm cc'ing swift-user and Mihael to discuss the exact sites options
>> needed.
>>
>> We'd want coasters to use ssh to launch one PBS job to launch all the
>> workers. 
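>>
>> Untested, but I'd expect the pool entry to look roughly like the sketch
>> below (hostnames, paths and the workersPerNode value are guesses here,
>> so adjust them for your setup):
>>
>> <pool handle="pads">
>>   <!-- coaster provider: bootstrap over ssh, then submit the worker
>>        block through PBS on the PADS side -->
>>   <execution provider="coaster" url="login.pads.ci.uchicago.edu"
>>              jobmanager="ssh:pbs" />
>>   <profile namespace="globus" key="workersPerNode">8</profile>
>>   <filesystem provider="ssh" url="login.pads.ci.uchicago.edu" />
>>   <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
>>   <profile namespace="karajan" key="jobThrottle">0.03</profile>
>>   <profile namespace="karajan" key="initialScore">10000</profile>
>> </pool>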
>>
>> I think there are still some issues on PADS with interpreting #nodes
>> and workerspernode correctly, so please bear with us while we figure
>> this out.
>>
>> I'll try this myself later tonight, but if you discover anything
>> (either working or failing), please post to the User list.
>>
>> Thanks,
>>
>> Mike
>>
>>
>> ----- "Wenjun Wu" <wwj at ci.uchicago.edu> wrote:
>>
>>     
>>> Hi Mike,
>>>   I tried to run the loop model through ssh+pbs on PADS. The ssh
>>> provider seems to be working, but PBS is not.
>>>   Please check the sites file and error message below:
>>>  
>>> Wenjun
>>>       
>>>> Wenjun,
>>>>
>>>> I think having GRAM (GRAM5 especially) on PADS would be useful.
>>>>
>>>> But in the meantime, I think you can also use Swift's ssh provider to
>>>> get to PADS.
>>>>
>>>> We'll need to try that.
>>>>
>>>> - Mike
>>>>
>>>>   
>>>>         
>>> <pool handle="pads">
>>>     <!-- gridftp url="gsiftp://stor.ci.uchicago.edu"/ -->
>>>     <filesystem url="login.pads.ci.uchicago.edu" provider="ssh" />
>>>     <execution provider="pbs" url="svc.pads.ci.uchicago.edu" />
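>>>     <!-- note: with provider="pbs", Swift runs qsub/qstat as local
>>>          processes on the submitting host, which presumably fails
>>>          here since the job is submitted from outside PADS -->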
>>>     <workdirectory>/gpfs/pads/oops/workflows</workdirectory>
>>>     <profile namespace="karajan" key="jobThrottle">0.03</profile>
>>>     <profile namespace="karajan" key="initialScore">10000</profile>
>>>  </pool>
>>>
>>> Caused by: qstat failed (exit code 255)
>>>
>>>         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
>>>         at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
>>>         at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:147)
>>>         at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:320)
>>>         at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:276)
>>>         at org.griphyn.vdl.karajan.lib.Execute.statusChanged(Execute.java:104)
>>>         at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:168)
>>>         at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:661)
>>>         at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:424)
>>>         at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:410)
>>>         at org.globus.cog.abstraction.impl.common.task.TaskImpl.notifyListeners(TaskImpl.java:236)
>>>         at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:224)
>>>         at org.globus.cog.abstraction.impl.common.AbstractDelegatedTaskHandler.failTask(AbstractDelegatedTaskHandler.java:54)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.processFailed(AbstractJobSubmissionTaskHandler.java:101)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.processFailed(AbstractExecutor.java:246)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.Job.fail(Job.java:198)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.failAll(AbstractQueuePoller.java:141)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.pollQueue(AbstractQueuePoller.java:172)
>>>         at org.globus.cog.abstraction.impl.scheduler.common.AbstractQueuePoller.run(AbstractQueuePoller.java:80)
>>>         ... 1 more
>>>
>>>
>>>       
>>>> ----- "Ti Leggett" <pads-support at ci.uchicago.edu> wrote:
>>>>
>>>>   
>>>>         
>>>>> There are currently no Globus job managers, but GridFTP is at
>>>>> stor.ci.uchicago.edu.
>>>>>
>>>>> I'll work on getting GRAM up. Do you need WSRF or Pre-WS GRAM?
>>>>>
>>>>> On Wed Feb 24 14:45:57 2010, wwj at ci.uchicago.edu wrote:
>>>>>> Hello,
>>>>>>    Could you please tell me whether there are any globus services
>>>>>> for PADS such as GridFTP and GRAM?
>>>>>>    I'm trying to launch jobs to PADS remotely but can't figure out
>>>>>> what the right URLs are for the GridFTP service and GRAM.
>>>>>>    Thanks!
>>>>>>
>>>>>> Wenjun
>
>   

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sites.xml
Type: text/xml
Size: 2314 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100225/8f568528/attachment.xml>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: psim.loops-20100225-1712-3zrtb5i5.log
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100225/8f568528/attachment.ksh>

