[Swift-devel] cleanup fails on abe

Justin M Wozniak wozniak at mcs.anl.gov
Fri Aug 13 13:37:06 CDT 2010


I have been seeing this on PADS as well; I'm looking into it...

(Note that this may now also leave nc processes running; use killall if 
those pile up.)
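
A minimal sketch of that cleanup, assuming the leftover processes are named exactly "nc" and run under your own account. The function names list_stray_nc and kill_stray_nc are made up for illustration and are not part of Swift:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print "PID COMMAND" for every process of the current user whose
# command name is exactly "nc". Prints nothing if there are none.
list_stray_nc() {
    ps -u "$(id -un)" -o pid= -o comm= | awk '$2 == "nc" { print $1, $2 }'
}

# Kill the current user's nc processes. The -u option restricts
# killall to processes owned by the given user.
kill_stray_nc() {
    killall -u "$(id -un)" nc
}

# Look before killing: show what would be affected.
stray=$(list_stray_nc)
if [ -n "$stray" ]; then
    printf '%s\n' "$stray"
else
    echo "no stray nc processes"
fi
```

Check the listing first, since any nc you started on purpose would be killed too. Call kill_stray_nc only once you are sure everything listed is a leftover.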

On Fri, 13 Aug 2010, Jonathan Monette wrote:

> My problem is on PADS.  And no, I do not have a cleanup error.  At the end of 
> the run, where it qdels all the workers, it just hangs.  Once all the jobs 
> in the queue have been deleted, swift hangs and I have to 'control c' the job 
> to regain control of the terminal.  On my larger jobs I get a 'Failed to 
> shutdown block' error.
>
> On 8/13/10 1:16 PM, Michael Wilde wrote:
>> Sarah, what does "shut off cleanup" mean?
>> 
>> Jon, is there any similarity between what Sarah is encountering and what 
>> you observed on TeraPort (presumably using provider=coaster 
>> jobmanager=local:pbs)?
>> 
>> - Mike
>> 
>> ----- "Sarah Kenny"<skenny at uchicago.edu>  wrote:
>>
>> 
>>> hi all, not sure if anyone else is running on abe, but for some
>>> reason
>>> cleanup seems to fail on there very consistently. swift throws a
>>> warning:
>>> 
>>> The following warnings have occurred:
>>> 1. Cleanup on ABE failed
>>> Caused by:
>>> 
>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>> Cannot submit job
>>>          at
>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
>>>          at
>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
>>>          at
>>> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>>>          at
>>> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
>>>          at
>>> org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
>>> Caused by: org.globus.gram.GramException: Parameter not supported
>>>          at org.globus.gram.Gram.request(Gram.java:358)
>>>          at org.globus.gram.GramJob.request(GramJob.java:262)
>>>          at
>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
>>>          ... 4 more
>>> 
>>> if i shut off cleanup, i don't get the warning and the workflow
>>> 'appears' to have completed successfully, however even with cleanup
>>> shut off pbs still generates the email below giving the error:
>>> 
>>> 
>>> i'm still poking around to see if i can figure out what's up, but
>>> thought i would throw this out there in case someone else has come
>>> across it.
>>> 
>>> swift, coaster and gram logs attached.
>>> 
>>> ~sk
>>> 
>>> ---------- Forwarded message ----------
>>> From: adm<adm at ncsa.uiuc.edu>
>>> Date: Fri, Aug 13, 2010 at 12:53 PM
>>> Subject: PBS JOB 3000582.abem5.ncsa.uiuc.edu
>>> To: skenny at abe1196.ncsa.uiuc.edu
>>> 
>>> 
>>> PBS Job Id: 3000582.abem5.ncsa.uiuc.edu
>>> Job Name:   configtester
>>> Exec host:
>>>   abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>>> An error has occurred processing your job, see below.
>>> Post job file processing error; job 3000582.abem5.ncsa.uiuc.edu on
>>> host
>>> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>>> 
>>> Unable to copy file
>>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.OU to
>>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout
>>> *** error from copy
>>> /bin/cp: cannot create regular file
>>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout':
>>> No such file or directory
>>> *** end error output
>>> 
>>> Unable to copy file
>>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.ER to
>>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr
>>> *** error from copy
>>> /bin/cp: cannot create regular file
>>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr':
>>> No such file or directory
>>> *** end error output
>>> 
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>> 
>
>

-- 
Justin M Wozniak


