[Swift-devel] cleanup fails on abe
Jonathan Monette
jon.monette at gmail.com
Fri Aug 13 13:33:30 CDT 2010
My problem is on PADS, and no, I do not get a cleanup error. At the
end of the run, where it qdels all the workers, it just hangs. Once
all the jobs in the queue have been deleted, Swift hangs and I have to
Ctrl-C the job to regain control of the terminal. On my larger
jobs I get a 'Failed to shutdown block' error.
On 8/13/10 1:16 PM, Michael Wilde wrote:
> Sarah, what does "shut off cleanup" mean?
>
> Jon, is there any similarity between what Sarah is encountering and what you observed on TeraPort (presumably using provider=coaster jobmanager=local:pbs)?
>
> - Mike
>
> ----- "Sarah Kenny"<skenny at uchicago.edu> wrote:
>
>
>> hi all, not sure if anyone else is running on abe, but for some reason
>> cleanup seems to fail on there very consistently. swift throws a
>> warning:
>>
>> The following warnings have occurred:
>> 1. Cleanup on ABE failed
>> Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job
>>     at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
>>     at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
>>     at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>>     at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
>>     at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
>> Caused by: org.globus.gram.GramException: Parameter not supported
>>     at org.globus.gram.Gram.request(Gram.java:358)
>>     at org.globus.gram.GramJob.request(GramJob.java:262)
>>     at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
>>     ... 4 more
>>
>> if i shut off cleanup, i don't get the warning and the workflow
>> 'appears' to have completed successfully, however even with cleanup
>> shut off pbs still generates the email below giving the error:
>>
>>
>> i'm still poking around to see if i can figure out what's up, but
>> thought i would throw this out there in case someone else has come
>> across it.
>>
>> swift, coaster and gram logs attached.
>>
>> ~sk
>>
>> ---------- Forwarded message ----------
>> From: adm<adm at ncsa.uiuc.edu>
>> Date: Fri, Aug 13, 2010 at 12:53 PM
>> Subject: PBS JOB 3000582.abem5.ncsa.uiuc.edu
>> To: skenny at abe1196.ncsa.uiuc.edu
>>
>>
>> PBS Job Id: 3000582.abem5.ncsa.uiuc.edu
>> Job Name: configtester
>> Exec host:
>> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>> An error has occurred processing your job, see below.
>> Post job file processing error; job 3000582.abem5.ncsa.uiuc.edu on host
>> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>>
>> Unable to copy file
>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.OU to
>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout
>> *** error from copy
>> /bin/cp: cannot create regular file
>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout':
>> No such file or directory
>> *** end error output
>>
>> Unable to copy file
>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.ER to
>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr
>> *** error from copy
>> /bin/cp: cannot create regular file
>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr':
>> No such file or directory
>> *** end error output
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
--
Jon
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein