[Swift-devel] cleanup fails on abe
Jonathan Monette
jon.monette at gmail.com
Fri Aug 13 14:01:51 CDT 2010
Ok. I will do a killall on the nc jobs. Also, if swift fails to
shut down a block, does the coaster service shut down? On my largest run
it failed to shut down 4 blocks and now has hung. There are no more jobs
in the queue and none are being submitted anymore.
On 8/13/10 1:37 PM, Justin M Wozniak wrote:
>
> I have been seeing this as well on PADS, I'm looking into it...
>
> (Note that this now also may leave nc processes running, use killall
> if those pile up.)
>
> On Fri, 13 Aug 2010, Jonathan Monette wrote:
>
>> My problem is on PADS. And no, I do not have a cleanup error. At the
>> end of the run, where it qdels all the workers, it just hangs. Once
>> all the jobs in the queue have been deleted, swift hangs and I have to
>> 'control c' the job to regain control of the terminal. On my
>> larger jobs I get a 'Failed to shutdown block' error.
>>
>> On 8/13/10 1:16 PM, Michael Wilde wrote:
>>> Sarah, what does "shut off cleanup" mean?
>>>
>>> Jon, is there any similarity between what Sarah is encountering and
>>> what you observed on TeraPort (presumably using provider=coaster
>>> jobmanager=local:pbs)?
>>>
>>> - Mike
>>>
>>> ----- "Sarah Kenny"<skenny at uchicago.edu> wrote:
>>>
>>>
>>>> hi all, not sure if anyone else is running on abe, but for some
>>>> reason
>>>> cleanup seems to fail on there very consistently. swift throws a
>>>> warning:
>>>>
>>>> The following warnings have occurred:
>>>> 1. Cleanup on ABE failed
>>>> Caused by:
>>>>
>>>> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
>>>> Cannot submit job
>>>> at
>>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
>>>>
>>>> at
>>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
>>>>
>>>> at
>>>> org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
>>>>
>>>> at
>>>> org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
>>>>
>>>> at
>>>> org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
>>>>
>>>> Caused by: org.globus.gram.GramException: Parameter not supported
>>>> at org.globus.gram.Gram.request(Gram.java:358)
>>>> at org.globus.gram.GramJob.request(GramJob.java:262)
>>>> at
>>>> org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
>>>>
>>>> ... 4 more
>>>>
>>>> if i shut off cleanup, i don't get the warning and the workflow
>>>> 'appears' to have completed successfully, however even with cleanup
>>>> shut off pbs still generates the email below giving the error:
>>>>
>>>>
>>>> i'm still poking around to see if i can figure out what's up, but
>>>> thought i would throw this out there in case someone else has come
>>>> across it.
>>>>
>>>> swift, coaster and gram logs attached.
>>>>
>>>> ~sk
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: adm<adm at ncsa.uiuc.edu>
>>>> Date: Fri, Aug 13, 2010 at 12:53 PM
>>>> Subject: PBS JOB 3000582.abem5.ncsa.uiuc.edu
>>>> To: skenny at abe1196.ncsa.uiuc.edu
>>>>
>>>>
>>>> PBS Job Id: 3000582.abem5.ncsa.uiuc.edu
>>>> Job Name: configtester
>>>> Exec host:
>>>>
>>>> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>>>>
>>>> An error has occurred processing your job, see below.
>>>> Post job file processing error; job 3000582.abem5.ncsa.uiuc.edu on
>>>> host
>>>> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
>>>>
>>>>
>>>> Unable to copy file
>>>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.OU to
>>>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout
>>>> *** error from copy
>>>> /bin/cp: cannot create regular file
>>>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout':
>>>>
>>>> No such file or directory
>>>> *** end error output
>>>>
>>>> Unable to copy file
>>>> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.ER to
>>>> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr
>>>> *** error from copy
>>>> /bin/cp: cannot create regular file
>>>> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr':
>>>>
>>>> No such file or directory
>>>> *** end error output
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>
>>
>>
>
--
Jon
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein