[Swift-devel] cleanup fails on abe

Mihael Hategan hategan at mcs.anl.gov
Fri Aug 13 13:36:15 CDT 2010


On Fri, 2010-08-13 at 13:33 -0500, Jonathan Monette wrote:
> My problem is on PADS.  And no, I do not have a cleanup error.  At the 
> end of the run, where it qdels all the workers, it just hangs.  Once 
> all the jobs in the queue have been deleted, swift hangs and I have to 
> 'control c' the job to regain control of the terminal.  On my larger 
> jobs I get a 'Failed to shutdown block' error.

There is a watchdog that will eventually (after 5 minutes) force the
service to shut down.

It may also be possible to shut the service down asynchronously (i.e.
send the command and then not wait for the actual shutdown).
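
To make the two behaviours concrete, here is a minimal sketch of the
pattern, not the actual coaster service code. The Service class, its
methods, and shutdown_with_watchdog are hypothetical names invented for
illustration. request_shutdown() sends the command and returns without
waiting (the asynchronous variant); shutdown_with_watchdog() waits up to
a timeout and then forces the shutdown, as the watchdog would.

```python
import threading
import time


class Service:
    """Hypothetical stand-in for a service; not Swift's or CoG's real API."""

    def __init__(self, shutdown_delay):
        self.shutdown_delay = shutdown_delay  # seconds a clean shutdown takes
        self.stopped = threading.Event()
        self.forced = False

    def request_shutdown(self):
        """Send the shutdown command and return immediately (asynchronous)."""
        def _clean_shutdown():
            time.sleep(self.shutdown_delay)  # simulate slow cleanup work
            self.stopped.set()
        threading.Thread(target=_clean_shutdown, daemon=True).start()

    def force_shutdown(self):
        """What a watchdog would do when the clean shutdown takes too long."""
        self.forced = True
        self.stopped.set()


def shutdown_with_watchdog(service, timeout):
    """Request shutdown, wait up to `timeout` seconds, then force it.

    Returns True if the shutdown had to be forced.
    """
    service.request_shutdown()
    if not service.stopped.wait(timeout):
        service.force_shutdown()
    return service.forced
```

With the 5 minute limit mentioned above, the call would be
shutdown_with_watchdog(service, timeout=300). A client that does not
want to block at all would call request_shutdown() and move on.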

Mihael

> 
> On 8/13/10 1:16 PM, Michael Wilde wrote:
> > Sarah, what does "shut off cleanup" mean?
> >
> > Jon, is there any similarity between what Sarah is encountering and what you observed on TeraPort (presumably using provider=coaster jobmanager=local:pbs)?
> >
> > - Mike
> >
> > ----- "Sarah Kenny"<skenny at uchicago.edu>  wrote:
> >
> >    
> >> hi all, not sure if anyone else is running on abe, but for some
> >> reason
> >> cleanup seems to fail on there very consistently. swift throws a
> >> warning:
> >>
> >> The following warnings have occurred:
> >> 1. Cleanup on ABE failed
> >> Caused by:
> >>
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job
> >>          at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
> >>          at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
> >>          at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
> >>          at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
> >>          at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
> >> Caused by: org.globus.gram.GramException: Parameter not supported
> >>          at org.globus.gram.Gram.request(Gram.java:358)
> >>          at org.globus.gram.GramJob.request(GramJob.java:262)
> >>          at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
> >>          ... 4 more
> >>
> >> if i shut off cleanup, i don't get the warning and the workflow
> >> 'appears' to have completed successfully. however, even with cleanup
> >> shut off, pbs still generates the email below giving the error:
> >>
> >>
> >> i'm still poking around to see if i can figure out what's up, but
> >> thought i would throw this out there in case someone else has come
> >> across it.
> >>
> >> swift, coaster and gram logs attached.
> >>
> >> ~sk
> >>
> >> ---------- Forwarded message ----------
> >> From: adm<adm at ncsa.uiuc.edu>
> >> Date: Fri, Aug 13, 2010 at 12:53 PM
> >> Subject: PBS JOB 3000582.abem5.ncsa.uiuc.edu
> >> To: skenny at abe1196.ncsa.uiuc.edu
> >>
> >>
> >> PBS Job Id: 3000582.abem5.ncsa.uiuc.edu
> >> Job Name:   configtester
> >> Exec host:
> >>   abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
> >> An error has occurred processing your job, see below.
> >> Post job file processing error; job 3000582.abem5.ncsa.uiuc.edu on
> >> host
> >> abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
> >>
> >> Unable to copy file
> >> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.OU to
> >> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout
> >> *** error from copy
> >> /bin/cp: cannot create regular file
> >> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout':
> >> No such file or directory
> >> *** end error output
> >>
> >> Unable to copy file
> >> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.ER to
> >> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr
> >> *** error from copy
> >> /bin/cp: cannot create regular file
> >> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr':
> >> No such file or directory
> >> *** end error output
> >>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>      
> >    
> 




