[Swift-devel] cleanup fails on abe

Mihael Hategan hategan at mcs.anl.gov
Mon Aug 16 11:36:07 CDT 2010


Looks like some coaster parameters (for example slots, nodegranularity
and workerspernode) are making it into the RSL, and I believe that's
what GRAM is complaining about with "Parameter not supported".
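
For anyone who wants to check a GRAM log for this, here is a rough Python
sketch that pulls the attribute names out of a pre-parsed RSL string and
flags the ones that look like coaster settings. The COASTER_ONLY set and
the helper names are my own assumptions for illustration, not taken from
the Swift or Globus sources, and the regex only handles flat RSL like the
one in the log below.

```python
import re

# Assumption: settings meant for the coaster service rather than GRAM.
# This list is illustrative only; adjust it for your site.
COASTER_ONLY = {"slots", "maxnodes", "nodegranularity", "workerspernode"}


def rsl_attributes(rsl):
    """Return the lowercased attribute names of a flat RSL conjunction,
    e.g. &( a = "x" )( b = "y" ). Nested value lists such as
    ("HOME" "/u/ac/skenny") are skipped because no '=' follows the name."""
    return [m.lower() for m in re.findall(r'\(\s*"?([A-Za-z_]\w*)"?\s*=', rsl)]


def suspicious(rsl):
    """Return the attributes that appear in the hypothetical COASTER_ONLY set."""
    return [a for a in rsl_attributes(rsl) if a in COASTER_ONLY]


# The pre-parsed RSL string from the gram log of the rm job.
rsl = ('&( directory = "/scratch/users/skenny" )( arguments = "-rf" '
       '"rmx-20100816-1055-k1z90mf7" )( maxnodes = "16" )( executable = '
       '"/bin/rm" )( maxwalltime = "30" )( project = "TG-DBS080004N" )( queue '
       '= "normal" )( slots = "10" )( nodegranularity = "16" )( name = '
       '"cleantest" )( workerspernode = "1" )')

print(suspicious(rsl))
```

If the list it prints is not empty, those attributes are the first thing
I would strip from the cleanup job's spec before it goes to GRAM.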

On Mon, 2010-08-16 at 11:28 -0500, Sarah Kenny wrote:
> here's the entirety of the gram log for the rm job:
> 
> 8/16 10:58:31 JM: Security context imported
> 8/16 10:58:31 Pre-parsed RSL string: &( directory =
> "/scratch/users/skenny" )( arguments = "-rf"
> "rmx-20100816-1055-k1z90mf7" )( maxnodes = "16" )( executable =
> "/bin/rm" )( maxwalltime = "30" )( project = "TG-DBS080004N" )( queue
> = "normal" )( slots = "10" )( nodegranularity = "16" )( name =
> "cleantest" )( workerspernode = "1" )
> 8/16 10:58:31
> <<<<<Job Request RSL
> &("directory" = "/scratch/users/skenny" )("arguments" = "-rf"
> "rmx-20100816-1055-k1z90mf7" )("maxnodes" = "16" )("executable" =
> "/bin/rm" )("maxwalltime" = "30" )("project" = "TG-DBS080004N"
> )("queue" = "normal" )("slots" = "10" )("nodegranularity" = "16"
> )("name" = "cleantest" )("workerspernode" = "1" )
> >>>>>Job Request RSL
> 8/16 10:58:31
> <<<<<Job Request RSL (canonical)
> &("directory" = "/scratch/users/skenny" )("arguments" = "-rf"
> "rmx-20100816-1055-k1z90mf7" )("maxnodes" = "16" )("executable" =
> "/bin/rm" )("maxwalltime" = "30" )("project" = "TG-DBS080004N"
> )("queue" = "normal" )("slots" = "10" )("nodegranularity" = "16"
> )("name" = "cleantest" )("workerspernode" = "1" )
> >>>>>Job Request RSL (canonical)
> 8/16 10:58:31
> <<<<<Job RSL
> &("environment" = ("HOME" "/u/ac/skenny" ) ("LOGNAME" "skenny" )
> )("directory" = "/scratch/users/skenny" )("arguments" = "-rf"
> "rmx-20100816-1055-k1z90mf7")("maxnodes" = "16" )("executable" =
> "/bin/rm" )("maxwalltime" = "30" )("project" = "TG-DBS080004N"
> )("queue" = "normal" )("slots" = "10" )("nodegranularity" = "16"
> )("name" = "cleantest" )("workerspernode" = "1" )
> >>>>>Job RSL
> 8/16 10:58:31
> <<<<<Job RSL (post-eval)
> &("environment" = ("HOME" "/u/ac/skenny" ) ("LOGNAME" "skenny" )
> )("directory" = "/scratch/users/skenny" )("arguments" = "-rf"
> "rmx-20100816-1055-k1z90mf7" )("maxnodes" = "16" )("executable" =
> "/bin/rm" )("maxwalltime" = "30" )("project" = "TG-DBS080004N"
> )("queue" = "normal" )("slots" = "10" )("nodegranularity" = "16"
> )("name" = "cleantest" )("workerspernode" = "1" )
> >>>>>Job RSL (post-eval)
> 8/16 10:58:31 JMI: testing job manager scripts for type pbs exist and
> permissions are ok.
> 8/16 10:58:31 JMI: completed script validation: job manager type is pbs.
> 8/16 10:58:31 JMI: cmd = cache_cleanup
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: New Perl JobManager created.
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: Using jm supplied job dir:
> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/2408.1281974311
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: Using jm supplied job dir:
> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/2408.1281974311
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: cache_cleanup(enter)
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: Cleaning files in job dir
> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/2408.1281974311
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: Removed 1 files from
> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/2408.1281974311
> Mon Aug 16 10:58:31 2010 JM_SCRIPT: cache_cleanup(exit)
> 8/16 10:58:31 JM: before sending to client: rc=0 (Success)
> 8/16 10:58:31 JM: in globus_gram_job_manager_reporting_file_remove()
> 8/16 10:58:31 JM: in globus_gram_job_manager_reporting_file_remove()
> 8/16 10:58:31 JM: exiting globus_gram_job_manager.
> 
> as far as i can tell i'm not at quota on my work or home dirs on abe.
> yeah we were able to run fine before...haven't changed our config
> since then so maybe something on their end.
> 
> 
> On Fri, Aug 13, 2010 at 1:19 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > Can you find the gram log for the cleanup job (it's a /bin/rm)?
> >
> > Also, I remember you being able to run things just fine on Abe. Are you
> > aware of any configuration changes there? Any disks full?
> >
> > On Fri, 2010-08-13 at 13:11 -0500, Sarah Kenny wrote:
> >> hi all, not sure if anyone else is running on abe, but for some reason
> >> cleanup seems to fail on there very consistently. swift throws a
> >> warning:
> >>
> >> The following warnings have occurred:
> >> 1. Cleanup on ABE failed
> >> Caused by:
> >>
> >> org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >> Cannot submit job
> >>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:146)
> >>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:100)
> >>         at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:46)
> >>         at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:50)
> >>         at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
> >> Caused by: org.globus.gram.GramException: Parameter not supported
> >>         at org.globus.gram.Gram.request(Gram.java:358)
> >>         at org.globus.gram.GramJob.request(GramJob.java:262)
> >>         at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:134)
> >>         ... 4 more
> >>
> >> if i shut off cleanup, i don't get the warning and the workflow
> >> 'appears' to have completed successfully, however even with cleanup
> >> shut off pbs still generates the email below giving the error:
> >>
> >>
> >> i'm still poking around to see if i can figure out what's up, but
> >> thought i would throw this out there in case someone else has come
> >> across it.
> >>
> >> swift, coaster and gram logs attached.
> >>
> >> ~sk
> >>
> >> ---------- Forwarded message ----------
> >> From: adm <adm at ncsa.uiuc.edu>
> >> Date: Fri, Aug 13, 2010 at 12:53 PM
> >> Subject: PBS JOB 3000582.abem5.ncsa.uiuc.edu
> >> To: skenny at abe1196.ncsa.uiuc.edu
> >>
> >>
> >> PBS Job Id: 3000582.abem5.ncsa.uiuc.edu
> >> Job Name:   configtester
> >> Exec host:  abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
> >> An error has occurred processing your job, see below.
> >> Post job file processing error; job 3000582.abem5.ncsa.uiuc.edu on
> >> host abe0553/0+abe0314/0+abe0313/0+abe0311/0+abe0310/0+abe0307/0+abe0294/0+abe0290/0+abe0287/0+abe0286/0+abe0285/0+abe0284/0+abe0283/0+abe0279/0+abe0278/0+abe0277/0+abe0275/0+abe0273/0+abe0272/0+abe0271/0+abe0256/0+abe0254/0+abe0174/0+abe0173/0+abe0166/0+abe0165/0+abe0163/0+abe0087/0+abe0085/0+abe0084/0+abe0010/0+abe0387/0
> >>
> >> Unable to copy file
> >> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.OU to
> >> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout
> >> *** error from copy
> >> /bin/cp: cannot create regular file
> >> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stdout':
> >> No such file or directory
> >> *** end error output
> >>
> >> Unable to copy file
> >> /u/ac/skenny/.pbs_spool//3000582.abem5.ncsa.uiuc.edu.ER to
> >> /u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr
> >> *** error from copy
> >> /bin/cp: cannot create regular file
> >> `/u/ac/skenny/.globus/job/abe1196.ncsa.uiuc.edu/15575.1281721892/stderr':
> >> No such file or directory
> >> *** end error output
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >
> >
> >




