[Swift-devel] SWIFT jobs in C/X states on uc3-sub

Michael Wilde wilde at mcs.anl.gov
Tue Jun 4 11:56:52 CDT 2013


Hi Lincoln,

No update yet.  I just filed this as Swift bug 1010 and assigned it to you, David.

Lincoln, can you help us on this?  Below is a sample of the Condor submit script generated by Swift.

Is our problem simply caused by the "leave_in_queue" flag being TRUE?  I think a few simple experiments with the .submit file below should help re-create and fix the problem.

Also, I see that our stdout and stderr files are empty.  David, should we be (optionally) capturing a per-job Condor .log file?

Lincoln, any other guidance on the contents of the submit file?

- Mike

should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
Transfer_Executable = false
machine_count = 1
output = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stdout
error = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stderr
environment = WORKER_LOGGING_LEVEL=NONE;
executable = /usr/bin/perl
arguments = cscript5775404700699952879.pl http://10.1.3.94:53610,http://128.135.158.243:53610 0523-5710180-000010 NOLOGGING
transfer_input_files = /home/wilde/.globus/coasters/cscript5775404700699952879.pl
requirements = regexp("uc3-c*", Machine)
+accountinggroup = "group_friends.wilde"
notification = Never
leave_in_queue = TRUE
queue
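
A minimal sketch of the experiment suggested above, not a confirmed fix. It is the same submit description with two changes. First, the leave_in_queue line is dropped, since by default Condor removes a job from the queue once it completes; the commented alternative keeps a completed job (JobStatus == 4) only for a bounded time, and the 600-second window is an arbitrary assumption. Second, a per-job "log" line is added; the log path is hypothetical, modeled on the stdout/stderr names above.

```
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
Transfer_Executable = false
machine_count = 1
output = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stdout
error = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stderr
# Hypothetical path: per-job event log (submit, execute, terminate events)
log = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.log
environment = WORKER_LOGGING_LEVEL=NONE;
executable = /usr/bin/perl
arguments = cscript5775404700699952879.pl http://10.1.3.94:53610,http://128.135.158.243:53610 0523-5710180-000010 NOLOGGING
transfer_input_files = /home/wilde/.globus/coasters/cscript5775404700699952879.pl
requirements = regexp("uc3-c*", Machine)
+accountinggroup = "group_friends.wilde"
notification = Never
# Experiment A: no leave_in_queue line at all (default behavior).
# Experiment B: uncomment to keep completed jobs for at most 600 seconds (assumed value).
# leave_in_queue = (JobStatus == 4) && ((CompletionDate =?= UNDEFINED) || (CompletionDate == 0) || ((time() - CompletionDate) < 600))
queue
```

If the jobs leave the queue under experiment A but linger under the original file, that would point at leave_in_queue = TRUE as the cause. Whether Swift's Condor provider depends on the job staying in the queue to read its exit status is an open question that this sketch does not answer.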


----- Original Message -----
> From: "Lincoln Bryant" <lincolnb at uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "David Kelly" <davidk at ci.uchicago.edu>, "swift-devel" <swift-devel at ci.uchicago.edu>, "uc3-support"
> <uc3-support at lists.uchicago.edu>
> Sent: Tuesday, June 4, 2013 11:34:42 AM
> Subject: Re: SWIFT jobs in C/X states on uc3-sub
> 
> Hi Mike,
> 
> I was just wondering if there was any update on this. I still see
> Swift jobs sit in the C state in our Condor pool after they've
> finished.
> 
> > [lincolnb at uc3-sub local]$ condor_q
> > 
> > 
> > -- Submitter: uc3-sub.uchicago.edu : <10.1.3.94:9618?sock=25212_0c25_38> : uc3-sub.uchicago.edu
> >  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> > 110372.0   maheshwari      6/4  09:19   0+00:21:40 C  0   732.4 perl cscript948689
> > 110373.0   maheshwari      6/4  09:19   0+00:21:39 C  0   732.4 perl cscript948689
> > 110374.0   maheshwari      6/4  09:19   0+00:21:40 C  0   732.4 perl cscript948689
> > 110375.0   maheshwari      6/4  09:19   0+00:21:39 C  0   732.4 perl cscript948689
> > 110376.0   maheshwari      6/4  09:19   0+00:21:39 C  0   732.4 perl cscript948689
> > 110377.0   maheshwari      6/4  09:19   0+00:21:39 C  0   732.4 perl cscript948689
> > 110378.0   maheshwari      6/4  09:19   0+00:21:39 C  0   732.4 perl cscript948689
> > 110379.0   maheshwari      6/4  09:19   0+00:21:41 C  0   463.9 perl cscript948689
> 
> 
> I have swift-0.94 installed on uc3-sub.uchicago.edu
> 
> Cheers,
> Lincoln Bryant
> 
> On Mar 20, 2013, at 11:58 AM, Michael Wilde wrote:
> 
> > My jobs must be fossils; David, Yadu, we should test whether and
> > why Swift doesn't always clean up.
> > 
> > I realize that if Swift hangs and needs to be SIGKILL'ed then it
> > can't.  But let's see if the Condor provider is cleaning up when
> > Swift gets a catchable signal.
> > 
> > Lincoln, please remove the "wilde" jobs if you can do that.
> > 
> > Thanks,
> > 
> > - Mike
> > 
> > ----- Original Message -----
> >> From: "Lincoln Bryant" <lincolnb at uchicago.edu>
> >> To: "David Kelly" <davidk at ci.uchicago.edu>
> >> Cc: "Michael Wilde" <wilde at mcs.anl.gov>
> >> Sent: Wednesday, March 20, 2013 11:26:30 AM
> >> Subject: SWIFT jobs in C/X states on uc3-sub
> >> 
> >> Hi David / Mike,
> >> 
> >> I notice there are a lot of old jobs sitting in the UC3 queue.
> >> They're sitting in either "X" (removed) or "C" (completed). Sample
> >> below:
> >> 
> >>> 70980.0   wilde           3/12 11:42   0+00:18:59 C  0   43.9 perl cscript906755
> >>> 70981.0   wilde           3/12 11:42   0+00:03:14 C  0   46.4 perl cscript906755
> >>> 70982.0   wilde           3/12 11:42   0+00:03:14 C  0   46.4 perl cscript906755
> >> 
> >>> 71652.0   davidk          3/13 19:28   0+00:00:01 X  0   0.0 perl cscript500002
> >>> 71653.0   davidk          3/13 19:28   0+00:00:01 X  0   0.0 perl cscript500002
> >>> 71896.0   davidk          3/13 19:38   0+00:00:01 X  0   0.0 perl cscript339551
> >> 
> >> Are these jobs completing OK on your side?
> >> 
> >> Occasionally I go in and purge old jobs, but I'm curious if there's
> >> something in your submit files that is causing them to stick.
> >> 
> >> Cheers,
> >> Lincoln
> 
> 


