[Swift-devel] SWIFT jobs in C/X states on uc3-sub
Michael Wilde
wilde at mcs.anl.gov
Tue Jun 4 11:56:52 CDT 2013
Hi Lincoln,
No update yet. I just filed this as Swift bug 1010 and assigned it to you, David.
Lincoln, can you help us on this? Below is a sample of the Condor submit script generated by Swift.
Is our problem simply caused by the "leave_in_queue" flag being TRUE? I think a few simple experiments with the .submit file below should help re-create and fix the problem.
Also, I see that our stdout and stderr files are empty. David, should we be (optionally) capturing a per-job Condor .log file?
Lincoln, any other guidance on the contents of the submit file?
- Mike
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
Transfer_Executable = false
machine_count = 1
output = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stdout
error = /home/wilde/.globus/scripts/Condor2152159861288524507.submit.stderr
environment = WORKER_LOGGING_LEVEL=NONE;
executable = /usr/bin/perl
arguments = cscript5775404700699952879.pl http://10.1.3.94:53610,http://128.135.158.243:53610 0523-5710180-000010 NOLOGGING
transfer_input_files = /home/wilde/.globus/coasters/cscript5775404700699952879.pl
requirements = regexp("uc3-c*", Machine)
+accountinggroup = "group_friends.wilde"
notification = Never
leave_in_queue = TRUE
queue
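
One way to set up the experiment suggested above is to generate a variant of the submit file with leave_in_queue flipped, submit both, and compare what condor_q shows after the jobs finish. The Condor manual describes leave_in_queue as an expression that, while true, keeps a completed job in the queue, so TRUE would be consistent with jobs lingering in the C state; that still needs to be confirmed on uc3-sub. The sketch below is only an illustration: set_submit_attr is a hypothetical helper, not part of Swift or Condor, and it assumes one "name = value" assignment per line.

```python
import re


def set_submit_attr(submit_text: str, name: str, value: str) -> str:
    """Return submit_text with `name = value` set.

    Replaces an existing assignment (submit command names are matched
    case-insensitively) or inserts one before the first `queue` line.
    Hypothetical helper for experiments; not part of Swift or Condor.
    """
    assign = re.compile(rf"^\s*{re.escape(name)}\s*=.*$", re.IGNORECASE)
    queue = re.compile(r"^\s*queue\b", re.IGNORECASE)
    out, done = [], False
    for line in submit_text.splitlines():
        if assign.match(line):
            # Keep only one assignment; drop any later duplicates.
            if not done:
                out.append(f"{name} = {value}")
                done = True
            continue
        if queue.match(line) and not done:
            # No assignment seen yet: add it just before `queue`.
            out.append(f"{name} = {value}")
            done = True
        out.append(line)
    if not done:
        out.append(f"{name} = {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = (
        "executable = /usr/bin/perl\n"
        "notification = Never\n"
        "leave_in_queue = TRUE\n"
        "queue\n"
    )
    # Variant for the A/B test: identical except for leave_in_queue.
    print(set_submit_attr(original, "leave_in_queue", "FALSE"))
```

The same helper could add a per-job log for the second experiment, for example set_submit_attr(text, "log", "/some/path/job.log"), where the path is a placeholder and log is the standard submit command for the job event log.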
----- Original Message -----
> From: "Lincoln Bryant" <lincolnb at uchicago.edu>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "David Kelly" <davidk at ci.uchicago.edu>, "swift-devel" <swift-devel at ci.uchicago.edu>, "uc3-support" <uc3-support at lists.uchicago.edu>
> Sent: Tuesday, June 4, 2013 11:34:42 AM
> Subject: Re: SWIFT jobs in C/X states on uc3-sub
>
> Hi Mike,
>
> I was just wondering if there was any update on this. I still see
> Swift jobs sit in the C state in our Condor pool after they've
> finished.
>
> > [lincolnb at uc3-sub local]$ condor_q
> >
> >
> > -- Submitter: uc3-sub.uchicago.edu : <10.1.3.94:9618?sock=25212_0c25_38> : uc3-sub.uchicago.edu
> >  ID       OWNER       SUBMITTED   RUN_TIME   ST PRI SIZE  CMD
> > 110372.0  maheshwari  6/4  09:19  0+00:21:40 C  0   732.4 perl cscript948689
> > 110373.0  maheshwari  6/4  09:19  0+00:21:39 C  0   732.4 perl cscript948689
> > 110374.0  maheshwari  6/4  09:19  0+00:21:40 C  0   732.4 perl cscript948689
> > 110375.0  maheshwari  6/4  09:19  0+00:21:39 C  0   732.4 perl cscript948689
> > 110376.0  maheshwari  6/4  09:19  0+00:21:39 C  0   732.4 perl cscript948689
> > 110377.0  maheshwari  6/4  09:19  0+00:21:39 C  0   732.4 perl cscript948689
> > 110378.0  maheshwari  6/4  09:19  0+00:21:39 C  0   732.4 perl cscript948689
> > 110379.0  maheshwari  6/4  09:19  0+00:21:41 C  0   463.9 perl cscript948689
>
>
> I have swift-0.94 installed on uc3-sub.uchicago.edu
>
> Cheers,
> Lincoln Bryant
>
> On Mar 20, 2013, at 11:58 AM, Michael Wilde wrote:
>
> > My jobs must be fossils; David, Yadu, we should test whether and
> > why Swift doesn't always clean up.
> >
> > I realize that if Swift hangs and needs to be SIGKILL'ed then it
> > can't. But let's see if the Condor provider is cleaning up when
> > Swift gets a catchable signal.
> >
> > Lincoln, please remove the "wilde" jobs if you can do that.
> >
> > Thanks,
> >
> > - Mike
> >
> > ----- Original Message -----
> >> From: "Lincoln Bryant" <lincolnb at uchicago.edu>
> >> To: "David Kelly" <davidk at ci.uchicago.edu>
> >> Cc: "Michael Wilde" <wilde at mcs.anl.gov>
> >> Sent: Wednesday, March 20, 2013 11:26:30 AM
> >> Subject: SWIFT jobs in C/X states on uc3-sub
> >>
> >> Hi David / Mike,
> >>
> >> I notice there are a lot of old jobs sitting in the UC3 queue.
> >> They're sitting in either "X" (removed) or "C" (completed). Sample
> >> below:
> >>
> >>> 70980.0  wilde   3/12 11:42  0+00:18:59 C  0   43.9 perl cscript906755
> >>> 70981.0  wilde   3/12 11:42  0+00:03:14 C  0   46.4 perl cscript906755
> >>> 70982.0  wilde   3/12 11:42  0+00:03:14 C  0   46.4 perl cscript906755
> >>
> >>> 71652.0  davidk  3/13 19:28  0+00:00:01 X  0   0.0  perl cscript500002
> >>> 71653.0  davidk  3/13 19:28  0+00:00:01 X  0   0.0  perl cscript500002
> >>> 71896.0  davidk  3/13 19:38  0+00:00:01 X  0   0.0  perl cscript339551
> >>
> >> Are these jobs completing OK on your side?
> >>
> >> Occasionally I go in and purge old jobs, but I'm curious if there's
> >> something in your submit files that is causing them to stick.
> >>
> >> Cheers,
> >> Lincoln
>
>