[Swift-devel] condor leave_in_queue
David Kelly
davidk at ci.uchicago.edu
Sat Jul 6 03:08:51 CDT 2013
Mihael,
Thanks for the info.
The problem we were seeing was that condor jobs were not being removed. They would complete but remain visible in condor_q forever, until the user removed them manually with condor_rm. At the suggestion of the UC3 admins, I tried testing with leave_in_queue set to false. Jobs are being removed now, and I just ran a quick test (uc3 /home/davidk/test4/run003) to verify that exit codes are still being read correctly. Is there perhaps a better fix?
David
----- Original Message -----
From: "Mihael Hategan" <hategan at mcs.anl.gov>
To: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Saturday, July 6, 2013 1:43:53 AM
Subject: [Swift-devel] condor leave_in_queue
This is in regards to http://sourceforge.net/p/cogkit/svn/3671/
The reason leave_in_queue was set to TRUE was to be able to get the
exit code from the job (and therefore figure out whether it failed or not).
If the job is automatically removed from the queue by condor when the
job is done, that information is lost.
Instead, the queue poller, after it figures out that a job is done and
it reads the exit code, sets leave_in_queue to FALSE and removes the job
from the queue.
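The ordering Mihael describes (read the exit code while the job is still held in the queue, then release and remove it) can be modeled with a small in-memory sketch. This is a hypothetical illustration, not the cogkit Condor provider code and not an HTCondor API: `FakeSchedd`, `QueuedJob` and `poll_once` are invented names, and only the `JobStatus` numbers follow HTCondor's codes. In a real pool the last two steps would presumably correspond to changing the job's `LeaveJobInQueue` attribute (e.g. with `condor_qedit`) and running `condor_rm`.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class JobStatus(IntEnum):
    # Subset of HTCondor's numeric JobStatus codes.
    IDLE = 1
    RUNNING = 2
    COMPLETED = 4


@dataclass
class QueuedJob:
    job_id: str
    status: JobStatus = JobStatus.IDLE
    exit_code: Optional[int] = None
    leave_in_queue: bool = True  # as if submitted with leave_in_queue = TRUE


class FakeSchedd:
    """Hypothetical in-memory stand-in for the Condor job queue."""

    def __init__(self) -> None:
        self.jobs: Dict[str, QueuedJob] = {}

    def submit(self, job_id: str) -> None:
        self.jobs[job_id] = QueuedJob(job_id)

    def finish(self, job_id: str, exit_code: int) -> None:
        job = self.jobs[job_id]
        job.status = JobStatus.COMPLETED
        job.exit_code = exit_code
        if not job.leave_in_queue:
            # Without leave_in_queue the job leaves the queue on completion,
            # taking its exit code with it.
            del self.jobs[job_id]

    def set_leave_in_queue(self, job_id: str, value: bool) -> None:
        self.jobs[job_id].leave_in_queue = value

    def remove(self, job_id: str) -> None:
        del self.jobs[job_id]


def poll_once(schedd: FakeSchedd, tracked: Dict[str, Optional[int]]) -> None:
    """One poller pass: record exit codes of completed jobs, then release them."""
    for job_id in list(tracked):
        job = schedd.jobs.get(job_id)
        if job is None or job.status != JobStatus.COMPLETED:
            continue  # still running, or already gone from the queue
        tracked[job_id] = job.exit_code           # 1. read the exit code first
        schedd.set_leave_in_queue(job_id, False)  # 2. stop holding the job
        schedd.remove(job_id)                     # 3. remove it from the queue


schedd = FakeSchedd()
schedd.submit("1.0")
tracked: Dict[str, Optional[int]] = {"1.0": None}
schedd.finish("1.0", 0)
poll_once(schedd, tracked)
print(tracked, schedd.jobs)
```

In this model, if steps 2 and 3 never run, completed jobs stay in `schedd.jobs` indefinitely, which resembles the symptom David reports. If `leave_in_queue` is false from the start, `poll_once` never sees the exit code.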
I'm guessing that was broken somehow, but I'd like to get more details
before I can approve the change (or before I merge it into the faster
branch).
Mihael
_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel