[Swift-devel] condor leave_in_queue

David Kelly davidk at ci.uchicago.edu
Sat Jul 6 03:08:51 CDT 2013


Mihael, 


Thanks for the info. 


The problem we were seeing was that condor jobs were not being removed. They would complete, but then remain visible in condor_q indefinitely, until the user removed them manually with condor_rm. At the suggestion of the uc3 admins, I tried testing with leave_in_queue set to false. Jobs are being removed now, and I just ran a quick test (uc3 /home/davidk/test4/run003) to verify that exit codes are still being read correctly, but perhaps there is a better fix? 
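The symptom above (completed jobs lingering in condor_q) can be checked for directly. The following is a minimal Python sketch, assuming the standard HTCondor command-line tools are on the PATH. It lists jobs whose JobStatus is 4 ("Completed") but which are still in the queue. The helper names `run_cmd` and `lingering_jobs` are hypothetical, and the command runner is injectable so the parsing can be exercised without a live schedd. Check the exact condor_q options against your HTCondor version.

```python
import subprocess
from typing import Callable, List

COMPLETED = 4  # HTCondor JobStatus code for "Completed"


def run_cmd(argv: List[str]) -> str:
    """Run a command and return its standard output (raises on failure)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def lingering_jobs(run: Callable[[List[str]], str] = run_cmd) -> List[str]:
    """Return the IDs (cluster.proc) of completed jobs still sitting in the queue."""
    out = run([
        "condor_q",
        "-constraint", f"JobStatus == {COMPLETED}",
        # Print one "cluster.proc" per line; -format output has no header.
        "-format", "%d.", "ClusterId",
        "-format", "%d\n", "ProcId",
    ])
    return out.split()
```

If this keeps returning the same IDs long after the jobs finished, nothing is clearing and removing them.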


David 



----- Original Message -----


From: "Mihael Hategan" <hategan at mcs.anl.gov> 
To: "Swift Devel" <swift-devel at ci.uchicago.edu> 
Sent: Saturday, July 6, 2013 1:43:53 AM 
Subject: [Swift-devel] condor leave_in_queue 

This is in regard to http://sourceforge.net/p/cogkit/svn/3671/ 

The reason why leave_in_queue was set to TRUE was in order to get the 
exit code from the job (and therefore figure whether it failed or not). 

If the job is automatically removed from the queue by condor when the 
job is done, that information is lost. 

Instead, the queue poller, after it figures out that a job is done and 
it reads the exit code, sets leave_in_queue to FALSE and removes the job 
from the queue. 
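The poller sequence described above (wait for completion, read the exit code, set leave_in_queue to FALSE, then remove the job) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the cogkit provider's actual code. It assumes the poller shells out to condor_q, condor_qedit and condor_rm, and that the job ClassAd attribute behind the leave_in_queue submit command is LeaveJobInQueue. The names `run_cmd` and `poll_job` are hypothetical, and the runner is injectable for testing.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

COMPLETED = 4  # HTCondor JobStatus code for "Completed"

Runner = Callable[[List[str]], str]


def run_cmd(argv: List[str]) -> str:
    """Run a command and return its standard output (raises on failure)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def poll_job(job_id: str, run: Runner = run_cmd) -> Tuple[bool, Optional[int]]:
    """Check one job and return (done, exit_code).

    Returns (False, None) while the job has not completed. Once it has, reads
    ExitCode, clears LeaveJobInQueue, removes the job and returns
    (True, exit_code). exit_code is None if ExitCode is undefined
    (for example, when the job was killed by a signal).
    """
    fields = run([
        "condor_q",
        "-format", "%d ", "JobStatus",
        "-format", "%d", "ExitCode",  # prints nothing if ExitCode is undefined
        job_id,
    ]).split()
    if not fields:
        # Job already gone: this is the case where the exit code is lost.
        raise LookupError(f"job {job_id} is no longer in the queue")
    if int(fields[0]) != COMPLETED:
        return False, None
    exit_code = int(fields[1]) if len(fields) > 1 else None
    # Stop pinning the job in the queue, then remove it.
    run(["condor_qedit", job_id, "LeaveJobInQueue", "False"])
    run(["condor_rm", job_id])
    return True, exit_code
```

Note that a completed job is removed even when ExitCode is undefined. If the removal step were skipped for any reason, the job would stay in condor_q indefinitely, which matches the symptom reported in the reply.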

I'm guessing that was broken somehow, but I'd like to get more details 
before I can OK the change (or before I merge it into the faster 
branch). 

Mihael 

_______________________________________________ 
Swift-devel mailing list 
Swift-devel at ci.uchicago.edu 
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel 




