[Swift-devel] Test with condor-g provider

Mats Rynge rynge at renci.org
Wed May 6 12:39:35 CDT 2009


Ben Clifford wrote:
> On Wed, 6 May 2009, Zhao Zhang wrote:
> 
>> Cool! This is really good to know. Another question: how could a user
>> like me tell that this is the issue?
> 
> You have to learn how Condor-G works...
> 
> I think probably the condor provider should be changed so that instead of 
> jobs going on hold, they fail when things like this happen. I think that 
> would make the code behave more like other execution systems.

One idea would be to have Swift detect held jobs and treat them as
failures. You can use periodic_hold to put stuck jobs on hold
automatically. Example:

#  GlobusStatus==16 is suspended
#  GlobusStatus==32 is submitting
#  JobStatus==1 is pending (idle)
#  JobStatus==2 is running
#  CurrentTime and EnteredCurrentStatus are in seconds, so the limits
#  below are 20 minutes, 1 day and 10 days
periodic_hold = ( (GlobusStatus==16) || \
     ( (GlobusStatus==32) && (CurrentHosts==1) && \
       ((CurrentTime - EnteredCurrentStatus) > (20*60)) ) || \
     ( (JobStatus==1) && \
       ((CurrentTime - EnteredCurrentStatus) > (1*24*60*60)) ) || \
     ( (JobStatus==2) && \
       ((CurrentTime - EnteredCurrentStatus) > (10*24*60*60)) ) )
#  only let the job leave the queue after a clean, successful exit
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
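
To make the policy easier to check, here is a minimal sketch that restates the periodic_hold expression as a Python predicate. It is only an illustration: Condor evaluates the real ClassAd expression itself, and this sketch ignores ClassAd UNDEFINED semantics. The function name should_hold and the constants are hypothetical, and the job ad is assumed to be a plain dict of attribute values.

```python
# Hypothetical restatement of the periodic_hold policy, for illustration only.
# Condor evaluates the real ClassAd expression; this ignores UNDEFINED semantics.
# Times are in seconds, like Condor's CurrentTime and EnteredCurrentStatus.

MINUTE = 60
DAY = 24 * 60 * MINUTE

# Condor JobStatus codes used by the policy
IDLE = 1
RUNNING = 2

# GlobusStatus values, as described in the policy comments
GLOBUS_SUSPENDED = 16
GLOBUS_SUBMITTING = 32


def should_hold(ad: dict, now: int) -> bool:
    """Return True if the job ad matches any of the four hold conditions."""
    # Seconds the job has spent in its current JobStatus
    in_state = now - ad["EnteredCurrentStatus"]
    return (
        ad.get("GlobusStatus") == GLOBUS_SUSPENDED
        or (ad.get("GlobusStatus") == GLOBUS_SUBMITTING
            and ad.get("CurrentHosts") == 1
            and in_state > 20 * MINUTE)
        or (ad["JobStatus"] == IDLE and in_state > 1 * DAY)
        or (ad["JobStatus"] == RUNNING and in_state > 10 * DAY)
    )
```

Because CurrentTime and EnteredCurrentStatus are measured in seconds, a one-day limit has to be written as 24*60*60 rather than 24*60.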


Or, you can use a similar expression with periodic_remove, but then you 
have to figure out afterwards why a job exited the queue.
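
On the Swift side, treating held jobs as failures could look roughly like the sketch below. Swift's providers are Java code, so this is only a language-neutral illustration; classify and Outcome are hypothetical names, and the sketch assumes the provider already polls each job's JobStatus, HoldReason, ExitBySignal and ExitCode attributes. It also shows the drawback of periodic_remove: a removed job carries no hold reason, so the cause has to be looked up separately.

```python
from dataclasses import dataclass
from typing import Optional

# Condor JobStatus codes
IDLE, RUNNING, REMOVED, COMPLETED, HELD = 1, 2, 3, 4, 5


@dataclass
class Outcome:
    state: str                     # "active", "done" or "failed"
    message: Optional[str] = None  # human-readable reason for a failure


def classify(ad: dict) -> Outcome:
    """Map a polled job ad to a hypothetical provider-level outcome."""
    status = ad["JobStatus"]
    if status == HELD:
        # A held job would otherwise sit in the queue indefinitely;
        # report it as a failure and pass along Condor's explanation.
        return Outcome("failed", ad.get("HoldReason", "job held"))
    if status == REMOVED:
        # With periodic_remove the job just leaves the queue, so the
        # reason is not available here and must be dug out afterwards.
        return Outcome("failed", "job removed from queue (reason unknown)")
    if status == COMPLETED:
        if not ad.get("ExitBySignal", False) and ad.get("ExitCode") == 0:
            return Outcome("done")
        return Outcome("failed", f"exit code {ad.get('ExitCode')}")
    # Idle, running and any other state: keep waiting
    return Outcome("active")
```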

-- 
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>