[Swift-devel] Test with condor-g provider
Mats Rynge
rynge at renci.org
Wed May 6 12:39:35 CDT 2009
Ben Clifford wrote:
> On Wed, 6 May 2009, Zhao Zhang wrote:
>
>> Cool! This is really good to know. Another question is, how a user could tell
>> this issue? like me?
>
> You have to learn how Condor-G works...
>
> I think probably the condor provider should be changed so that instead of
> jobs going on hold, they fail when things like this happen. I think that
> would make the code behave more like other execution systems.
One idea would be to have Swift detect held jobs and treat them as
failures. You can use periodic_hold to force stuck jobs into the held
state. Example:
# GlobusStatus==16 is suspended
# GlobusStatus==32 is unsubmitted
# JobStatus==1 is idle (pending)
# JobStatus==2 is running
# CurrentTime - EnteredCurrentStatus is measured in seconds
periodic_hold = ( (GlobusStatus==16) || \
    ( (GlobusStatus==32) && (CurrentHosts==1) && \
      ((CurrentTime - EnteredCurrentStatus) > (20*60)) ) || \
    ( (JobStatus==1) && \
      ((CurrentTime - EnteredCurrentStatus) > (1*24*60)) ) || \
    ( (JobStatus==2) && \
      ((CurrentTime - EnteredCurrentStatus) > (10*24*60)) ) )
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
Or, you can use a similar expression with periodic_remove, but then you
have to figure out afterwards why a job exited the queue.
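
To make the "detect held jobs" idea concrete, here is a minimal sketch in
Python (not Swift's actual provider code) of how a client could poll for
held jobs and pick up the hold reason. It assumes condor_q is on the PATH
and that every held job has a HoldReason attribute. The function names are
hypothetical and only for illustration.

```python
import subprocess

HELD = 5  # HTCondor JobStatus value for a held job


def held_jobs_command():
    """Build a condor_q command that lists held jobs with their hold reasons."""
    return [
        "condor_q",
        "-constraint", f"JobStatus == {HELD}",
        "-format", "%d.", "ClusterId",
        "-format", "%d\t", "ProcId",
        "-format", "%s\n", "HoldReason",
    ]


def parse_held_jobs(output):
    """Map 'cluster.proc' job ids to hold reasons, one job per output line."""
    held = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is '<cluster>.<proc><TAB><reason>'.
        job_id, _, reason = line.partition("\t")
        held[job_id.strip()] = reason.strip() or "unknown"
    return held


def find_held_jobs():
    """Run condor_q and return the held jobs (needs a working HTCondor setup)."""
    result = subprocess.run(
        held_jobs_command(), capture_output=True, text=True, check=True
    )
    return parse_held_jobs(result.stdout)
```

A caller could run find_held_jobs() periodically and report every job it
returns as failed, passing the hold reason along to the user.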
--
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>