[Swift-devel] walltime compulsion

Mats Rynge rynge at renci.org
Thu Feb 12 09:55:42 CST 2009


Mihael Hategan wrote:
> On Thu, 2009-02-12 at 15:36 +0000, Ben Clifford wrote:
>> On Thu, 12 Feb 2009, Mihael Hategan wrote:
>>
>>> I have yet to see a queuing system that works that same way (not that 
>>> I've seen many).
>> Plenty of queueing systems give you a default walltime on jobs that you 
>> submit. I don't see that it's Swift's business to be interfering with that 
>> default.
>>
> 
> I suppose there's no clear thing here. Anybody else?

Ignoring the queuing system for a moment, it is still a good idea to
know what the expected runtime is. Ben and I had some of this
conversation when we tried Swift on OSG, and we had a couple of
instances where job and/or file transfer status changes were "lost",
and Swift got stuck. I strongly believe that you need to have internal
timeouts for pretty much all your states in your state machine, and that
the timeouts for the job states should be based on the "walltime".

We are using state timeouts for a lot of our OSG jobs based on Condor
and OSGMM. This ensures that it is the workflow engine, and not the
user, that picks up weird states and handles them accordingly (resubmit
to another site for example).
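
A minimal sketch of walltime-derived state timeouts, in Python. The
state names, the multipliers, the fixed queue allowance, and the Job
class are illustrative assumptions, not how Swift, Condor, or OSGMM
implement this:

```python
import time
from dataclasses import dataclass, field


def state_timeouts(walltime_s):
    """Return a per-state time limit in seconds, derived from the walltime.

    All state names and multipliers here are hypothetical.
    """
    return {
        "SUBMITTED": 3600.0,            # fixed allowance for queue wait (assumed)
        "STAGE_IN": 0.5 * walltime_s,   # input file transfer
        "RUNNING": 1.5 * walltime_s,    # walltime plus some slack
        "STAGE_OUT": 0.5 * walltime_s,  # output file transfer
    }


@dataclass
class Job:
    walltime_s: float
    state: str = "SUBMITTED"
    # Timestamp at which the job entered its current state.
    entered_at: float = field(default_factory=time.monotonic)

    def transition(self, new_state, now=None):
        """Record a state change and restart the clock for the new state."""
        self.state = new_state
        self.entered_at = time.monotonic() if now is None else now

    def timed_out(self, now=None):
        """True if the job has stayed in its current state past the limit.

        States without a configured limit (e.g. a terminal state) never
        time out.
        """
        now = time.monotonic() if now is None else now
        limit = state_timeouts(self.walltime_s).get(self.state)
        return limit is not None and (now - self.entered_at) > limit
```

The engine would poll timed_out() for each active job and treat a True
result like a failure, for example by resubmitting to another site, so
that a lost status update cannot leave the job stuck indefinitely.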

-- 
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>


