[Swift-devel] stageout ordering vs restarts

Ben Clifford benc at hawaga.org.uk
Sun Jul 6 11:41:52 CDT 2008


At present, stageouts for jobs tend to execute quite late in a run: when 
there are other jobs waiting to run, the stageins for those jobs will 
usually use up the capacity available under the file transfer rate limit 
before any stageouts happen.

I've noticed this before as a user interface quirk - users see GRAM jobs 
complete on remote sites, but do not see output files appear on the submit 
side until much later, and sometimes misinterpret that as a failure.

However, I think there is an issue here with how restarts work too. Jobs 
are not recorded as done for the purposes of restart (i.e. recorded such 
that they will not be re-executed) until stageout has finished.

That means that in late-stageout situations, lots of work will be done, 
but not recorded to the extent that it can be skipped on restarts.

So that makes early-stageout behaviour more appealing in some situations - 
situations in which it is expected that a restart will be necessary, or 
where it is preferable to have slower job execution in exchange for more 
stuff marked as done in the restart logs.
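
To make the trade-off concrete, here is a toy sketch in Python. It is not 
Swift's actual scheduler: the function name, its parameters, and the model 
itself (one transfer slot, one tick per transfer, jobs that run instantly 
once staged in) are all assumptions made up for illustration. The only 
thing taken from the above is that a job reaches the restart log when its 
stageout finishes, and that stageins and stageouts compete for the same 
transfer limit.

```python
from collections import deque


def simulate(num_jobs, transfer_slots, crash_tick, prefer_stageout):
    """Return (jobs_executed, jobs_logged_done) at the moment of a crash.

    Hypothetical toy model, not Swift's real behaviour:
    - each job needs one stagein transfer, then runs instantly,
      then needs one stageout transfer;
    - every transfer takes one tick and occupies one transfer slot;
    - a job is logged as done only when its stageout completes.
    """
    waiting_stagein = deque(range(num_jobs))
    waiting_stageout = deque()
    executed = 0
    logged_done = 0

    for _tick in range(crash_tick):
        ran_this_tick = []
        for _slot in range(transfer_slots):
            # Stageouts go first only under the early-stageout policy,
            # or when there are no stageins left to compete with them.
            if waiting_stageout and (prefer_stageout or not waiting_stagein):
                waiting_stageout.popleft()
                logged_done += 1
            elif waiting_stagein:
                ran_this_tick.append(waiting_stagein.popleft())
                executed += 1
        # Jobs staged in during this tick can stage out from the next tick.
        waiting_stageout.extend(ran_this_tick)

    return executed, logged_done


late = simulate(num_jobs=10, transfer_slots=1, crash_tick=10,
                prefer_stageout=False)
early = simulate(num_jobs=10, transfer_slots=1, crash_tick=10,
                 prefer_stageout=True)
print("late stageout  (executed, logged):", late)   # (10, 0)
print("early stageout (executed, logged):", early)  # (5, 5)
```

In this toy run, the late-stageout policy has executed all 10 jobs by the 
time of the crash but logged none of them, so a restart would redo 
everything. The early-stageout policy has executed only 5 jobs, but all 5 
are logged and would be skipped on restart.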

That is perhaps worth thinking about as part of the project that Ragib is 
working on.
