[Swift-devel] stageout ordering vs restarts
Ben Clifford
benc at hawaga.org.uk
Sun Jul 6 11:41:52 CDT 2008
At present, stageouts for jobs tend to execute quite late in a run:
when there are other jobs waiting to run, the stageins for those jobs
will usually use up the available file transfer rate-limit capacity
before any stageouts happen.
I've noticed this before as a user interface quirk - users see GRAM jobs
complete on remote sites, but do not see output files appear on the submit
side until much later, and sometimes misinterpret that as a failure.
However, I think there is an issue here with how restarts work too. A job
is not recorded as done for the purposes of restart (i.e. as something that
will not be re-executed) until its stageout has finished.
That means that in late-stageout situations, lots of work will have been
done but not recorded to the extent that it can be skipped on restart, so
a restart will re-execute it.
So that makes early-stageout behaviour more appealing in some situations -
situations in which it is expected that a restart will be necessary, or
where it is preferable to have slower job execution in exchange for more
stuff marked as done in the restart logs.
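
As a rough illustration of that trade-off, here is a toy model in Python.
It is not Swift's scheduler: the function name, the tick-based transfer
limit, and the assumption that a job executes instantly once staged in are
all made up for the sketch. The only behaviour taken from the above is that
a job counts as done for restart purposes only once its stageout has
finished.

```python
from collections import deque

def simulate(num_jobs, transfer_slots, crash_tick, stageout_first):
    """Toy model: return (executed, recorded_done) at the moment of a crash.

    Assumptions (hypothetical, for illustration only):
    - every stagein or stageout uses one transfer slot for one tick
    - a job executes instantly once its stagein has finished
    - a job is recorded as done only when its stageout has finished
    """
    waiting_stagein = deque(range(num_jobs))
    waiting_stageout = deque()
    executed = 0
    recorded_done = 0
    for _ in range(crash_tick):
        slots = transfer_slots
        staged_in_now = []
        # The policy decides which kind of transfer gets the slots first.
        order = ["out", "in"] if stageout_first else ["in", "out"]
        for kind in order:
            while slots > 0:
                if kind == "in" and waiting_stagein:
                    staged_in_now.append(waiting_stagein.popleft())
                elif kind == "out" and waiting_stageout:
                    waiting_stageout.popleft()
                    recorded_done += 1
                else:
                    break
                slots -= 1
        # Jobs staged in during this tick run, then queue for stageout.
        for job in staged_in_now:
            executed += 1
            waiting_stageout.append(job)
    return executed, recorded_done

late = simulate(num_jobs=10, transfer_slots=1, crash_tick=8, stageout_first=False)
early = simulate(num_jobs=10, transfer_slots=1, crash_tick=8, stageout_first=True)
print("stagein first: ", late)   # (8, 0)
print("stageout first:", early)  # (4, 4)
```

With stageins given priority, eight jobs have executed by the time of the
crash but none would be skipped on restart; with stageouts given priority,
only four have executed but all four are recorded as done.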
That is perhaps worth thinking about as part of the project that Ragib is
working on.