[Swift-devel] Re: Slow job processing by SGE execution provider?

Mihael Hategan hategan at mcs.anl.gov
Mon Jan 31 22:52:20 CST 2011


On Mon, 2011-01-31 at 16:27 -0800, Mihael Hategan wrote:
> On Sat, 2011-01-29 at 13:35 -0600, Michael Wilde wrote:
> > Mihael,
> > 
> > I'm running some simple Swift tests on the "siraf" SGE cluster in UC Radiology.
> > 
> > The swift script runs 10 cat jobs with one tiny input file (and in
> > this case, no output files, even though an output dataset was mapped).
> > 
> > I see the following unexpected behavior:
> > 
> > The siraf cluster schedules jobs very quickly. The 10 jobs are
> > launched and finished (as seen by a "watch qstat" command) in the
> > first few seconds of the script's execution.
> > 
> > But then, swift slowly logs the completions on stdout at a rate of
> > less than one per second, so the overall workflow takes almost 40
> > seconds when it should finish in say 5-10 seconds. The same behavior
> > occurs when I send in 50 or 100 jobs - the jobs get through SGE *very*
> > quickly, and then swift sluggishly recognizes their completion.
> > 
> > It's almost as if SGE qstat is polled less than once per second, and
> > only one completed job is recognized per poll.
> 
> It is polled less than once per second. There is one poll every 10
> seconds. 
> 
> This doesn't explain the delay between the stageouts. I suspect it might
> be some side effect of the "fast" code, which clearly isn't very fast in
> this case.
> > 
> > Also note that the progress log on stdout makes it look like Swift
> > thinks only one job is active at a time, when in fact all 10 of the
> > jobs have long finished as seen by an external qstat.
> 
> Right. It looks like Karajan code is executed in the task notification
> thread, which serializes those notifications.

That turns out not to be the issue, so I'll need to dig further.
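For comparison, here is a rough model of the two behaviours discussed in this thread. It is a hypothetical sketch, not Swift's or the SGE provider's actual code. It assumes that one qstat poll every 10 seconds reports every job that finished since the previous poll. It also assumes that, in the serialized case, each completion notification is handled one at a time at a made-up fixed cost. The function names and the 1-second cost are illustrative assumptions only.

```python
import math

POLL_INTERVAL = 10.0  # seconds between qstat polls, as stated in the reply above


def batch_notice_times(finish_times, poll_interval=POLL_INTERVAL):
    """Time at which each job is noticed if a single poll reports
    every job that has finished since the previous poll."""
    return [math.ceil(t / poll_interval) * poll_interval for t in finish_times]


def serialized_notice_times(finish_times, per_notification_cost,
                            poll_interval=POLL_INTERVAL):
    """Same polling, but completions are handed off one at a time on a
    single thread; each hand-off takes per_notification_cost seconds
    (a hypothetical cost, not a measured one)."""
    out = []
    busy_until = 0.0
    for t in sorted(batch_notice_times(finish_times, poll_interval)):
        start = max(t, busy_until)      # wait for the previous notification
        busy_until = start + per_notification_cost
        out.append(busy_until)
    return out


if __name__ == "__main__":
    # Ten jobs that all finish within the first 5 seconds, as seen in qstat.
    finished = [0.5 * i for i in range(1, 11)]
    print(max(batch_notice_times(finished)))          # → 10.0
    print(serialized_notice_times(finished, 1.0)[-1])  # → 20.0
```

Under these assumptions, batch detection would recognize all ten jobs at the first poll. The one-per-second trickle in the report looks more like the serialized curve, although the reply above says the notification thread is not the cause.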
