[Swift-devel] Re: Slow job processing by SGE execution provider?

Mihael Hategan hategan at mcs.anl.gov
Mon Jan 31 18:27:42 CST 2011


On Sat, 2011-01-29 at 13:35 -0600, Michael Wilde wrote:
> Mihael,
> 
> I'm running some simple Swift tests on the "siraf" SGE cluster in UC Radiology.
> 
> The swift script runs 10 cat jobs with one tiny input file (and in
> this case, no output files, even though an output dataset was mapped).
> 
> I see the following unexpected behavior:
> 
> The siraf cluster schedules jobs very quickly. The 10 jobs are
> launched and finished (as seen by a "watch qstat" command) in the
> first few seconds of the script's execution.
> 
> But then, swift slowly logs the completions on stdout at a rate of
> less than one per second, so the overall workflow takes almost 40
> seconds when it should finish in say 5-10 seconds. The same behavior
> occurs when I send in 50 or 100 jobs - the jobs get through SGE *very*
> quickly, and then swift sluggishly recognizes their completion.
> 
> It's almost like SGE qstat is polled less than once per second, and
> only one completed job is recognized per poll.

It is polled less than once per second. There is one poll every 10
seconds. 

This doesn't explain the delay between the stageouts. I suspect it might
be some side effect of the "fast" code, which clearly isn't very fast in
this case.
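
To make the arithmetic concrete, here is a rough sketch (plain Python, not the actual SGE provider code; the function name and the assumption that one poll reports every finished job are mine) of when a 10-second poller would notice jobs that all finish in the first few seconds:

```python
import math

def detection_times(finish_times, poll_interval=10.0):
    """Return when each finished job would be noticed by a periodic poller.

    Assumes polls happen at t = poll_interval, 2 * poll_interval, ... and
    that a single poll reports every job that has finished so far.
    """
    times = []
    for t in finish_times:
        # A job is noticed at the first poll at or after its finish time.
        polls_needed = max(1, math.ceil(t / poll_interval))
        times.append(polls_needed * poll_interval)
    return times

# Ten jobs that all finish within the first 3 seconds of the run.
finish = [0.3 * (i + 1) for i in range(10)]
seen = detection_times(finish)
```

Under those assumptions every job is noticed at the first poll, so polling alone would cost about 10 seconds, not 40.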
> 
> Also note that the progress log on stdout makes it look like Swift
> thinks only one job is active at a time, when in fact all 10 of the
> jobs have long finished as seen by an external qstat.

Right. It looks like Karajan code is executed in the task notification
thread, which serializes those notifications.

There should be a simple fix for this.
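
A minimal sketch of the effect, assuming each completion triggers some slow post-processing (this is illustrative Python, not Karajan's code; `notify_inline`, `notify_via_pool` and `PeakCounter` are made-up names). Running the handler in the notifying thread handles one job at a time, while handing it to a worker pool lets several completions be processed at once:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def notify_inline(job_ids, handler):
    """Run each completion handler in the notifying thread, one after another."""
    for job_id in job_ids:
        handler(job_id)

def notify_via_pool(job_ids, handler, workers=4):
    """Hand each completion handler to a worker pool instead of running it inline."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handler, job_id) for job_id in job_ids]
        for future in futures:
            future.result()  # re-raise any exception from a handler

class PeakCounter:
    """Records how many handlers were active at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.done = []

    def handler(self, job_id):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # stand-in for per-job work such as stage-out
        with self._lock:
            self.active -= 1
            self.done.append(job_id)

inline = PeakCounter()
notify_inline(range(8), inline.handler)

pooled = PeakCounter()
notify_via_pool(range(8), pooled.handler, workers=4)
```

In the inline case `peak` stays at 1, which matches the progress log showing only one active job at a time; with the pool, several handlers overlap.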

Mihael



