[Swift-devel] Falkon and Coaster support for MPI

Mihael Hategan hategan at mcs.anl.gov
Mon Jun 30 12:38:08 CDT 2008


On Mon, 2008-06-30 at 12:18 -0500, Ioan Raicu wrote:
> I am just now catching up with the dozens of emails...
> 
> Ian Foster wrote:
> > 4) Ioan points out that a fully general multi-level scheduling 
> > solution with support for multi-CPU jobs may introduce the need for a 
> > smarter scheduler than our current FIFO approach. E.g., if we have 256 
> > nodes and a queue with jobs of size {32,256,32,32,32,32,32,32,32,32}, 
> > a FIFO strategy would run them in that order, and waste much CPU time. 
> > On the other hand, a simple "first-fit" strategy might starve large jobs.
> This is all true... in the case of Falkon, there are further 
> limitations, such as:
> 32-CPU MPI job starts and runs for 10 min
> 256-CPU MPI job is ready to run, but not enough CPUs are available; what 
> is easy to do in Falkon is to place the 256-CPU job back in the queue 
> and process the next one, which needs 32 CPUs... and keep doing this until 
> it finds all 256 CPUs free to schedule the 256-CPU MPI job.  This means 
> that the order will be {32, ..., 32, 256}... and this is assuming that 
> at some point the smaller MPI jobs will stop coming and let the 256-CPU 
> MPI job start; otherwise the 256-CPU MPI job runs the risk of being 
> starved. 

This amounts to picking particularly bad (though simple) scheduling
algorithms and then pointing out that they are... bad. I'm not sure
what that is meant to achieve; I'd rather start by reading some papers
on the topic.
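For comparison, here is a minimal discrete-event sketch in Python (not code from Falkon or Coaster) of three policies: strict FIFO, the first-fit scan described above, and EASY backfilling, a standard technique from the parallel job scheduling literature. Under EASY the first blocked job gets a reservation, and later jobs may jump ahead only if they do not delay it. The `Job` class, the runtimes, and the arrival stream are hypothetical, and EASY as written assumes runtimes are known exactly in advance, which real MPI jobs rarely provide.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    size: int         # CPUs requested (all-or-nothing, like an MPI job)
    runtime: int      # minutes; assumed known exactly (hypothetical)
    arrival: int = 0  # minutes since the start of the simulation


def reservation(head, free, running):
    """Earliest time `head` could start if running jobs end on schedule,
    plus the CPUs that would still be spare at that moment."""
    avail = free
    for end, size in sorted(running):
        avail += size
        if avail >= head.size:
            return end, avail - head.size
    raise ValueError(f"{head.name} needs more CPUs than the machine has")


def simulate(jobs, total_cpus, policy):
    """Return {job name: start time} under 'fifo', 'firstfit' or 'easy'."""
    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, running, start = [], [], {}  # running: heap of (end, size)
    free, t = total_cpus, 0
    while pending or queue or running:
        while running and running[0][0] <= t:       # release finished jobs
            free += heapq.heappop(running)[1]
        while pending and pending[0].arrival <= t:  # admit arrivals
            queue.append(pending.pop(0))
        shadow, extra = None, 0  # reservation for the first blocked job (EASY)
        i = 0
        while i < len(queue):
            job = queue[i]
            fits = job.size <= free
            if fits and shadow is not None:
                # Backfill only if the reserved job is not delayed.
                ends_before = t + job.runtime <= shadow
                fits = ends_before or job.size <= extra
                if fits and not ends_before:
                    extra -= job.size
            if fits:
                free -= job.size
                start[job.name] = t
                heapq.heappush(running, (t + job.runtime, job.size))
                queue.pop(i)
                continue
            if policy == "fifo":
                break  # strict FIFO: nothing may pass the blocked head
            if policy == "easy" and shadow is None:
                shadow, extra = reservation(job, free, running)
            i += 1  # 'firstfit' simply keeps scanning
        events = [e for e, _ in running[:1]] + [j.arrival for j in pending[:1]]
        if not events:
            if queue:
                raise ValueError("a queued job can never fit")
            break
        t = min(events)
    return start


def makespan(jobs, start):
    return max(start[j.name] + j.runtime for j in jobs)


# Ian's queue on 256 nodes; the runtimes are made up for illustration.
ian = [Job("a", 32, 10), Job("big", 256, 10)]
ian += [Job(f"s{k}", 32, 5) for k in range(1, 9)]
for policy in ("fifo", "firstfit", "easy"):
    print(policy, makespan(ian, simulate(ian, 256, policy)))
# fifo 25, firstfit 20, easy 20

# A steady stream of 32-CPU jobs on 64 CPUs, with a 64-CPU job arriving at t=6.
stream = [Job("s0", 32, 10, 0), Job("s1", 32, 10, 5), Job("big", 64, 10, 6)]
stream += [Job(f"s{k}", 32, 10, 5 * k) for k in range(2, 10)]
for policy in ("fifo", "firstfit", "easy"):
    print(policy, simulate(stream, 64, policy)["big"])
# fifo 15, firstfit 55, easy 15
```

On the static queue, first-fit and EASY both fill the CPUs that FIFO leaves idle. With a stream of small jobs, first-fit holds the large job back until the stream ends, while EASY's reservation starts it as early as FIFO would.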
