[Swift-devel] Falkon and Coaster support for MPI

Michael Wilde wilde at mcs.anl.gov
Sun Jun 29 13:24:45 CDT 2008


We mean MPI jobs launched from Swift onto the BG/P, or submitted directly 
to Falkon or Coaster on the BG/P. But in general the desire is to run all 
work, even workloads with no workflow dependencies, under Swift, for 
uniformity, site independence, and provenance.

The initial discussion here was based on the assumption that a 
Falkon-like mechanism is required in order to run workloads of many 
small jobs on the BG/P, whether through Swift or directly. 
(Small meaning 1 to 64 CPUs each, with a runtime on the order of a few 
minutes each.)

I think that assumption is true on the Argonne BG/P for two reasons: 1) 
the scheduling policy doesn't allow (or favor) any user running more than 
2 BG/P jobs at once, and 2) the production partitions on the production 
BG/P favor large jobs, of 512 to 2048 compute nodes. Running many smaller 
(e.g. 16- or 32-CPU) Swift jobs doesn't seem like it's going to be an 
accepted model.  That drives these kinds of app needs towards a 
Falkon/Coaster approach.
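
To put rough numbers on that argument, here is a back-of-the-envelope 
sketch. Everything in it is an assumption made for illustration: the 
workload size (1000 tasks of 32 CPUs and 3 minutes each), the figure of 
4 cores per compute node, and the idealized model in which tasks run in 
equal-length waves with no queue wait, partition boot time, or dispatch 
overhead. It is not a measurement of the Argonne system.

```python
import math


def makespan_minutes(num_tasks, concurrent_tasks, minutes_per_task):
    """Idealized makespan: tasks run in equal-length waves.

    Ignores queue wait, partition boot time, and dispatch overhead.
    """
    waves = math.ceil(num_tasks / concurrent_tasks)
    return waves * minutes_per_task


# Hypothetical workload (illustrative values, not from a real run).
NUM_TASKS = 1000
CPUS_PER_TASK = 32
MINUTES_PER_TASK = 3

# Case 1: each task is its own scheduler job, and the policy lets a
# user run at most 2 jobs at once.
MAX_CONCURRENT_JOBS = 2
direct = makespan_minutes(NUM_TASKS, MAX_CONCURRENT_JOBS, MINUTES_PER_TASK)

# Case 2: one large scheduler job holds a 2048-node partition, and a
# Falkon/Coaster-style dispatcher packs tasks into it.
PARTITION_NODES = 2048
CORES_PER_NODE = 4  # assumption for this sketch
slots = (PARTITION_NODES * CORES_PER_NODE) // CPUS_PER_TASK
pilot = makespan_minutes(NUM_TASKS, slots, MINUTES_PER_TASK)

print(f"direct submission: {direct} minutes")
print(f"single large partition: {slots} slots, {pilot} minutes")
```

Under these assumptions the 2-job limit, not the hardware, dominates the 
direct-submission case, which is the reason for wanting a dispatcher 
inside one large allocation.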

Recently, IBM circulated info on their "HTC" mode support for the BG/P, 
which may change the nature of the assumptions above.

- Mike

On 6/29/08 12:58 PM, Ben Clifford wrote:
> Do you really mean falkon and/or coaster or do you mean MPI jobs launched 
> from Swift onto BG/P?
> 
> The implementation of the latter might be completely distinct from Coaster 
> and/or Falkon. It might be desired to run a specific MPI application on 
> all cores in a particular processor set (or whatever they are called). In 
> such a case, the per-node individual job management that falkon and 
> coaster provide would be almost/entirely irrelevant.
> 
> I presume there is some existing mechanism for launching an MPI job on 
> every core in a processor set already.
> 
> It might be that it would be more appropriate for Swift to cause that 
> mechanism to be used, making 'one node' = 'one pset' rather than 'one 
> node' = 'one cpu' (where node is the basic unit that can execute a job). 
> 
> There is a (substantially?) more complicated case of causing one pset to 
> run multiple different MPI jobs simultaneously, with some cores going to 
> one job, and some to another.
> 
> The above two are (from my perspective) very different use cases; any 
> future discussion should clarify which one is being discussed, rather than 
> being based on the always-vague "I want to use MPI".
> 


