[Swift-devel] Falkon and Coaster support for MPI
Michael Wilde
wilde at mcs.anl.gov
Sun Jun 29 13:24:45 CDT 2008
We mean MPI jobs launched from Swift onto BG/P, or sent directly through
Falkon or Coaster to the BGP. But in general the desire is to run all work, even
workloads with no workflow dependencies, under Swift, for uniformity,
site independence, and provenance.
The initial discussion here was based on the assumption that a
Falkon-like mechanism was required in order to run workloads of many
small jobs on the BGP - whether that be through Swift, or directly.
(Small meaning 1 to 64 CPUs each and on the order of a few minutes of
runtime each.)
I think that assumption is true on the Argonne BGP for two reasons: 1)
scheduling policy doesn't allow or favor any user running more than 2 BGP
jobs at once, and 2) on the production BGP the production partitions
favor large jobs, of 512 to 2048 compute nodes. Running many smaller (e.g.
16- or 32-CPU) Swift jobs doesn't seem like it's going to be an accepted
model. That drives these kinds of app needs towards a Falkon/Coaster
approach.
Recently, IBM circulated info on their "HTC" mode support for the BG/P,
which may change the nature of the assumptions above.
- Mike
On 6/29/08 12:58 PM, Ben Clifford wrote:
> Do you really mean falkon and/or coaster or do you mean MPI jobs launched
> from Swift onto BG/P?
>
> The implementation of the latter might be completely distinct from Coaster
> and/or Falkon. It might be desired to run a specific MPI application on
> all cores in a particular processor set (or whatever they are called). In
> such a case, the per-node individual job management that falkon and
> coaster provide would be almost/entirely irrelevant.
>
> I presume there is some existing mechanism for launching an MPI job on
> every core in a processor set already.
>
> It might be that it would be more appropriate for Swift to cause that
> mechanism to be used, making 'one node' = 'one pset' rather than 'one
> node' = 'one cpu' (where node is the basic unit that can execute a job).
>
> There is a (substantially?) more complicated case of causing one pset to
> run multiple different MPI jobs simultaneously, with some cores going to
> one job, and some to another.
>
> The above two are (from my perspective) very different use cases; any
> future discussion should clarify which one is being discussed, rather than
> being based on the always-vague "I want to use MPI".
>