[Swift-devel] Status of coasters
Ben Clifford
benc at hawaga.org.uk
Fri Feb 13 09:27:37 CST 2009
On Fri, 13 Feb 2009, Michael Wilde wrote:
> - Ben has a patch to integrate that runs the coaster service on a worker node.
> Question: this is only usable when workers have sufficient IP access, correct?
Yes. I plan on making this presentable and then committing it. As part of
that, I should probably document who connects where in coasters, with a
diagram, to make clear what 'sufficient' means.
> - The scalability problem submitting to GT2 GRAM sites still exists. Potential
> solutions are:
>
> -- Service submits workers via PBS (using jobmanager=gt2:pbs). Valid only on
> PBS sites. Not yet tested.
>
> -- Service submits workers via Condor-G (using jobmanager=gt2:condor). Mihael
> feels this requires a new Condor provider, the one in the current code base
> being insufficient and untested (really more of a prototype developed by a
> student).
That would be regular Condor, not Condor-G, I think.
The two above could be summarised as "submit service workers through the
local LRM using CoG-specific providers for that LRM".
The PBS provider seems to be getting a reasonable amount of use recently,
and I think is also useful in the single-site case where it allows GRAM to
be avoided entirely.
A decent Condor provider would probably allow something similar for
Condor-based clusters.
> -- Service submits via WS-GRAM. This should be tested, on sites where
> WS-GRAM is working. This would use jobmanager=gt2:gt4:{pbs/condor/sge},
> and needs to be tested.
If gram4.0 is working on a site, is there any reason to use gt2 for the
head job submission? It seems to add a dependency on one more service
(depending on both gram2 and gram4.0) rather than substituting one
dependency for another (gram2 for gram4.0).
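To make the dependency point concrete, here is a minimal Python sketch that
splits a jobmanager string into the mechanism used for the head job and the
chain used to submit workers, then lists the GRAM services a run would depend
on. The reading of the colon-separated fields (first field submits the head
job, remaining fields are used by the service to submit workers) is an
assumption based on the examples in this thread, and `parse_jobmanager` and
`gram_dependencies` are hypothetical helpers, not Swift or CoG functions.

```python
# Hypothetical helpers for illustration; not part of Swift or CoG.
# Assumed mapping from jobmanager fields to the GRAM service they rely on.
GRAM_SERVICES = {"gt2": "gram2", "gt4": "gram4.0"}


def parse_jobmanager(spec):
    """Split e.g. 'gt2:gt4:pbs' into (head provider, worker submission chain)."""
    fields = [f.strip() for f in spec.split(":") if f.strip()]
    if not fields:
        raise ValueError("empty jobmanager specification")
    return fields[0], fields[1:]


def gram_dependencies(spec):
    """Return the distinct GRAM services named anywhere in the specification."""
    head, workers = parse_jobmanager(spec)
    deps = []
    for field in [head] + workers:
        service = GRAM_SERVICES.get(field)
        if service and service not in deps:
            deps.append(service)
    return deps


if __name__ == "__main__":
    for spec in ("gt2:pbs", "gt2:gt4:pbs"):
        print(spec, "->", gram_dependencies(spec))
```

Under these assumptions, gt2:pbs depends on gram2 only, while gt2:gt4:pbs
depends on both gram2 and gram4.0.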
> For sites where WS-GRAM is not functional, I suggested we consider configuring
> our own non-root WS-GRAM, ideally using already-installed GT4 software, eg,
> from the OSG package on OSG and TG sites where it's installed. Mihael thought
> this would be considerable work. I agree but it might be a stable solution
> with fewer unknowns and support from the GRAM group. We can bring in the latest
> GT4 as needed if that provides a better solution than some older installed GT4
> which we have no control over and which won't change till upcoming releases of
> say OSG or TG packages.
I agree that this is considerable work. I think it is not something we
should pursue.
> Lastly: it seems that a Condor-G provider might be a powerful capability (as
> one configuration option) to be able to submit all swift jobs via Condor-G
> (e.g., for non-coaster runs as well). Please comment on the value of such a
> capability.
I've pondered that before.
Using Condor-G appears to be the officially supported mechanism for
submitting to OSG in some people's minds; similarly, using plain GRAM2
is prohibited in those people's minds.
Using Condor-G would therefore be more in line with some people's views of
how jobs should properly be submitted to OSG.
Such functionality could fit in as a CoG execution provider (similar to,
or part of, a plain Condor execution provider), and would not perturb the
architecture of Swift. Swift runs in such a situation would look a little
like DAGMan runs, with a management process handling some rate limiting
and deciding which jobs to run and where, and the mechanics of submission
then being handled by a local Condor.
This approach would necessitate a local Condor installation, but only in
situations where this approach is used; so this would not perturb
usability too much, and many places where this would be used already have
a Condor installation.
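The division of labour described above can be sketched in Python: a
management step decides how many pending jobs may be released, and each
released job is turned into a submit description for the local Condor to
handle. This is only an illustration under stated assumptions, not the
design of any existing provider. The function names, the throttle policy and
the gatekeeper contact string are hypothetical; the submit description uses
the standard grid-universe commands (`universe`, `grid_resource`,
`executable`, `arguments`, `output`, `error`, `log`, `queue`).

```python
def condor_g_submit_description(executable, arguments, gatekeeper, job_id):
    """Render a Condor submit description for one grid-universe (Condor-G) job.

    gatekeeper is a GRAM2 contact string such as 'host/jobmanager-pbs'.
    """
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"output = {job_id}.out",
        f"error = {job_id}.err",
        f"log = {job_id}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def jobs_to_release(pending, active, max_active):
    """Simple rate limit: release only as many jobs as there are free slots."""
    free_slots = max(0, max_active - active)
    return pending[:free_slots]


if __name__ == "__main__":
    # Hypothetical site and job names, for illustration only.
    pending = ["job0", "job1", "job2"]
    for job_id in jobs_to_release(pending, active=3, max_active=4):
        print(condor_g_submit_description(
            "/bin/echo", "hello",
            "gatekeeper.example.org/jobmanager-pbs", job_id))
```

In a real provider the rendered text would be handed to the local Condor
(for example via condor_submit), which is why this approach needs a local
Condor installation only where it is actually used.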
So I'm cautiously supportive of this approach.
Specifically, given the two different uses for Condor interfacing discussed
above, I think that it would be useful to investigate making the Condor
provider decent.