[Swift-devel] Status of coasters

Ben Clifford benc at hawaga.org.uk
Fri Feb 13 09:27:37 CST 2009


On Fri, 13 Feb 2009, Michael Wilde wrote:

> - Ben has a patch, not yet integrated, to run the coaster service on a
> worker node. Question: this is only usable when workers have sufficient IP
> access, correct?

Yes. I plan on making this presentable and then committing it. As part of
that, I should probably document who connects where in coasters with a
pretty diagram, to aid in understanding what 'sufficient' means.

> - The scalability problem submitting to GT2 GRAM sites still exists. Potential
> solutions are:
> 
> -- Service submits workers via PBS (using jobmanager=gt2:pbs). Valid only on
> PBS sites. Not yet tested.
> 
> -- Service submits workers via Condor-G (using jobmanager=gt2:condor). Mihael
> feels this requires a new Condor provider (the one in the current code base
> being insufficient and untested, really more of a prototype developed by a
> student).

That would be regular Condor, not Condor-G, I think.

The two above could be summarised as "submit service workers through the
local LRM using CoG-specific providers for that LRM".

The PBS provider seems to be getting a reasonable amount of use recently, 
and I think is also useful in the single-site case where it allows GRAM to 
be avoided entirely.

A decent Condor provider would probably allow something similar for Condor 
based clusters.

> -- Service submits via WS-GRAM. This would use
> jobmanager=gt2:gt4:{pbs/condor/sge}, and needs to be tested on sites where
> WS-GRAM is working.

If gram4.0 is working on a site, is there any reason to use gt2 for the
head job submission? It seems to add a dependency on one more service
(depending on both gram2 and gram4.0) rather than substituting one
dependency for another (gram2 for gram4.0).

> For sites where WS-GRAM is not functional, I suggested we consider configuring
> our own non-root WS-GRAM, ideally using already-installed GT4 software, e.g.,
> from the OSG package on OSG and TG sites where it's installed. Mihael thought
> this would be considerable work. I agree, but it might be a stable solution
> with fewer unknowns and support from the GRAM group. We can bring in the latest
> GT4 as needed if that provides a better solution than some older installed GT4
> which we have no control over and which won't change until upcoming releases
> of, say, OSG or TG packages.

I agree that this is considerable work. I think it is not something we 
should pursue.

> Lastly: it seems that a Condor-G provider might be a powerful capability (as
> one configuration option) to be able to submit all swift jobs via Condor-G
> (e.g., for non-coaster runs as well). Please comment on the value of such a
> capability.

I've pondered that before.

Using Condor-G appears to be the officially supported mechanism for
submitting to OSG in some people's minds; and similarly, using plain GRAM2
is Prohibited in those people's minds.

Using Condor-G would be more in line with some people's views of how jobs
should properly be submitted to OSG.

Such functionality could fit in as a CoG execution provider (similar to,
or part of, a plain Condor execution provider), and would not perturb the
architecture of Swift. Swift runs in such a situation would look a little
like DAGMan runs, with a management process handling some rate limiting
and deciding which jobs to run and where, and with the mechanics of
submission then handled by a local Condor.

This approach would necessitate a local Condor installation, but only in
situations where this approach is used; so this would not perturb
usability too much, and many places where this would be used already have
a Condor installation.
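
To make the division of labour concrete, here is a rough sketch (not
anything in the current code base) of what such a provider would hand to the
local Condor. The helper names condor_g_submit and pick_jobs, the gatekeeper
host and the output file names are all made up for illustration; only the
submit-description keywords (universe, grid_resource, executable, arguments,
output, error, log, queue) are standard Condor ones.

```python
def condor_g_submit(executable, arguments, gatekeeper, jobmanager="pbs"):
    """Render a Condor-G submit description for one job.

    Hypothetical helper: a real provider would also need staging,
    per-job file names and status tracking.
    """
    lines = [
        "universe = grid",
        # GRAM2 resource: gatekeeper contact plus the site's jobmanager
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",   # placeholder file names
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def pick_jobs(pending, in_flight, max_in_flight):
    """Rate limiting stays on the Swift side: release only as many
    pending jobs as the in-flight limit allows."""
    free = max(0, max_in_flight - in_flight)
    return pending[:free]


if __name__ == "__main__":
    # gk.example.org is a placeholder gatekeeper, not a real site
    for job in pick_jobs(["a", "b", "c"], in_flight=1, max_in_flight=2):
        print(condor_g_submit("/bin/echo", job, "gk.example.org"))
```

The text produced would be written to a file and passed to condor_submit;
everything after that (retries, talking to GRAM) would be Condor's problem
rather than Swift's.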

So I'm cautiously supportive of this approach.

Specifically, given the two different uses for Condor interfacing discussed
above, I think that it would be useful to investigate making the Condor
provider decent.
