[Swift-devel] testing

Michael Wilde wilde at mcs.anl.gov
Thu Mar 19 18:23:35 CDT 2009


I was writing the following yesterday before you posted the Coaster 
design notes.

Those were very helpful, exactly what I was looking for. Now the changes 
being discussed can be couched in terms of deltas to that spec.

So I'm just going to post the thoughts I had below before I lose them, to 
try to nudge this issue forward.

On 3/18/09 12:11 PM, Mihael Hategan wrote:
> On Wed, 2009-03-18 at 12:05 -0500, Michael Wilde wrote:
>> On 3/18/09 10:11 AM, Mihael Hategan wrote:
>>> On Wed, 2009-03-18 at 07:27 -0500, Michael Wilde wrote:
>>>
>>>>>  iii) running anything through gram2 is bad - any base job submissions 
>>>>>   need to be through condor-g using its hybrid gram2+gridmanager system.
>>>> I agree, and was assuming that on OSG we would only use the new Condor 
>>>> provider, and run jobs in this manner.
>>> There seems to be some confusion here.
>>>
>>> Ben, the point is to run with one of the scheduler providers, not gram2.
>>>
>>> Mike, the condor provider is not a condor-through-gram provider. It only
>>> submits to the local condor queue.
>> I was thinking/hoping that the condor provider would have a setting that 
>> submitted swift apps as condor-g jobs to N grid sites, *via* the local 
>> condor queue.
>>
>> Isn't that how condor-g works? I send my local condor (via condor_submit) 
>> a .sub file that says e.g.:
>>
>>    universe=grid
>>    grid_resource=gt2 osg-edu.cs.wisc.edu/jobmanager-condor
>>
>> If I can't do that yet through the condor provider, was it your intent 
>> that users eventually be able to do that, or was that not what you were 
>> implementing?
> 
> That was not what I was implementing.

OK. But (a) is the Condor-G provider worth implementing and (b) how far 
is it, in effort, from what you were implementing?

> What I was aiming for was a local condor provider, similar to the PBS
> provider, that would address the scalability issues with gram2 for sites
> using condor as a queuing system.

But one uses the pbs provider by running swift directly on a system that 
has the pbs tools (i.e. qsub) installed. There are very few systems with 
a Condor LRM that users have direct login access to. (The TG Purdue 
systems are one exception.)

But, could a swift user use the "local condor provider", the one you are 
implementing, to have the coaster service launch its workers?

Alternatively, if you *were* to implement a Condor-G provider as above, 
could coaster workers be submitted by Swift through that provider, rather 
than having them started by the coaster service?

And then there is Ben's important point about putting the Coaster 
service on a cluster worker node rather than the head node wherever 
possible, and accepting the limitation that on some systems, if the 
coaster service can't run on a worker node, it can't run on that site. So 
sites where the workers cannot connect back to the Swift submit host 
could not run coasters.

But the basic limitations are, I believe, as Ben stated them. We have to, 
for the time being and probably for quite a while, submit both the worker 
and the service via Condor-G to pre-WS-GRAM with the grid_monitor enabled.
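
To make the Condor-G route concrete, here is a rough sketch of how a submit description for such a job could be generated. It is only an illustration, not how any Swift provider does it: the helper name condor_g_submit, the executable name worker.pl and the output/error/log file names are all made up for the example. Only the universe and grid_resource lines come from the .sub snippet quoted above. The grid_monitor is configured on the Condor side and is not shown.

```python
def condor_g_submit(executable, gatekeeper,
                    jobmanager="jobmanager-condor", tag="worker"):
    """Build the text of a Condor-G submit description for a gt2 site.

    Hypothetical helper for illustration only; the file names derived
    from `tag` are assumptions, not anything Swift or coasters define.
    """
    lines = [
        "universe = grid",
        # gt2 selects pre-WS GRAM; host/jobmanager names the remote LRM
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        f"output = {tag}.out",
        f"error = {tag}.err",
        f"log = {tag}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Example: the site from the quoted snippet, with a made-up worker script.
sub = condor_g_submit("worker.pl", "osg-edu.cs.wisc.edu")
print(sub)
```

The resulting text would be written to a .sub file and handed to the local condor queue with condor_submit, so the same sketch would apply to both the worker and the service jobs with different executables.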
