[Swift-devel] Discussion on next steps for Coasters

Michael Wilde wilde at mcs.anl.gov
Thu Apr 2 21:31:39 CDT 2009


I had a brief off-list discussion with Mihael on next steps for 
coasters. I'm posting it here for group discussion and to get us started 
on the same page.

This follows up on discussion a few weeks ago on the same topic.

Rather than try to reorganize the email below, I'm posting it largely 
as-is to save time and effort.

Bottom line: Mihael will work on Coasters next, as he suggested in a 
prior email, taking the next steps to harden them for users, establish a 
better test mechanism and procedure, and work on some usability & 
enhancement issues.

- Mike

-------- Original Message --------
Subject: Re: Hi / status / next ?
Date: Thu, 02 Apr 2009 21:01:14 -0500
From: Michael Wilde <wilde at mcs.anl.gov>
To: Mihael Hategan <hategan at mcs.anl.gov>
References: <49D551B8.5010105 at mcs.anl.gov> 
<1238721084.19231.18.camel at localhost>

OK, all sounds good. Many more details to work out, but a short 
follow-up below.

On 4/2/09 8:11 PM, Mihael Hategan wrote:
> On Thu, 2009-04-02 at 19:00 -0500, Michael Wilde wrote:
>> Hi Mihael,
>>
...
>> So next on Swift: I think you should do a fairly intensive burst of 
>> effort on Coaster stabilization and portability, like you suggested on 
>> the list a little while ago.
> 
> Right.
> 
>> At a very high level, what I want to see is:
>>
>> - solid test suite, so we know it's working on an agreed-upon and 
>> growing set of platforms, mainly the TG, OSG and a few miscellaneous 
>> sites the users need
>>
>> - solve the "GT2 / OSG thing", which I *think* involves starting coaster 
>> workers from the submit host with GT2 using Condor-G.
> 
> The complexity of adding condor-g into the loop will likely be nasty.
> But I'll try.

Before you start, then, especially if it's not an obvious answer, let's
sanity-check it with discussion on the list, as a proposed update to
your design doc.
> 
>> - check that coaster shutdown is working.
> 
> Is there any reason to believe it's not?

Yeah, some suspicious behavior that we (me, Glen) haven't been able to
pin down, but suspect may be happening.

> 
>> Then lower priority:
>>
>> - make it possible to allocate a persistent pool of Coaster workers 
>> all at once (say, "gimme 1000 nodes on Ranger for 1 hour").
> 
> That I think isn't a good idea. Here's why, and correct me if I'm
> missing something:
> - regardless of whether you use it or not, you need to wait for nodes to
> be available. Whether that waiting happens while swift is running or
> not, it still happens.

true

> - once you have a pre-set number of nodes, you need to quickly start
> swift and use them, otherwise you lose allocations. By contrast, in
> automatic mode, swift will use them as soon as they are available

true

> - allocation of a pre-set number of nodes may be delayed if that number
> of nodes is not available. In the automatic mode, swift will use fewer
> nodes when they are available and ramp up to whatever it can get. A
> limiting case, when your 1k nodes will not be available at all, shows
> that the automatic case will yield better performance (your workflow will
> finish).

true

> - better balancing can be done if there are multiple sites with
> automatic allocation.

all true ;)

The only case where it's handy is benchmarking a workflow on a known
quantity of nodes.

This is driven in part by the fact that on the BG/P, this is how nodes
are allocated. (But even there we could do multi-block allocation in
varying chunks if the allocator were aware of the cluster's scheduling
policy.)

So what I was thinking was "ask for N nodes all at once". In all
cases, it would be assumed "...and then start your workflow", so it
would not need to be a separate allocation.

Tied to an option to say "leave my nodes running when the workflow is
done", this would, I think, meet all needs. But your points above are
compelling, hence this feature needs deliberation and is nowhere near
the top of the list. Higher on the list would be demand-based
grow-shrink of the pool, but in varying-sized blocks. And on all
systems, I think, you need to free nodes in the same-sized blocks (of
CPUs) that you allocated them in.
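
To make the proposal concrete, here is a purely hypothetical sketch of
how such options might look as coaster profile entries in sites.xml.
None of these keys exist today; they are made up for discussion only:

    <!-- hypothetical keys, for discussion only -->
    <profile namespace="globus" key="initialNodes">1000</profile>
    <profile namespace="globus" key="maxWalltime">01:00:00</profile>
    <profile namespace="globus" key="keepWorkersAfterRun">true</profile>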

This raises another question: for a site like TeraPort, which I think
places jobs on all cores independently, I am assuming that in today's
coasters implementation the user should not specify coastersPerNode >
1. True? (Even though it has 2-core nodes.) We should clarify this in
the user's guide. I will ask this on the list right now so everyone
can get the answer.
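
For reference, that setting lives in the site's pool entry in
sites.xml. A minimal sketch, assuming a GT2/PBS gatekeeper; the host
name and paths are placeholders (not TeraPort's actual values), and
exact attribute and key names may differ by Swift version:

    <pool handle="teraport">
      <execution provider="coaster" url="gatekeeper.example.edu"
                 jobmanager="gt2:pbs"/>
      <profile namespace="globus" key="coastersPerNode">1</profile>
      <workdirectory>/home/user/swiftwork</workdirectory>
    </pool>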

> One advantage to allocating blocks of coasters may be the possibility
> that a single multi-node job is started (so it solves the gt2
> scalability problem, but so does your provisioning point below).

I would be interested in this, both for its intrinsic performance
benefits and as a short-term solution to the OSG GT2 "overheating"
problem, especially if the Condor-G solution gets complex and takes a
long time to implement and perfect. I.e., as a short-term fix with
long-term benefits, it might make sense to do it first, assuming that
*it* is not harder than the Condor-G provider and coaster integration.

> 
>> - other ways to leave coasters running for the next jobs
> 
> Right. That may be possible with persistent services instead of the
> current transient scheme.
> 
>> - ensure that coaster time allocations are being done sensibly
>>
>> - revisit the coaster "provisioning" mechanism in terms of what 
>> increments workers are allocated and released in
>>
>> - some kind of coaster status display
>>
>> - some way to probe a job that's running on a coaster?
> 
> Define "probe".

- ps -f on the running process.
- probe its resource usage (/proc, also ps, etc)
- ls -lR of its jobdir (as these will more often be on /tmp)

We have these needs today; on the BG/P under Falkon we manually log in
to the node, but that's cumbersome: the node is hard to find, and
there is a 2-stage login process.

Low priority, a pipe dream, but theoretically doable.
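
Roughly what we do by hand today, as a sketch (the node name, PID and
job directory are placeholders you would have to discover first):

    # log in to the worker node (on the BG/P this is a 2-stage login)
    ssh <worker-node>
    # process status of the running app
    ps -f -p <job-pid>
    # resource usage via /proc (ps and top work too)
    cat /proc/<job-pid>/status
    # listing of the job directory, which will often be on /tmp
    ls -lR /tmp/<jobdir>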

So, very cool, we are converging on a plan.

I'll cc most of the above to the list now.

> 
>> Issue a shell command on the worker node where the job is running?
>>
>> - other things I missed.
>>
>> I'll send this to the list for discussion; what I mainly want to 
>> understand from you first is your time availability, what you feel 
>> you owe Swift in terms of compensating for i2u2 hours, and anything 
>> you know of on Swift that is higher priority than the coaster things 
>> above. (I don't know of anything, but I may be missing something.)
>>
>> Lastly, how is Phong doing, and to what extent can he be self-sufficient 
>> if you were to go 100% swift for a while?
> 
> I think he'll be able to take over most things. However, with the
> current big push, he's probably not confident enough, so it may have to
> happen after the new version is put into production.

...



