[Swift-devel] Re: several questions about coaster

Mihael Hategan hategan at mcs.anl.gov
Tue Apr 28 12:59:38 CDT 2009


On Tue, 2009-04-28 at 00:46 -0500, Zhao Zhang wrote:
> Hi, Mihael
> 
> As I am going to test coaster in depth on various systems, I have several 
> questions regarding the coaster infrastructure.
> 1. Scalability
>     From the source code, I can tell that coaster is using TCP for task 
> transmission (correct me if I am wrong). What is the largest
>     scale test we have done with coaster? I mean the ratio of 
> workers to coaster dispatchers.

Typical stuff on ranger. Around 1000 nodes.

> 
> 2. Usability
>     All tests I have tried with coaster were running together with 
> swift. That is a black box test for me. Is there any interface in coaster
>     through which I could specify the number of workers and the wall time?

Like all profile entries, they are task attributes.

> 
>     Also, is there a way for me to start coaster service and workers 
> separately and independently?

The coaster provider is a provider like any other. Use the "job
submission example" in cog and change the provider to "coaster".

> 
>     Besides, I am guessing coaster is using a dynamic provisioning 
> approach to request resources. Is this correct? That is, will coaster
>     decide how many compute nodes to request according to the 
> number of jobs and the length of the jobs?

No.

For each job, it will try to find the worker with the least time left
that can still run the job. If no such worker can be found, it will try
to start a new one with a walltime 10 times larger than the job's, up to
a maximum of 250 workers, I think.
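
A minimal sketch of that selection rule in Python, just to make it
concrete. This is not the coaster source: the names (Worker, pick_worker)
are made up for illustration, "10 times larger" is assumed to be relative
to the job's walltime, and the sketch ignores whether a worker is busy and
how long a new worker takes to start.

```python
from dataclasses import dataclass
from typing import List, Optional

WALLTIME_FACTOR = 10  # from the text: a new worker gets 10x the walltime
MAX_WORKERS = 250     # from the text ("I think"), so treat as approximate


@dataclass
class Worker:
    time_left: int  # seconds of walltime remaining (hypothetical field)


def pick_worker(workers: List[Worker], job_walltime: int) -> Optional[Worker]:
    """Return the worker chosen for the job, or None if none is available."""
    # Workers that still have enough time left to run the job.
    fitting = [w for w in workers if w.time_left >= job_walltime]
    if fitting:
        # Best fit: the worker with the least time left among those.
        return min(fitting, key=lambda w: w.time_left)
    if len(workers) < MAX_WORKERS:
        # Assumption: the new worker's walltime is 10x the job's walltime.
        new_worker = Worker(time_left=WALLTIME_FACTOR * job_walltime)
        workers.append(new_worker)
        return new_worker
    # At the worker cap and nothing fits: the job has to wait.
    return None
```

With workers that have 100, 500 and 50 seconds left, a 60-second job goes
to the 100-second worker; a 600-second job fits none of them, so a new
worker with 6000 seconds is started.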

>  If I run coaster in a supercomputer
>     context, can I ask coaster to hold a certain number of compute nodes 
> for a certain amount of time? (This somewhat overlaps the first question in 
> section 2)

You don't seem to have tuned in much to the recent discussion on this
mailing list.

No, you can't.

> 
> 3. Performance
>     Does coaster provide an alternative interface other than the coaster 
> provider? Say I want to test the dispatch rate of coaster, but don't 
> want to
>     introduce swift overhead; what is a good way to start coaster?

The abstraction api in cog has nothing to do with swift, so use that.

> 
>     Is there a coaster log that could show the number of active workers 
> currently registered with the coaster service, how many jobs are running,
>     how many jobs returned successfully, etc.?

Yes. On the remote site, in ~/.globus/coasters

> 
> 4. Dispatch Algorithm
>     Does coaster use a scoring algorithm for dispatching jobs?

No scoring algorithm. See the relevant answer in (2).

>  That is, 
> does the coaster service keep scores for every worker and dispatch jobs
>     based on those scores? Is there an alternative, say a FIFO algorithm?
> 
> 5. Reliability
>     I know that if a job fails, swift can resend the same job. But 
> does coaster have any error recovery mechanism built in?

No. It deliberately has none, in order to avoid obscuring errors.




