[Swift-devel] Re: several questions about coaster

Zhao Zhang zhaozhang at uchicago.edu
Tue Apr 28 15:42:08 CDT 2009


Hi, Mihael

Mihael Hategan wrote:
> On Tue, 2009-04-28 at 00:46 -0500, Zhao Zhang wrote:
>   
>> Hi, Mihael
>>
>> As I am going to test coaster thoroughly on various systems, I have
>> several questions regarding the coaster infrastructure.
>> 1. Scalability
>>     From the source code, I can tell that coaster uses TCP for task
>> transmission (correct me if I am wrong). What is the largest-scale
>>     test we have done with coaster? I mean the ratio between the coaster
>> dispatcher and the number of workers.
>>     
>
> Typical stuff on Ranger: around 1000 nodes.
>   
Btw, can coaster use all the cores on a single compute node? I mean in a
multi-core context.
>   
>> 2. Usability
>>     All the tests I have tried with coaster ran together with swift,
>> so they were black-box tests for me. Is there any interface in coaster
>>     through which I can specify the number of workers and the walltime?
>>     
>
> Like all profile entries, they are task attributes.
>
>   
>>     Also, is there a way for me to start the coaster service and the
>> workers separately and independently?
>>     
>
> The coaster provider is a provider like any other. Use the "job
> submission example" in cog and change the provider to "coaster".
>   
I found this link, but the page is empty:
http://wiki.cogkit.org/wiki/Java_Cog_Kit_Examples_Guide#Job_Submission
>   
>>     In addition, I am guessing that coaster uses a dynamic provisioning
>> approach to request resources; is this correct? That is, coaster
>>     would decide how many compute nodes to request according to the
>> number and length of the jobs.
>>     
>
> No.
>
> For each job, it will try to find the worker with the least time left
> that can still run the job. If no such worker can be found, it will try
> to start a new one with a 10 times larger walltime, up to a maximum of
> 250 workers, I think.
>
>   
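The worker-selection rule described above is essentially a best-fit search. Below is a minimal Python sketch of it, not the coaster source. It assumes that "10 times larger walltime" means 10 times the job's walltime, and, as a simplification, that a worker runs jobs back to back, so assigning a job just reserves that much of its remaining time. `Worker`, `Scheduler`, `submit`, and the two constants are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical constants taken from the description above.
MAX_WORKERS = 250          # "up to a maximum of 250 workers I think"
WALLTIME_MULTIPLIER = 10   # assumption: new worker walltime = 10 x job walltime


@dataclass
class Worker:
    name: str
    time_left: int  # remaining walltime in seconds


@dataclass
class Scheduler:
    workers: List[Worker] = field(default_factory=list)

    def find_worker(self, job_walltime: int) -> Optional[Worker]:
        # Best fit: among workers with enough time left for the job,
        # pick the one with the LEAST time left.
        candidates = [w for w in self.workers if w.time_left >= job_walltime]
        return min(candidates, key=lambda w: w.time_left, default=None)

    def submit(self, job_walltime: int) -> Optional[Worker]:
        worker = self.find_worker(job_walltime)
        if worker is None:
            if len(self.workers) >= MAX_WORKERS:
                return None  # cap reached; in this sketch the job must wait
            worker = Worker(f"w{len(self.workers)}",
                            WALLTIME_MULTIPLIER * job_walltime)
            self.workers.append(worker)
        # Simplification: the job consumes its walltime from the worker.
        worker.time_left -= job_walltime
        return worker
```

For example, a 60 s job on an empty scheduler starts `w0` with 600 s, and a later 1000 s job that no longer fits on `w0` starts `w1` with 10000 s. Preferring the worker with the least time left keeps long-lived workers free for long jobs.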
>>  If I run coaster in a supercomputer
>>     context, can I ask coaster to hold a certain number of compute nodes
>> for a certain amount of time? (This somewhat overlaps the first question
>> in section 2.)
>>     
>
> You don't seem to have followed the recent discussion on this
> mailing list very closely.
>
> No, you can't.
>   
I knew this from the discussion between you and Mike. So is this a
feature that we are going to implement soon, or haven't we decided yet?
>   
>> 3. Performance
>>     Does coaster provide an interface other than the coaster provider?
>> Say I want to test the dispatch rate of coaster without introducing
>>     swift overhead: what is a good way to start coaster?
>>     
>
> The abstraction API in cog has nothing to do with swift, so use that.
>   
Err, I looked for it on the CoG Kit wiki but could not find it.
Could you point me to somewhere handy where I could try it out?
>   
>>     Is there a coaster log that shows the number of active workers
>> currently registered with the coaster service, how many jobs are running,
>>     how many jobs returned successfully, etc.?
>>     
>
> Yes. On the remote site, in ~/.globus/coasters
>   
I found those.

zhao
>   
>> 4. Dispatch Algorithm
>>     Does coaster use a scoring algorithm for dispatching jobs?
>>     
>
> No scoring algorithm. See the relevant answer in (2).
>
>   
>>  That is, does the
>> coaster service keep a score for every worker and dispatch jobs
>>     based on those scores? Is there an alternative, say a FIFO algorithm?
>>
>> 5. Reliability
>>     I know that if a job fails, swift can resubmit the same job. But
>> does coaster have any built-in error recovery mechanism?
>>     
>
> No. It deliberately has none, in order to avoid obscuring errors.
>
>
>
>   


