[Swift-devel] Re: several questions about coaster
Zhao Zhang
zhaozhang at uchicago.edu
Tue Apr 28 15:42:08 CDT 2009
Hi, Mihael
Mihael Hategan wrote:
> On Tue, 2009-04-28 at 00:46 -0500, Zhao Zhang wrote:
>
>> Hi, Mihael
>>
>> As I am going to test coaster in depth on various systems, I have
>> several questions regarding the coaster infrastructure.
>> 1. Scalability
>> From the source code, I can tell that coaster uses TCP for task
>> transmission (correct me if I am wrong). What is the largest-scale
>> test we have done with coaster? I mean the ratio between the
>> coaster dispatcher and the number of workers.
>>
>
> Typical stuff on ranger. Around 1000 nodes.
>
Btw, could coaster use all cores on a single compute node? I mean in a
multi-core context.
>
>> 2. Usability
>> All the tests I have tried with coaster ran together with
>> swift, which makes them black-box tests for me. Is there any interface
>> in coaster through which I could specify the number of workers and the
>> wall time?
>>
>
> Like all profile entries, they are task attributes.
>
>
>> Also, is there a way for me to start coaster service and workers
>> separately and independently?
>>
>
> The coaster provider is a provider like any other. Use the "job
> submission example" in cog and change the provider to "coaster".
>
I found this link, but it is empty.
http://wiki.cogkit.org/wiki/Java_Cog_Kit_Examples_Guide#Job_Submission
>
>> Besides, I am guessing coaster uses a dynamic provisioning
>> approach to request resources. Is this correct? That is, does coaster
>> decide how many compute nodes to request according to the
>> number of jobs and the length of the jobs?
>>
>
> No.
>
> For each job, it will try to find the worker with the least time left
> that can still run the job. If no such worker can be found, it will try
> to start one with 10 times larger walltime up to a maximum of 250
> workers I think.
>
>
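The allocation rule described above can be sketched roughly as follows. This is only an illustration of the rule as stated in the reply, not the actual coaster code: the `Worker` class, the `pick_worker` function, and the choice to consider only idle workers are hypothetical, the factor of 10 is read here as "10 times the job's walltime", and the cap of 250 is the figure quoted with "I think".

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: "10 times larger walltime" means 10x the walltime of the job.
OVERALLOCATION_FACTOR = 10
# Quoted in the reply with "I think", so treat this as approximate.
MAX_WORKERS = 250


@dataclass
class Worker:
    time_left: int      # seconds of walltime remaining (hypothetical unit)
    busy: bool = False  # hypothetical flag; only idle workers are considered


def pick_worker(workers: List[Worker], job_walltime: int) -> Optional[Worker]:
    """Return the worker chosen for the job, or None if none can be found or started."""
    # Workers that are idle and have enough time left to run the job.
    candidates = [w for w in workers if not w.busy and w.time_left >= job_walltime]
    if candidates:
        # Prefer the worker with the least time left that still fits the job.
        return min(candidates, key=lambda w: w.time_left)
    if len(workers) < MAX_WORKERS:
        # No suitable worker: start a new one with a 10x larger walltime.
        new_worker = Worker(time_left=OVERALLOCATION_FACTOR * job_walltime)
        workers.append(new_worker)
        return new_worker
    return None
```

For example, with workers that have 100, 50, and 500 seconds left, a 60-second job goes to the worker with 100 seconds left; a 1000-second job fits none of them, so a new worker with 10000 seconds is started.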
>> If I run coaster in a supercomputer
>> context, can I ask coaster to hold a certain number of compute nodes
>> for a certain amount of time? (This somewhat overlaps the first
>> question in section 2.)
>>
>
> You don't seem to have tuned in much to the recent discussion on this
> mailing list.
>
> No, you can't.
>
I knew this from the discussion between you and Mike. So is this a
feature that we are going to implement soon, or have we not decided yet?
>
>> 3. Performance
>> Does coaster provide an alternative interface other than the coaster
>> provider? Say I want to test the dispatch rate of coaster but don't
>> want to introduce swift overhead: what is a good way to start coaster?
>>
>
> The abstraction api in cog has nothing to do with swift, so use that.
>
err, I tried to find it on the cog kit wiki, but could not find it.
Could you point me to somewhere handy that I could try it out?
>
>> Is there a coaster log that shows the number of active workers
>> currently registered with the coaster service, how many jobs are running,
>> how many jobs returned successfully, etc.?
>>
>
> Yes. On the remote site, in ~/.globus/coasters
>
I found those.
zhao
>
>> 4. Dispatch Algorithm
>> Does coaster use a scoring algorithm for dispatching jobs?
>>
>
> No scoring algorithm. Read appropriate answer in (2).
>
>
>> That is, does
>> the coaster service keep a score for every worker and dispatch jobs
>> based on those scores? Is there an alternative, say a FIFO algorithm?
>>
>> 5. Reliability
>> I know that if a job fails, swift can resend the same job. But
>> does coaster have any error recovery mechanism built in?
>>
>
> No. It deliberately has none, in order to avoid obscuring errors.
>
>
>
>