[Swift-devel] Re: 244 MolDyn run was successful!

Ioan Raicu iraicu at cs.uchicago.edu
Mon Aug 27 13:19:31 CDT 2007


Yes it is, it is VERY different! 

With GRAM/PBS, although the failed job takes only 10 ms to fail, there is
about 1 sec of overhead to submit the job and get the error code back.  In
Falkon, the overhead is about 20 ms.  Also, in the time that the one node
was faulty (~30 sec), Falkon can submit and return about 1000 failed
tasks, while GRAM/PBS could only do about 15 to 30 failed jobs.  The fact
that Falkon's submit/execute throughput is 2 orders of magnitude higher
than GRAM/PBS is what makes it different, and hence it needs to be handled
differently.
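
As a rough check on those numbers, here is a back-of-the-envelope sketch.
It assumes the bad node fails tasks strictly one after another, and that
each cycle costs the submit overhead plus the 10 ms failure time; the
function name and the one-at-a-time model are illustrative assumptions,
not anything measured in Falkon or GRAM.

```python
# Back-of-the-envelope estimate: how many failed tasks can one bad node
# absorb while it is faulty? Assumes strictly sequential submit-and-fail
# cycles on that node (an illustrative simplification).

FAULT_WINDOW_MS = 30_000   # node was faulty for ~30 sec (from the message)
TASK_FAIL_MS = 10          # a failed task takes ~10 ms (from the message)

def failed_tasks(overhead_ms, window_ms=FAULT_WINDOW_MS, fail_ms=TASK_FAIL_MS):
    """Number of whole submit-and-fail cycles that fit in the fault window."""
    return window_ms // (overhead_ms + fail_ms)

falkon = failed_tasks(20)      # ~20 ms overhead per task
gram_pbs = failed_tasks(1000)  # ~1 sec overhead per job

print("Falkon:  ", falkon)
print("GRAM/PBS:", gram_pbs)
```

Under these assumptions the estimate lands on about 1000 failed tasks for
Falkon and just under 30 for GRAM/PBS, in line with the figures above.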

Ioan

Ben Clifford wrote:
> On Mon, 27 Aug 2007, Ioan Raicu wrote:
>
>   
>> On a similar note, IMO, the heuristic in Karajan should be modified to take
>> into account the task execution time of the failed or successful task, and not
>> just the number of tasks.  This would ensure that Swift is not throttling task
>> submission to Falkon when there are 1000s of successful tasks that take on the
>> order of 100s of seconds to complete, yet there are also 1000s of failed tasks
>> that are only 10 ms long.  This is exactly the case with MolDyn, when we get a
>> bad node in a bunch of 100s of nodes, which ends up throttling the number of
>> active and running tasks to about 100, regardless of the number of processors
>> Falkon has. 
>>     
>
> Is that different from when submitting to PBS or GRAM where there are 
> 1000s of successful tasks taking 100s of seconds to complete but with 
> 1000s of failed tasks that are only 10ms long?
>
>   
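
To illustrate the suggestion in the quoted message, here is a minimal
sketch comparing a count-based failure ratio with one weighted by task
execution time.  This is not Karajan's actual scoring code; the function
names and the (succeeded, duration) tuple layout are hypothetical, chosen
only to show why the two measures diverge in the MolDyn case.

```python
# Hypothetical comparison of two failure measures. Each task is a tuple
# (succeeded: bool, duration_seconds: float). Not Karajan's real heuristic.

def count_based_failure_ratio(tasks):
    """Fraction of tasks that failed, ignoring how long each one ran."""
    failed = sum(1 for ok, _ in tasks if not ok)
    return failed / len(tasks)

def time_weighted_failure_ratio(tasks):
    """Fraction of total execution time that was spent in failed tasks."""
    total_time = sum(duration for _, duration in tasks)
    failed_time = sum(duration for ok, duration in tasks if not ok)
    return failed_time / total_time

# 1000 successful tasks of ~100 s each, 1000 failed tasks of ~10 ms each.
tasks = [(True, 100.0)] * 1000 + [(False, 0.01)] * 1000

print("count-based:  ", count_based_failure_ratio(tasks))
print("time-weighted:", time_weighted_failure_ratio(tasks))
```

By count, half of the tasks failed, which would justify heavy throttling;
by execution time, the failures account for a tiny fraction of the work.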

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
