[Swift-devel] Re: 244 MolDyn run was successful!
Ioan Raicu
iraicu at cs.uchicago.edu
Mon Aug 27 13:19:31 CDT 2007
Yes it is, it is VERY different!
With GRAM/PBS, although the failed job only takes 10 ms to fail, there is
about 1 sec of overhead to submit the job and get the error code back. In
Falkon, the overhead is about 20 ms. Also, in the time that the one node
was faulty (~30 sec), Falkon can submit and return about 1000 failed
tasks, while GRAM/PBS could only do about 15-30 failed jobs. The fact
that Falkon's submit/execute throughput is 2 orders of magnitude higher
than GRAM/PBS's is what makes it different, and hence it needs to be handled
differently.
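A quick back-of-the-envelope check of these numbers, as a hypothetical sketch: it assumes the bad node takes one task at a time and that each failure costs the per-task overhead plus the ~10 ms failure time. The constant and function names are made up for illustration.

```python
# Rough estimate of how many failed tasks one faulty node can "consume"
# while it is broken, using the overhead figures quoted in this message.
# Assumption: tasks hit the bad node back-to-back, one at a time, and each
# failure costs the dispatch overhead plus the ~10 ms failure time.

FAULT_WINDOW_MS = 30_000  # ~30 sec during which the node was faulty
FAIL_TIME_MS = 10         # time for the task itself to fail

OVERHEAD_MS = {
    "Falkon": 20,      # ~20 ms submit/return overhead
    "GRAM/PBS": 1000,  # ~1 sec submit + error-return overhead
}


def failed_tasks_in_window(overhead_ms: int,
                           window_ms: int = FAULT_WINDOW_MS,
                           fail_ms: int = FAIL_TIME_MS) -> int:
    """Number of back-to-back failures that fit inside the fault window."""
    return window_ms // (overhead_ms + fail_ms)


if __name__ == "__main__":
    for name, overhead in OVERHEAD_MS.items():
        print(f"{name}: ~{failed_tasks_in_window(overhead)} failed tasks")
```

Under these assumptions this gives about 1000 failures for Falkon and about 29 for GRAM/PBS, in line with the ~1000 and 15-30 figures above.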
Ioan
Ben Clifford wrote:
> On Mon, 27 Aug 2007, Ioan Raicu wrote:
>
>
>> On a similar note, IMO, the heuristic in Karajan should be modified to take
>> into account the task execution time of the failed or successful task, and not
>> just the number of tasks. This would ensure that Swift is not throttling task
>> submission to Falkon when there are 1000s of successful tasks that take on the
>> order of 100s of seconds to complete, yet there are also 1000s of failed tasks
>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a
>> bad node in a bunch of 100s of nodes, which ends up throttling the number of
>> active and running tasks to about 100, regardless of the number of processors
>> Falkon has.
>>
>
> Is that different from when submitting to PBS or GRAM where there are
> 1000s of successful tasks taking 100s of seconds to complete but with
> 1000s of failed tasks that are only 10ms long?
>
>
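To illustrate the proposal quoted above, here is a hypothetical sketch. It is not Karajan's actual scoring code, and the names `Completion`, `count_score`, and `time_weighted_score` are made up for this example. It contrasts a score that only counts tasks with one that weights each task by its execution time:

```python
from dataclasses import dataclass


@dataclass
class Completion:
    """One finished task as seen by a (hypothetical) per-site scheduler."""
    succeeded: bool
    duration_ms: int  # wall-clock execution time of the task


def count_score(history: list[Completion]) -> int:
    """Count-based score: each success adds 1, each failure subtracts 1."""
    return sum(1 if c.succeeded else -1 for c in history)


def time_weighted_score(history: list[Completion]) -> int:
    """Time-weighted score: each task moves the score by its own duration,
    so a 10 ms failure barely offsets a 100 s success."""
    return sum(c.duration_ms if c.succeeded else -c.duration_ms for c in history)


if __name__ == "__main__":
    # 1000 successful 100 s tasks plus 1000 failed 10 ms tasks from one bad node.
    history = [Completion(True, 100_000) for _ in range(1000)]
    history += [Completion(False, 10) for _ in range(1000)]
    print("count-based:  ", count_score(history))
    print("time-weighted:", time_weighted_score(history))
```

With 1000 successes of 100 s each and 1000 failures of 10 ms each, the count-based score drops to 0, while the time-weighted score stays strongly positive (99,990,000 ms). A scheduler using the second score would not throttle the site because of the bad node's fast failures.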
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================