[Swift-devel] CPU usage with provider-deef
Ioan Raicu
iraicu at cs.uchicago.edu
Sat Jun 16 09:35:52 CDT 2007
With Falkon, we had 34 machines, with 68 processors, running a job on
each processor. I think it took about 20 min. We then ran over GRAM,
but there are only 60 IA64 nodes (120 processors) at ANL, so when the 68
jobs got submitted, only 60 of them went in the run queue, and 8 of them
went in the wait queue. There were enough processors to run all the
jobs at the same time, but I don't know how we were supposed to tweak
Swift to have it dispatch two tasks per GRAM job and run both tasks in
parallel on the two processors of each node. I believe the total time
for the GRAM2 run was about 26 min. The extra round of 8 jobs (which
Falkon didn't have) took about 200 sec (about 3.3 min), so the rough
improvement would probably have been around 1~2.6 min (5~10%). That
sounds about right for a dispatch rate of 0.5~1 jobs/sec, at which 68
jobs would have taken roughly 68~136 seconds.
The comparison wasn't done scientifically, so don't quote the numbers
exactly, but Falkon was a bit faster. In Nika's workflow, where high
throughput isn't essential, the big gain from using Falkon is the
scalability of the Falkon wait queue and the resource provisioning:
once you get some resources, you use them over and over and avoid the
LRM queue wait time for each job.
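The timing estimate above stacks several rough numbers. This Python sketch just redoes that arithmetic with the email's approximate figures (about 20 min for Falkon, about 26 min for GRAM2, about 200 sec for the extra round of 8 jobs). The constant names and the assumption that jobs are dispatched one after another at a steady 0.5 to 1 jobs/sec are illustrative only; nothing here was measured or taken from Swift or Falkon.

```python
# Back-of-the-envelope check of the GRAM2 vs Falkon timings in the email.
# Inputs are the rough figures quoted there; the constant names are hypothetical.

JOBS = 68
FALKON_MIN = 20.0        # approximate Falkon wall time, minutes
GRAM_MIN = 26.0          # approximate GRAM2 wall time, minutes
EXTRA_ROUND_SEC = 200.0  # extra round for the 8 jobs that did not fit on 60 nodes


def dispatch_seconds(jobs: int, rate_jobs_per_sec: float) -> float:
    """Time to submit `jobs` tasks serially at a fixed rate (an assumption)."""
    return jobs / rate_jobs_per_sec


extra_round_min = EXTRA_ROUND_SEC / 60.0
# Gap between GRAM2 and Falkon left over once the extra round is removed.
residual_min = GRAM_MIN - extra_round_min - FALKON_MIN

slow = dispatch_seconds(JOBS, 0.5)  # 136 s at 0.5 jobs/sec
fast = dispatch_seconds(JOBS, 1.0)  # 68 s at 1 job/sec

print(f"extra round: {extra_round_min:.2f} min")
print(f"residual difference: {residual_min:.2f} min")
print(f"dispatch at 0.5-1 jobs/s: {fast:.0f}-{slow:.0f} s "
      f"({fast / 60:.2f}-{slow / 60:.2f} min)")
```

With those inputs, the gap left after removing the extra round is about 2.7 min, the same order as the roughly 1.1 to 2.3 min needed to dispatch 68 jobs at 0.5 to 1 jobs/sec.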
BTW, we ran a 20-molecule short run yesterday successfully, but we are
still having problems with the 100-molecule run in MolDyn. It's not
clear where the problem is; on the surface Falkon looks fine, so we are
looking into what breaks and stops Swift from carrying the workflow
through to completion.
Ioan
Ben Clifford wrote:
> Cool.
>
> Did you try the long long run again with that change in place?
>
> Also what was the elapsed realtime for using GRAM2 vs Falkon for that 68
> node / ~15 minute workflow?
>
> On Sat, 16 Jun 2007, Ioan Raicu wrote:
>
>
>> Actually, it was a bug in the Falkon provider: there was a tight polling loop on
>> a task queue even when it was empty. It got fixed with one line of code :)
>> It now leaves the CPU relatively idle for Nika's workflow, which doesn't
>> require high throughput.
>>
>> Thanks Yong for fixing it!
>>
>> Ioan
>>
>> Ben Clifford wrote:
>>
>>> It was running something like 68 jobs in 15 minutes. Kinda scary if each of
>>> those jobs needs 15 CPU-seconds on the submit side.
>>>
>>> On Sat, 16 Jun 2007, Mihael Hategan wrote:
>>>
>>>
>>>
>>>> That can be either good or bad. If the CPU is used doing meaningful
>>>> stuff, then it's good. In other words, I'm guessing that the job
>>>> throughput is also higher with Falkon.
>>>>
>>>> Mihael
>>>>
>>>> On Fri, 2007-06-15 at 16:48 +0000, Ben Clifford wrote:
>>>>
>>>>
>>>>> Yesterday, I was playing a bit with Ioan and Nika - they submitted a 68
>>>>> node / 15-minute workflow through provider-deef & Falkon and saw the
>>>>> Swift JVM on the submit node using about 100% CPU; then the same
>>>>> workflow running through the GT2 GRAM provider rather than provider-deef
>>>>> and Falkon appeared to use significantly less.
>>>>>
>>>>> I wandered off at that point so don't know if any interesting results
>>>>> came after.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
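The CPU problem in the quoted thread came from a tight polling loop on a task queue that kept spinning even when the queue was empty. Ben's rough figures make the cost concrete: 15 min at about 100% CPU is about 900 CPU-seconds, or roughly 13 CPU-seconds for each of the 68 jobs. The actual one-line provider fix is not shown in this thread, so the following is only a generic Python sketch of the pattern, with hypothetical names (`busy_worker`, `blocking_worker`, `STOP`) and a placeholder "task" that just doubles a number. The busy version returns to the queue immediately when it is empty and pins a core; the blocking version sleeps inside `get()` until work arrives.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel telling a worker to exit


def busy_worker(tasks: queue.Queue, done: list) -> None:
    """Anti-pattern: polls without waiting, burning a full core while idle."""
    while True:
        try:
            item = tasks.get_nowait()  # returns immediately, never sleeps
        except queue.Empty:
            continue                   # tight loop: straight back to polling
        if item is STOP:
            return
        done.append(item * 2)          # placeholder for running a task


def blocking_worker(tasks: queue.Queue, done: list) -> None:
    """Fix: block in get(), so the thread sleeps until a task is queued."""
    while True:
        item = tasks.get()             # waits on the queue's condition variable
        if item is STOP:
            return
        done.append(item * 2)          # placeholder for running a task


if __name__ == "__main__":
    tasks: queue.Queue = queue.Queue()
    done: list = []
    worker = threading.Thread(target=blocking_worker, args=(tasks, done))
    worker.start()
    for n in range(5):
        tasks.put(n)
    tasks.put(STOP)
    worker.join()
    print(done)  # [0, 2, 4, 6, 8]
```

Both workers produce the same results; the difference is only in what the thread does while the queue is empty, which is why a workflow that does not need high throughput can leave the CPU nearly idle once the loop blocks.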
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================