[Swift-devel] CPU usage with provider-deef

Sat Jun 16 09:35:52 CDT 2007

With Falkon, we had 34 machines with 68 processors, running one job on 
each processor.  I think it took about 20 min.  We then ran over GRAM, 
but there are only 60 IA64 nodes (120 processors) at ANL, so when the 68 
jobs got submitted, only 60 of them went into the run queue and 8 of 
them went into the wait queue.  There were enough processors to run all 
the jobs at the same time, but I don't know how we were supposed to 
tweak Swift to have it dispatch two tasks per GRAM job and run both 
tasks in parallel on a node's two processors.  I believe the total time 
for the GRAM2 run was about 26 min.  The extra round of 8 jobs (which 
Falkon didn't have) took about 200 sec (3.4 min), so the rough 
improvement would probably have been around 1~2.6 min (5~10%).  That 
sounds about right with the 0.5~1 job/sec rate, at which 68 jobs would 
have taken 68~136 or so seconds. 
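
For anyone who wants to redo the arithmetic above, here is a quick back-of-the-envelope sketch in Python.  It only restates the rough figures from this message (20 min, 26 min, 3.4 min, 68 jobs); the 0.5~1 job/sec rate is inferred from the 68~136 sec range, and the variable names are made up for illustration.

```python
# Rough figures quoted in this message (minutes); not precise measurements.
falkon_total_min = 20.0   # Falkon run: 68 jobs on 68 processors
gram_total_min = 26.0     # GRAM2 run: 60 jobs, then a second round of 8
extra_round_min = 3.4     # second round of 8 jobs (about 200 sec)
num_jobs = 68

# GRAM2 time if all 68 jobs had fit into a single round.
gram_single_round_min = gram_total_min - extra_round_min

# Upper end of the Falkon gain once the extra round is discounted.
gain_min = gram_single_round_min - falkon_total_min
gain_pct = 100.0 * gain_min / gram_total_min

# Time to push 68 jobs through at an assumed 0.5~1 job/sec.
dispatch_sec_fast = num_jobs / 1.0
dispatch_sec_slow = num_jobs / 0.5

print(f"GRAM2 single-round estimate: {gram_single_round_min:.1f} min")
print(f"Falkon gain (upper end): {gain_min:.1f} min ({gain_pct:.0f}%)")
print(f"Dispatch time for {num_jobs} jobs: "
      f"{dispatch_sec_fast:.0f}~{dispatch_sec_slow:.0f} sec")
```

The upper end of the gain (about 2.6 min) comes from subtracting the extra round from the GRAM2 total; the lower end (about 1 min) corresponds to the faster dispatch estimate.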

The comparison wasn't done scientifically, so don't quote the numbers 
exactly, but Falkon was a bit faster.  In Nika's workflow case, where 
high throughput isn't essential, the big gain from using Falkon is the 
scalability of the Falkon wait queue and the resource provisioning: 
once you get some resources, you use them over and over, which avoids 
the LRM queue wait time for each job.

BTW, we ran a short 20-molecule run yesterday successfully, but we are 
still having problems with the 100-molecule run in MolDyn.  It's not 
clear where the problem is; on the surface Falkon looks fine... we are 
looking into where everything breaks and causes Swift to not continue 
with the workflow to completion!

Ioan

Ben Clifford wrote:
> Cool.
>
> Did you try the long long run again with that change in place?
>
> Also what was the elapsed realtime for using GRAM2 vs Falkon for that 68 
> node / ~15 minute workflow?
>
> On Sat, 16 Jun 2007, Ioan Raicu wrote:
>
>> Actually, it was a bug in the Falkon provider: there was a tight polling
>> loop on a task queue even if it was empty... it got fixed with one line of
>> code :)  It's now running the CPU relatively idle for Nika's workflow,
>> which doesn't require high throughput.
>>
>> Thanks Yong for fixing it!
>>
>> Ioan
>>
>> Ben Clifford wrote:
>>> It was running something like 68 jobs in 15 minutes. Kinda scary if each of
>>> those jobs needs 15 cpu.seconds on the submit side.
>>>
>>> On Sat, 16 Jun 2007, Mihael Hategan wrote:
>>>
>>>> That can either be good or bad. If the CPU is used doing meaningful
>>>> stuff, then it's good. In other words, I'm guessing that the job
>>>> throughput is also higher with Falkon.
>>>>
>>>> Mihael
>>>>
>>>> On Fri, 2007-06-15 at 16:48 +0000, Ben Clifford wrote:
>>>>> Yesterday, I was playing a bit with Ioan and Nika - they submitted a 68
>>>>> node / 15 minute workflow through provider-deef & falkon and saw the
>>>>> swift JVM on the submit node using about 100% CPU; then the same
>>>>> workflow running through the GT2 GRAM provider rather than provider-deef
>>>>> and falkon appeared to use significantly less.
>>>>>
>>>>> I wandered off at that point so don't know if any interesting results
>>>>> came after.
>>>>>
>   
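
The tight polling loop mentioned in the quoted thread can be illustrated with a small sketch.  The actual Falkon provider code is not shown anywhere in this thread, so this is only a generic Python illustration of that class of bug and one common way to avoid it; all function names here are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit


def busy_poll_worker(tasks, handle):
    """Problematic pattern: spins at full speed while the queue is empty."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            continue  # retries immediately, keeping one CPU busy
        if task is STOP:
            return
        handle(task)


def blocking_worker(tasks, handle):
    """Sleeps inside get() until a task arrives, so an empty queue is cheap."""
    while True:
        task = tasks.get()  # blocks instead of polling
        if task is STOP:
            return
        handle(task)


def run_worker(worker, items):
    """Feed items to a worker thread and return what it handled, in order."""
    tasks = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(tasks, results.append))
    thread.start()
    for item in items:
        tasks.put(item)
    tasks.put(STOP)
    thread.join()
    return results


if __name__ == "__main__":
    print(run_worker(blocking_worker, ["job-1", "job-2", "job-3"]))
```

Both workers process the same tasks; the difference is only in what happens while the queue is empty, which matters for a low-throughput workflow where the queue is empty most of the time.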

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
