[Swift-devel] bug 53

Veronika Nefedova nefedova at mcs.anl.gov
Thu Sep 13 17:35:25 CDT 2007


Thanks, Mihael! I could try submitting some 20 molecules to tg-uc now
(directly to GRAM), just to be on the safe side. If no GRAM problems
are reported, I'll increase the number to 244.
Of course, the performance will suffer greatly, but I hope it will
allow the whole workflow to go through. Are there any throttles
that could be set to improve the performance a bit (given that I set
maxSubmitRate to 0.2)?

Nika
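
For reference, the arithmetic behind the maxSubmitRate setting quoted
below can be sketched in a few lines of Python. This is only an
illustration of "rate in jobs per second" versus "seconds per job"; the
function names are hypothetical and are not part of Swift or Karajan,
and the sketch assumes evenly spaced submissions with no bursts. The
421-job figure is the 5-molecule run mentioned further down the thread.

```python
# Sketch only: relates a maxSubmitRate value (jobs per second) to the
# spacing between submissions and to a lower bound on submission time.
# The function names are hypothetical, not part of Swift or Karajan.


def submit_interval(max_submit_rate: float) -> float:
    """Seconds between consecutive job submissions at the given rate."""
    if max_submit_rate <= 0:
        raise ValueError("maxSubmitRate must be positive")
    return 1.0 / max_submit_rate


def min_submit_time(num_jobs: int, max_submit_rate: float) -> float:
    """Lower bound, in seconds, on the time needed to submit num_jobs.

    Assumes the first job goes out immediately and each later job waits
    one full interval; ignores bursts, queueing, and job run time.
    """
    if num_jobs <= 0:
        return 0.0
    return (num_jobs - 1) * submit_interval(max_submit_rate)


if __name__ == "__main__":
    # 0.1 is the example rate from the thread, 0.2 is the rate Nika set.
    for rate in (0.1, 0.2):
        interval = submit_interval(rate)
        minutes = min_submit_time(421, rate) / 60
        print(f"rate={rate} jobs/s: one job every {interval:.0f} s, "
              f"421 jobs need at least {minutes:.0f} min to submit")
```

The estimate covers submission time only, so it is a floor on the total
run time under the throttle rather than a prediction of it.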

On Sep 13, 2007, at 4:41 PM, Mihael Hategan wrote:

> Ok, so there's something in.
> There are some discussions that can be had on certain aesthetic topics.
> In any event, in sites.xml, you can add, for a site, something like
> this:
>
> <profile namespace="karajan" key="maxSubmitRate">0.1</profile>
>
> The rate is in jobs per second. The above would mean one job every ten
> seconds.
>
> Mihael
>
> On Thu, 2007-09-13 at 15:23 +0000, Ben Clifford wrote:
>> Yes?
>>
>> On Thu, 13 Sep 2007, Mihael Hategan wrote:
>>
>>> May I still fix that bug though?
>>>
>>> On Thu, 2007-09-13 at 09:54 -0500, Ioan Raicu wrote:
>>>> Hi,
>>>> I am still working on the new feature for Falkon to avoid submitting
>>>> tasks to known bad nodes, and to perhaps do its own retries for failed
>>>> jobs with certain known errors (i.e. stale NFS handle).  I should have
>>>> that ready for next week to try out.  Once this new feature is in, we
>>>> could try MolDyn again to see how it behaves.
>>>>
>>>> About avoiding Falkon for MolDyn, I recall something about the
>>>> scalability/policies of GRAM/PBS in handling many concurrent jobs,
>>>> having to throttle job submissions to something around 1 job every 10
>>>> seconds (for sustained periods of time; short bursts could send
>>>> faster), and the fact that only a few 10s of nodes would be used
>>>> concurrently, even though the sites that it was running on had more
>>>> free nodes.  I also think that MolDyn through GRAM/PBS was running
>>>> only 1 job per node, in essence only using 1 processor of the 2 per
>>>> node.  I think the largest workflow Nika was able to run over GRAM/PBS
>>>> was 5 molecules, 421 jobs (but only 340 jobs in the large stage).
>>>> Nika, were there other problems you encountered?
>>>>
>>>> Ioan
>>>>
>>>> Mihael Hategan wrote:
>>>>> Very well Sir. I shall see to the priority of the issue being raised.
>>>>>
>>>>> On Thu, 2007-09-13 at 14:09 +0000, Ben Clifford wrote:
>>>>>
>>>>>> I think one of the main impediments to moldyn running with GRAM directly
>>>>>> is bug 53, which is a request for submission rate limiting.
>>>>>>
>>>>>> It might be relatively easy to implement that and see how the MolDyn
>>>>>> workflow behaves then.
>>>>>>
>>>>>> I'm interested to see if Falkon can be avoided for this workflow.
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Swift-devel mailing list
>>>>> Swift-devel at ci.uchicago.edu
>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> ============================================
>>>> Ioan Raicu
>>>> Ph.D. Student
>>>> ============================================
>>>> Distributed Systems Laboratory
>>>> Computer Science Department
>>>> University of Chicago
>>>> 1100 E. 58th Street, Ryerson Hall
>>>> Chicago, IL 60637
>>>> ============================================
>>>> Email: iraicu at cs.uchicago.edu
>>>> Web:   http://www.cs.uchicago.edu/~iraicu
>>>>        http://dsl.cs.uchicago.edu/
>>>> ============================================
>>>> ============================================
>>>
>>>
>>
>



