[Swift-devel] bug 53

Veronika Nefedova nefedova at mcs.anl.gov
Fri Sep 14 08:59:29 CDT 2007


Ioan, how is your work on that 'avoiding bad nodes' thing progressing?
You seem to be more interested in running my workflow on a virtual
cluster rather than working on a new feature that would enable
MolDyn to run reliably on TG... I apologize if I am wrong - the lack
of information led me to this conclusion; please provide me
with the relevant information and an estimate of when I can expect
Falkon to be ready for a new round of tests.

Thanks,

Nika



On Sep 13, 2007, at 5:48 PM, Ioan Raicu wrote:

> It would be good to have some comparison numbers, so I think it's
> worth doing to see if the workflow will complete, and to see what
> performance it gets!
> Ioan
>
> Veronika Nefedova wrote:
>> Thanks, Mihael! I could try submitting some 20 molecules to tg-uc
>> (directly to GRAM) now -- just to be on the safe side. If no GRAM
>> problems are reported, I'll increase the number to 244.
>> Of course, the performance will suffer greatly -- but I hope it
>> would enable the whole workflow to go through. Are there any
>> throttles that could be set to increase the performance a bit
>> (given that I set the maxSubmitRate to 0.2)?
>>
>> Nika
>>
>> On Sep 13, 2007, at 4:41 PM, Mihael Hategan wrote:
>>
>>> Ok, so there's something in.
>>> There are some discussions that can be had on certain aesthetic  
>>> topics.
>>> In any event, in sites.xml, you can add, for a site, something like
>>> this:
>>>
>>> <profile namespace="karajan" key="maxSubmitRate">0.1</profile>
>>>
>>> The rate is in jobs per second. The above would mean one job  
>>> every ten
>>> seconds.
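
The rate-to-interval arithmetic described above can be sketched as follows. This is only an illustration, not Swift or Karajan code: the function names are hypothetical, and the 421-job figure and the 0.2 rate are simply the numbers mentioned elsewhere in this thread.

```python
def submit_interval_seconds(max_submit_rate: float) -> float:
    """Seconds between consecutive submissions for a rate in jobs per second."""
    if max_submit_rate <= 0:
        raise ValueError("maxSubmitRate must be positive")
    return 1.0 / max_submit_rate


def min_submit_time_seconds(num_jobs: int, max_submit_rate: float) -> float:
    """Lower bound on the time spent just submitting num_jobs at a sustained rate."""
    if max_submit_rate <= 0:
        raise ValueError("maxSubmitRate must be positive")
    return num_jobs / max_submit_rate


if __name__ == "__main__":
    # Interval implied by the 0.1 value in the profile line above.
    print(submit_interval_seconds(0.1))
    # Submission-time lower bound for 421 jobs at a rate of 0.2 (hypothetical pairing).
    print(min_submit_time_seconds(421, 0.2) / 60.0, "minutes")
```

Under these assumptions, a rate of 0.2 means one job every five seconds, so 421 jobs would need roughly 35 minutes of submission time alone, regardless of how long the jobs themselves run.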
>>>
>>> Mihael
>>>
>>> On Thu, 2007-09-13 at 15:23 +0000, Ben Clifford wrote:
>>>> Yes?
>>>>
>>>> On Thu, 13 Sep 2007, Mihael Hategan wrote:
>>>>
>>>>> May I still fix that bug though?
>>>>>
>>>>> On Thu, 2007-09-13 at 09:54 -0500, Ioan Raicu wrote:
>>>>>> Hi,
>>>>>> I am still working on the new feature for Falkon to avoid  
>>>>>> submitting
>>>>>> tasks to known bad nodes, and to perhaps do its own retries  
>>>>>> for failed
>>>>>> jobs with certain known errors (e.g. stale NFS handle).  I  
>>>>>> should have
>>>>>> that ready for next week to try out.  Once this new feature is  
>>>>>> in, we
>>>>>> could try MolDyn again to see how it behaves.
>>>>>>
>>>>>> About avoiding Falkon for MolDyn, I recall something about the
>>>>>> scalability/policies of GRAM/PBS to handle many concurrent jobs,
>>>>>> having to throttle job submissions to something around 1 job  
>>>>>> every 10
>>>>>> seconds (for sustained periods of time, short bursts could send
>>>>>> faster), and the fact that only a few tens of nodes would be used
>>>>>> concurrently, even though the sites that it was running on had  
>>>>>> more
>>>>>> free nodes.  I also think that MolDyn through GRAM/PBS was  
>>>>>> running
>>>>>> only 1 job per node, in essence only using 1 processor of the  
>>>>>> 2 per
>>>>>> node.  I think the largest workflow Nika was able to run over  
>>>>>> GRAM/PBS
>>>>>> was 5 molecules, 421 jobs (but only 340 jobs in the large stage).
>>>>>> Nika, were there other problems you encountered?
>>>>>>
>>>>>> Ioan
>>>>>>
>>>>>> Mihael Hategan wrote:
>>>>>>> Very well Sir. I shall see to the priority of the issue being  
>>>>>>> raised.
>>>>>>>
>>>>>>> On Thu, 2007-09-13 at 14:09 +0000, Ben Clifford wrote:
>>>>>>>
>>>>>>>> I think one of the main impediments to moldyn running with  
>>>>>>>> GRAM directly
>>>>>>>> is bug 53 which is a request for submission rate limiting.
>>>>>>>>
>>>>>>>> It might be relatively easy to implement that and see how  
>>>>>>>> the MolDyn
>>>>>>>> workflow behaves then.
>>>>>>>>
>>>>>>>> I'm interested to see if Falkon can be avoided for this  
>>>>>>>> workflow.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Swift-devel mailing list
>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> ============================================
>>>>>> Ioan Raicu
>>>>>> Ph.D. Student
>>>>>> ============================================
>>>>>> Distributed Systems Laboratory
>>>>>> Computer Science Department
>>>>>> University of Chicago
>>>>>> 1100 E. 58th Street, Ryerson Hall
>>>>>> Chicago, IL 60637
>>>>>> ============================================
>>>>>> Email: iraicu at cs.uchicago.edu
>>>>>> Web:   http://www.cs.uchicago.edu/~iraicu
>>>>>>        http://dsl.cs.uchicago.edu/
>>>>>> ============================================
>>>>>> ============================================
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>
