[Swift-devel] bug 53
Ioan Raicu
iraicu at cs.uchicago.edu
Fri Sep 14 09:42:19 CDT 2007
The virtual cluster work is Catalin's. He has everything ready but has
been stuck at the application stage for over a week, and you haven't
been too responsive (i.e. we still don't know how to re-compile all the
different stages of the app) :( Maybe we'll change the app to something
simpler that we can manage ourselves when trouble arises. As for me, I
have been busy with my data caching work, the central part of my
dissertation.
I should have the new feature to avoid bad nodes in by early next week.
In the meantime, can you try out Mihael's new fix, which throttles the
submission rate?
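
For reference, here is a rough sketch of what a maxSubmitRate throttle
amounts to, as I understand it from Mihael's description (a rate in jobs
per second, so 0.1 means one job every ten seconds). This is not the
actual Karajan code; the function names and the injectable clock/sleep
arguments are made up for illustration.

```python
import time


def submit_interval(max_submit_rate):
    """Seconds between submissions for a rate given in jobs per second."""
    if max_submit_rate <= 0:
        raise ValueError("maxSubmitRate must be positive")
    return 1.0 / max_submit_rate


def approx_submission_time(num_jobs, max_submit_rate):
    """Rough time in seconds needed just to submit num_jobs at a steady rate."""
    return num_jobs * submit_interval(max_submit_rate)


def throttled_submit(jobs, max_submit_rate, submit,
                     clock=time.monotonic, sleep=time.sleep):
    """Call submit(job) for each job, spacing the calls by at least 1/rate seconds.

    clock and sleep are parameters only so the pacing can be checked
    without really waiting.
    """
    interval = submit_interval(max_submit_rate)
    next_slot = clock()
    for job in jobs:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        submit(job)
        # Never schedule the next submission earlier than one interval from now.
        next_slot = max(next_slot, clock()) + interval
```

With the 0.2 setting, approx_submission_time(421, 0.2) comes to 2105
seconds, so about 35 minutes just to submit the 421 jobs of the
5 molecule run.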
Ioan
Veronika Nefedova wrote:
> Ioan, how is your work on that 'avoiding bad nodes' thing progressing?
> You seem to be more interested in running my workflow on a virtual
> cluster than in working on a new feature that would enable
> MolDyn to run reliably on TG... I apologize if I am wrong - the lack
> of information led me to this conclusion; please provide me
> with the relevant information and an estimate of when I can expect
> Falkon to be ready for a new round of tests.
>
> Thanks,
>
> Nika
>
>
>
> On Sep 13, 2007, at 5:48 PM, Ioan Raicu wrote:
>
>> It would be good to have some comparison numbers, so I think it's
>> worth doing to see if the workflow will complete, and to see what
>> performance it gets!
>> Ioan
>>
>> Veronika Nefedova wrote:
>>> Thanks, Mihael! I could try submitting some 20 molecules to
>>> tg-uc now (directly to GRAM) -- just to be on the safe side. If no
>>> GRAM problems are reported, I'll increase the number to 244.
>>> Of course, the performance will suffer greatly -- but I hope it
>>> would enable the whole workflow to go through. Are there any
>>> throttles that could be set to increase the performance a bit (given
>>> that I set the maxSubmitRate to 0.2)?
>>>
>>> Nika
>>>
>>> On Sep 13, 2007, at 4:41 PM, Mihael Hategan wrote:
>>>
>>>> Ok, so there's something in.
>>>> There are some discussions that can be had on certain aesthetic
>>>> topics.
>>>> In any event, in sites.xml, you can add, for a site, something like
>>>> this:
>>>>
>>>> <profile namespace="karajan" key="maxSubmitRate">0.1</profile>
>>>>
>>>> The rate is in jobs per second. The above would mean one job every ten
>>>> seconds.
>>>>
>>>> Mihael
>>>>
>>>> On Thu, 2007-09-13 at 15:23 +0000, Ben Clifford wrote:
>>>>> Yes?
>>>>>
>>>>> On Thu, 13 Sep 2007, Mihael Hategan wrote:
>>>>>
>>>>>> May I still fix that bug though?
>>>>>>
>>>>>> On Thu, 2007-09-13 at 09:54 -0500, Ioan Raicu wrote:
>>>>>>> Hi,
>>>>>>> I am still working on the new feature for Falkon to avoid submitting
>>>>>>> tasks to known bad nodes, and to perhaps do its own retries for failed
>>>>>>> jobs with certain known errors (e.g. stale NFS handle). I should have
>>>>>>> that ready for next week to try out. Once this new feature is in, we
>>>>>>> could try MolDyn again to see how it behaves.
>>>>>>>
>>>>>>> About avoiding Falkon for MolDyn, I recall something about the
>>>>>>> scalability/policies of GRAM/PBS in handling many concurrent jobs,
>>>>>>> having to throttle job submissions to something around 1 job every 10
>>>>>>> seconds (for sustained periods of time; short bursts could send
>>>>>>> faster), and the fact that only a few 10s of nodes would be used
>>>>>>> concurrently, even though the sites that it was running on had more
>>>>>>> free nodes. I also think that MolDyn through GRAM/PBS was running
>>>>>>> only 1 job per node, in essence only using 1 processor of the 2 per
>>>>>>> node. I think the largest workflow Nika was able to run over GRAM/PBS
>>>>>>> was 5 molecules, 421 jobs (but only 340 jobs in the large stage).
>>>>>>> Nika, were there other problems you encountered?
>>>>>>>
>>>>>>> Ioan
>>>>>>>
>>>>>>> Mihael Hategan wrote:
>>>>>>>> Very well Sir. I shall see to the priority of the issue being
>>>>>>>> raised.
>>>>>>>>
>>>>>>>> On Thu, 2007-09-13 at 14:09 +0000, Ben Clifford wrote:
>>>>>>>>
>>>>>>>>> I think one of the main impediments to MolDyn running with GRAM
>>>>>>>>> directly is bug 53, which is a request for submission rate limiting.
>>>>>>>>>
>>>>>>>>> It might be relatively easy to implement that and see how the MolDyn
>>>>>>>>> workflow behaves then.
>>>>>>>>>
>>>>>>>>> I'm interested to see if Falkon can be avoided for this workflow.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Swift-devel mailing list
>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ============================================
>>>>>>> Ioan Raicu
>>>>>>> Ph.D. Student
>>>>>>> ============================================
>>>>>>> Distributed Systems Laboratory
>>>>>>> Computer Science Department
>>>>>>> University of Chicago
>>>>>>> 1100 E. 58th Street, Ryerson Hall
>>>>>>> Chicago, IL 60637
>>>>>>> ============================================
>>>>>>> Email: iraicu at cs.uchicago.edu
>>>>>>> Web: http://www.cs.uchicago.edu/~iraicu
>>>>>>> http://dsl.cs.uchicago.edu/
>>>>>>> ============================================
>>>>>>> ============================================
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================
============================================