[Swift-devel] bug 53

Veronika Nefedova nefedova at mcs.anl.gov
Tue Sep 18 13:25:53 CDT 2007


Viper was used only because *no* other machine could handle the size
of the moldyn workflow (not at ANL and not at ci). Now, since the code
has been reduced dramatically, it's quite possible to use terminable
-- and I've switched to running the tests from terminable this morning.

Nika

On Sep 18, 2007, at 1:13 PM, Ian Foster wrote:

> It seems ridiculous to me that we are still using a student-supported
> machine to run major applications. Surely we should have one highly
> capable, well-maintained machine for this? And this shouldn't be a
> "suggestion" but a clear policy.
>
>
>
> -----Original Message-----
> From: Ioan Raicu <iraicu at cs.uchicago.edu>
>
> Date: Tue, 18 Sep 2007 13:13:08
> To:Michael Wilde <wilde at mcs.anl.gov>
> Cc:swift-devel at ci.uchicago.edu
> Subject: Re: [Swift-devel] bug 53
>
>
>
>
> Michael Wilde wrote:
>> It's not clear when this happened, as Nika and Ioan's workflow
>> submission from viper has afaik been mostly through Falkon for quite a
>> while now.
>>
>> Nika, perhaps you can shift back to trying the two Falkon approaches
>> (with higher prio on testing Ioan's retry code) in the meantime.
>>
>> Ioan, is CI Support / Ti supporting viper, or are you the "sysadmin"
>> Ben is referring to?
>>
> Yes, I am viper's support.  viper is my department office machine.
>> I've also suggested in the past that we focus on using evitable and
>> terminable (and swift03/04) as our main submit hosts, primarily for
>> support and coordination reasons.  Is this a good time to try the
>> GRAM/non-Falkon workflow there?
> Sure, but watch out for the large MolDyn runs as 1GB or less of memory
> is not enough for 244 mol runs.
>
> Ioan
>>
>> - Mike
>>
>>
>> Ben Clifford wrote:
>>> sounds like viper had firewall configuration changed recently. viper
>>> sysadmin needs to help debug basic job submission with simple globus
>>> tools before that machine is worth using again.
>>>
>>> On Tue, 18 Sep 2007, Michael Wilde wrote:
>>>
>>>> does the cog equivalent of globus_tcp_source_range also need to be set?
>>>> is that only for gridftp, or gram as well?  or could this be a
>>>> gridftp hang?
>>>>
>>>> - mike
>>>>
>>>> Ben Clifford wrote:
>>>>> can you submit a job using globus-job-run?
>>>>>
>>>>> On Tue, 18 Sep 2007, Veronika Nefedova wrote:
>>>>>
>>>>>> I set tcp.port.range in swift properties but even a simple helloworld
>>>>>> workflow hangs (the submit host doesn't receive the notification from
>>>>>> the compute host that the job has finished).
>>>>>> tcp.port.range=50000,60000
>>>>>>
>>>>>> Not sure what else has changed on viper? It used to be a very good
>>>>>> submit host, I never had any problems with it );
>>>>>>
>>>>>> Nika
>>>>>>
>>>>>> On Sep 18, 2007, at 9:13 AM, Mihael Hategan wrote:
>>>>>>
>>>>>>> Should pick that one. If not ~/.globus/cog.properties ->
>>>>>>> tcp.port.range=begin,end
>>>>>>>
>>>>>>> On Tue, 2007-09-18 at 07:42 +0000, Ben Clifford wrote:
>>>>>>>> Not sure if cog picks up the GLOBUS_whatever environment variables.
>>>>>>>> Mihael presumably knows.
>>>>>>>>
>>>>>>>> On Mon, 17 Sep 2007, Ioan Raicu wrote:
>>>>>>>>
>>>>>>>>> There is a firewall on viper.  Ports 50000 - 60000 are open for TCP.
>>>>>>>>> You might want to set the TCP_PORT_RANGE (I am not sure this is the
>>>>>>>>> exact environment variable, but something like that) to be between
>>>>>>>>> 50K and 60K ports to ensure that GT4 uses one of these open ports.
>>>>>>>>> Ioan
>>>>>>>>>
>>>>>>>>> Veronika Nefedova wrote:
>>>>>>>>>> The same. You can check the job's status in its log on viper in
>>>>>>>>>> ~nefedova/alamines/MolDyn-244-loops-20070917-1356-h95gxij8.log.
>>>>>>>>>>
>>>>>>>>>> The job is still running (i.e. hanging) with the same symptom as
>>>>>>>>>> before: the first job is done and then nothing else gets submitted
>>>>>>>>>> (the submit host doesn't receive any notification that the job has
>>>>>>>>>> finished).
>>>>>>>>>>
>>>>>>>>>> Nika
>>>>>>>>>>
>>>>>>>>>> On Sep 17, 2007, at 9:51 AM, Mihael Hategan wrote:
>>>>>>>>>>
>>>>>>>>>>> On Mon, 2007-09-17 at 09:41 -0500, Veronika Nefedova wrote:
>>>>>>>>>>>> I did 'svn up' in cog directory and then did 'ant dist' in the
>>>>>>>>>>>> same directory.
>>>>>>>>>>> 'ant dist' should be done in the swift directory.
>>>>>>>>>>>
>>>>>>>>>>>> My 'svn info' gives me r1740.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 17, 2007, at 8:55 AM, Mihael Hategan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Did you update cog?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>>>>>>>>>>>>>> No, I've tried with r1740, it still hung (timed out).
>>>>>>>>>>>>>> the log is on
>>>>>>>>>>>>>> viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nika
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>>>>>>>>>>>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>>>>>>>>>>>>>> Ok, so there's something in.
>>>>>>>>>>>>>>>>> That something was throttling a bit too much (not just jobs,
>>>>>>>>>>>>>>>>> but all tasks on that site). I need to take a second look at it.
>>>>>>>>>>>>>>>> Is that fixed by cog r1740? It looks like that commit is
>>>>>>>>>>>>>>>> intended to.
>>>>>>>>>>>>>>> It's an attempt to fix it, but it needs to be confirmed by Nika.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>        http://dsl.cs.uchicago.edu/
> ============================================
> ============================================
>
>
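
The tcp.port.range=50000,60000 setting discussed in this thread can be
sanity-checked locally before debugging GRAM itself. The following is only
a minimal sketch in Python, not part of Swift or CoG: the helper names
parse_port_range and first_bindable_port are hypothetical, and the check
only confirms that the submit host can open a listening socket inside the
range. It does not prove that the firewall lets the compute host connect
back to that port.

```python
import socket


def parse_port_range(value):
    """Parse a 'begin,end' string such as '50000,60000' into two ints."""
    low_text, high_text = value.split(",")
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError("invalid port range: %r" % value)
    return low, high


def first_bindable_port(low, high, host="127.0.0.1"):
    """Return the first port in [low, high] that can be bound, or None."""
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is in use or not permitted; try the next one.
                continue
            return port
    return None


if __name__ == "__main__":
    # Value taken from the thread; the bind address is an assumption.
    low, high = parse_port_range("50000,60000")
    print("first bindable port:", first_bindable_port(low, high))
```

If this prints None, nothing in the range can be bound on the submit host
at all, which points at a local problem rather than at the firewall rules.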



