[Swift-devel] bug 53

Ioan Raicu iraicu at cs.uchicago.edu
Tue Sep 18 13:10:48 CDT 2007


The viper firewall settings have not changed since before Nika and the 
Swift team got accounts on it.

Here are the settings:
TCP Ports: 50000:60000 8080 8443
UDP Ports: 50000:60000
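
To sanity-check a client-side port setting against these rules, here is a minimal Python sketch. It assumes the `begin,end` form of tcp.port.range quoted further down in this thread. The function names and the rule tables are hypothetical, made up for illustration, and are not part of CoG or Swift.

```python
# Hypothetical helpers: names are illustrative, not part of CoG or Swift.
# Firewall rules as listed above, written as inclusive (low, high) ranges.
OPEN_TCP = [(50000, 60000), (8080, 8080), (8443, 8443)]
OPEN_UDP = [(50000, 60000)]


def parse_port_range(value):
    """Parse a 'begin,end' string such as '50000,60000' into (begin, end)."""
    begin_text, end_text = value.split(",")
    begin, end = int(begin_text), int(end_text)
    if not (0 < begin <= end <= 65535):
        raise ValueError(f"invalid port range: {value!r}")
    return begin, end


def range_is_open(port_range, open_ranges):
    """Return True if the whole port_range lies inside one open range."""
    begin, end = port_range
    return any(low <= begin and end <= high for low, high in open_ranges)


if __name__ == "__main__":
    configured = parse_port_range("50000,60000")
    print("TCP range fully open:", range_is_open(configured, OPEN_TCP))
```

A range that extends below 50000 or above 60000 would come back as not fully open, which is the case where a callback could land on a blocked port.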

To double-check whether the firewall is the culprit, I have temporarily 
disabled it until we know for sure what is causing the trouble. 
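
One low-tech way to test reachability, with the firewall either up or down, is to open a listener on a port inside the range on the submit host and try to connect to it from a compute node. Below is a minimal sketch, assuming Python is available on the submit host. The function names and defaults are hypothetical, and this is only a reachability probe, not how GRAM delivers its notifications.

```python
import socket


def listen_in_range(begin=50000, end=60000, host=""):
    """Bind a TCP listener to the first free port in [begin, end]."""
    for port in range(begin, end + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port busy or not permitted; try the next one
            continue
        sock.listen(1)
        return sock
    raise RuntimeError(f"no free port between {begin} and {end}")


def wait_for_one_connection(server, timeout=60.0):
    """Wait up to timeout seconds; return the peer address, or None on timeout."""
    server.settimeout(timeout)
    try:
        conn, peer = server.accept()
    except socket.timeout:
        return None
    conn.close()
    return peer[0]
```

Call `listen_in_range()` on the submit host, read the chosen port from `getsockname()[1]`, then connect to that port from the compute host with any TCP client while `wait_for_one_connection()` is waiting. A `None` result with the firewall up, but a successful connection with it down, would point at the firewall.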

Ioan
PS: MolDyn with 244 mol was running out of memory on machines with 0.5 
or 1GB of memory... viper has 2GB, and it seems to have held up fine 
through the various large runs we made... so if you want to switch to a 
different machine, just make sure to have at least 2GB of memory so you 
can give the JVM the maximum memory of 1.5GB.


Ben Clifford wrote:
> sounds like viper had firewall configuration changed recently. viper 
> sysadmin needs to help debug basic job submission with simple globus tools 
> before that machine is worth using again.
>
> On Tue, 18 Sep 2007, Michael Wilde wrote:
>
>   
>> does the cog equivalent of globus_tcp_source_range also need to be set?
>> is that only for gridftp, or gram as well?  or could this be a gridftp hang?
>>
>> - mike
>>
>> Ben Clifford wrote:
>>     
>>> can you submit a job using globus-job-run?
>>>
>>> On Tue, 18 Sep 2007, Veronika Nefedova wrote:
>>>
>>>       
>>>> I set tcp.port.range in swift properties but even a simple helloworld
>>>> workflow hangs (the submit host doesn't receive the notification from
>>>> the compute host that the job has finished).
>>>> tcp.port.range=50000,60000
>>>>
>>>> Not sure what else has changed on viper? It used to be a very good submit
>>>> host, I never had any problems with it );
>>>>
>>>> Nika
>>>>
>>>> On Sep 18, 2007, at 9:13 AM, Mihael Hategan wrote:
>>>>
>>>>         
>>>>> Should pick that one. If not ~/.globus/cog.properties ->
>>>>> tcp.port.range=begin,end
>>>>>
>>>>> On Tue, 2007-09-18 at 07:42 +0000, Ben Clifford wrote:
>>>>>           
>>>>>> Not sure if cog picks up the GLOBUS_whatever environment variables.
>>>>>> Mihael presumably knows.
>>>>>>
>>>>>> On Mon, 17 Sep 2007, Ioan Raicu wrote:
>>>>>>
>>>>>>             
>>>>>>> There is a firewall on viper.  Ports 50000 - 60000 are open for TCP.
>>>>>>> You might want to set the TCP_PORT_RANGE (I am not sure this is the
>>>>>>> exact environment variable, but something like that) to be between
>>>>>>> 50K and 60K to ensure that GT4 uses one of these open ports.
>>>>>>> Ioan
>>>>>>>
>>>>>>> Veronika Nefedova wrote:
>>>>>>>               
>>>>>>>> The same. You can check the job's status in its log on viper in
>>>>>>>> ~nefedova/alamines/MolDyn-244-loops-20070917-1356-h95gxij8.log.
>>>>>>>>
>>>>>>>> The job is still running (i.e. hanging) with the same symptom as
>>>>>>>> before: the first job is done and then nothing else gets submitted
>>>>>>>> (the submit host doesn't receive any notification that the job has
>>>>>>>> finished).
>>>>>>>>
>>>>>>>> Nika
>>>>>>>>
>>>>>>>> On Sep 17, 2007, at 9:51 AM, Mihael Hategan wrote:
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> On Mon, 2007-09-17 at 09:41 -0500, Veronika Nefedova wrote:
>>>>>>>>>                   
>>>>>>>>>> I did 'svn up' in the cog directory and then did 'ant dist' in
>>>>>>>>>> the same directory.
>>>>>>>>>>                     
>>>>>>>>> 'ant dist' should be done in the swift directory.
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> My 'svn info' gives me r1740.
>>>>>>>>>>
>>>>>>>>>> On Sep 17, 2007, at 8:55 AM, Mihael Hategan wrote:
>>>>>>>>>>
>>>>>>>>>>                     
>>>>>>>>>>> Did you update cog?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>>>>>>>>>>>                       
>>>>>>>>>>>> No, I've tried with r1740, it still hung (timed out).
>>>>>>>>>>>> The log is on
>>>>>>>>>>>> viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>>>>>>>>>>>
>>>>>>>>>>>> Nika
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>                         
>>>>>>>>>>>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>>>> Ok, so there's something in.
>>>>>>>>>>>>>>>>                                 
>>>>>>>>>>>>>>> That something was throttling a bit too much (not just jobs,
>>>>>>>>>>>>>>> but all tasks on that site). I need to take a second look at it.
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Is that fixed by cog r1740? It looks like that commit is
>>>>>>>>>>>>>> intended to.
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>> It's an attempt to fix it, but it needs to be confirmed by
>>>>>>>>>>>>> Nika.
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================
