[Swift-devel] bug 53

Ian Foster itf at mcs.anl.gov
Tue Sep 18 13:13:23 CDT 2007


It seems ridiculous to me that we are still using a student-supported machine to run major applications. Surely we should have one highly capable, well-maintained machine for this? And this shouldn't be a "suggestion" but a clear policy.



-----Original Message-----
From: Ioan Raicu <iraicu at cs.uchicago.edu>
Date: Tue, 18 Sep 2007 13:13:08
To: Michael Wilde <wilde at mcs.anl.gov>
Cc: swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] bug 53




Michael Wilde wrote:
> It's not clear when this happened, as Nika and Ioan's workflow 
> submission from viper has afaik been mostly through Falkon for quite a 
> while now.
>
> Nika, perhaps you can shift back to trying the two Falkon approaches 
> (with higher prio on testing Ioan's retry code) in the meantime.
>
> Ioan, is CI Support / Ti supporting viper, or are you the "sysadmin" 
> Ben is referring to?
>
Yes, I am viper's support.  viper is my department office machine.
> I've also suggested in the past that we focus on using evitable and 
> terminable (and swift03/04) as our main submit hosts, primarily for 
> support and coordination reasons.  Is this a good time to try the 
> GRAM/non-Falkon workflow there?
Sure, but watch out for the large MolDyn runs, as 1 GB or less of memory 
is not enough for the 244 mol runs.

Ioan
>
> - Mike
>
>
> Ben Clifford wrote:
>> sounds like viper had firewall configuration changed recently. viper 
>> sysadmin needs to help debug basic job submission with simple globus 
>> tools before that machine is worth using again.
>>
>> On Tue, 18 Sep 2007, Michael Wilde wrote:
>>
>>> does the cog equivalent of globus_tcp_source_range also need to be set?
>>> is that only for gridftp, or gram as well?  or could this be a 
>>> gridftp hang?
>>>
>>> - mike
>>>
>>> Ben Clifford wrote:
>>>> can you submit a job using globus-job-run?
>>>>
>>>> On Tue, 18 Sep 2007, Veronika Nefedova wrote:
>>>>
>>>>> I set tcp.port.range in swift properties but even a simple helloworld
>>>>> workflow hangs (the submit host doesn't receive the notification from
>>>>> the compute host that the job has finished).
>>>>> tcp.port.range=50000,60000
>>>>>
>>>>> Not sure what else has changed on viper? It used to be a very good
>>>>> submit host, I never had any problems with it );
>>>>>
>>>>> Nika
>>>>>
>>>>> On Sep 18, 2007, at 9:13 AM, Mihael Hategan wrote:
>>>>>
>>>>>> Should pick that one. If not ~/.globus/cog.properties ->
>>>>>> tcp.port.range=begin,end
>>>>>>
>>>>>> On Tue, 2007-09-18 at 07:42 +0000, Ben Clifford wrote:
>>>>>>> Not sure if cog picks up the GLOBUS_whatever environment variables.
>>>>>>> Mihael presumably knows.
>>>>>>>
>>>>>>> On Mon, 17 Sep 2007, Ioan Raicu wrote:
>>>>>>>
>>>>>>>> There is a firewall on viper.  Ports 50000 - 60000 are open for TCP.
>>>>>>>> You might want to set the TCP_PORT_RANGE (I am not sure this is the
>>>>>>>> exact environment variable, but something like that) to be between
>>>>>>>> 50K and 60K ports to ensure that GT4 uses one of these open ports.
>>>>>>>> Ioan
>>>>>>>>
>>>>>>>> Veronika Nefedova wrote:
>>>>>>>>> The same. You can check the job's status in its log on viper in
>>>>>>>>> ~nefedova/alamines/MolDyn-244-loops-20070917-1356-h95gxij8.log.
>>>>>>>>>
>>>>>>>>> The job is still running (i.e. hanging) with the same symptom as
>>>>>>>>> before: the first job is done and then nothing else gets submitted
>>>>>>>>> (the submit host doesn't receive any notification that the job has
>>>>>>>>> finished).
>>>>>>>>>
>>>>>>>>> Nika
>>>>>>>>>
>>>>>>>>> On Sep 17, 2007, at 9:51 AM, Mihael Hategan wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, 2007-09-17 at 09:41 -0500, Veronika Nefedova wrote:
>>>>>>>>>>> I did 'svn up' in cog directory and then did 'ant dist' in the
>>>>>>>>>>> same directory.
>>>>>>>>>> 'ant dist' should be done in the swift directory.
>>>>>>>>>>
>>>>>>>>>>> My 'svn info' gives me r1740.
>>>>>>>>>>>
>>>>>>>>>>> On Sep 17, 2007, at 8:55 AM, Mihael Hategan wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Did you update cog?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>>>>>>>>>>>>> No, I've tried with r1740, it still hung (timed out).
>>>>>>>>>>>>> the log is on
>>>>>>>>>>>>> viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nika
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>>>>>>>>>>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>>>>>>>>>>>>> Ok, so there's something in.
>>>>>>>>>>>>>>>> That something was throttling a bit too much (not just jobs,
>>>>>>>>>>>>>>>> but all tasks on that site). I need to take a second look at it.
>>>>>>>>>>>>>>> Is that fixed by cog r1740? It looks like that commit is
>>>>>>>>>>>>>>> intended to.
>>>>>>>>>>>>>> It's an attempt to fix it, but it needs to be confirmed by Nika.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>>
>
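
For reference, the tcp.port.range=begin,end setting quoted above (from swift properties or ~/.globus/cog.properties) can be sanity-checked with a small script. This is only a sketch under stated assumptions: parse_port_range and first_bindable_port are hypothetical helper names, not part of cog or Swift, and the parsing handles only plain key=value lines rather than the full Java properties syntax. A successful local bind shows only that a port is free on the submit host, not that the firewall lets the compute host reach it.

```python
import socket


def parse_port_range(properties_text, key="tcp.port.range"):
    """Return (begin, end) from a 'key=begin,end' line, or None if absent.

    Hypothetical helper: handles simple key=value lines and '#' comments only.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            begin, end = (int(part) for part in value.split(","))
            if not 0 < begin <= end <= 65535:
                raise ValueError(f"invalid port range: {value.strip()}")
            return begin, end
    return None


def first_bindable_port(begin, end, host=""):
    """Return the first port in [begin, end] that can be bound locally, else None."""
    for port in range(begin, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use or not permitted; try the next one
            return port
    return None


# Sample content mirroring the setting quoted in the thread.
sample = "# ~/.globus/cog.properties\ntcp.port.range=50000,60000\n"
port_range = parse_port_range(sample)
print(port_range)
```

On the submit host, calling first_bindable_port(*port_range) would then report whether any port in the configured range is free locally; whether callbacks from the compute host actually arrive still has to be checked with the globus tools, as suggested in the thread.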

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
