[Swift-devel] bug 53
Michael Wilde
wilde at mcs.anl.gov
Tue Sep 18 11:15:15 CDT 2007
It's not clear when this happened, as Nika's and Ioan's workflow submissions
from viper have, AFAIK, been mostly through Falkon for quite a while now.
Nika, perhaps you can shift back to trying the two Falkon approaches
(with higher priority on testing Ioan's retry code) in the meantime.
Ioan, is CI Support / Ti supporting viper, or are you the "sysadmin" Ben
is referring to?
I've also suggested in the past that we focus on using evitable and
terminable (and swift03/04) as our main submit hosts, primarily for
support and coordination reasons. Is this a good time to try the
GRAM/non-Falkon workflow there?
- Mike
Ben Clifford wrote:
> sounds like viper had firewall configuration changed recently. viper
> sysadmin needs to help debug basic job submission with simple globus tools
> before that machine is worth using again.
>
> On Tue, 18 Sep 2007, Michael Wilde wrote:
>
>> does the cog equivalent of globus_tcp_source_range also need to be set?
>> is that only for gridftp, or gram as well? or could this be a gridftp hang?
>>
>> - mike
>>
>> Ben Clifford wrote:
>>> can you submit a job using globus-job-run?
>>>
>>> On Tue, 18 Sep 2007, Veronika Nefedova wrote:
>>>
>>>> I set tcp.port.range in swift properties but even a simple helloworld
>>>> workflow hangs (the submit host doesn't receive the notification from the
>>>> compute host that the job has finished).
>>>> tcp.port.range=50000,60000
>>>>
>>>> Not sure what else has changed on viper? It used to be a very good submit
>>>> host, I never had any problems with it );
>>>>
>>>> Nika
>>>>
>>>> On Sep 18, 2007, at 9:13 AM, Mihael Hategan wrote:
>>>>
>>>>> It should pick that one up. If not, set this in ~/.globus/cog.properties:
>>>>> tcp.port.range=begin,end
>>>>>
>>>>> On Tue, 2007-09-18 at 07:42 +0000, Ben Clifford wrote:
>>>>>> Not sure if cog picks up the GLOBUS_whatever environment variables.
>>>>>> Mihael presumably knows.
>>>>>>
>>>>>> On Mon, 17 Sep 2007, Ioan Raicu wrote:
>>>>>>
>>>>>>> There is a firewall on viper. Ports 50000 - 60000 are open for TCP.
>>>>>>> You might want to set the TCP_PORT_RANGE (I am not sure this is the
>>>>>>> exact environment variable, but something like that) to be between
>>>>>>> 50K and 60K ports to ensure that GT4 uses one of these open ports.
>>>>>>> Ioan
>>>>>>>
>>>>>>> Veronika Nefedova wrote:
>>>>>>>> The same. You can check the job's status in its log on viper in
>>>>>>>> ~nefedova/alamines/MolDyn-244-loops-20070917-1356-h95gxij8.log.
>>>>>>>>
>>>>>>>> The job is still running (i.e. hanging) with the same symptom as
>>>>>>>> before: the first job is done and then nothing else gets submitted
>>>>>>>> (the submit host doesn't receive any notification that the job has
>>>>>>>> finished).
>>>>>>>>
>>>>>>>> Nika
>>>>>>>>
>>>>>>>> On Sep 17, 2007, at 9:51 AM, Mihael Hategan wrote:
>>>>>>>>
>>>>>>>>> On Mon, 2007-09-17 at 09:41 -0500, Veronika Nefedova wrote:
>>>>>>>>>> I did 'svn up' in cog directory and then did 'ant dist' in the
>>>>>>>>>> same directory.
>>>>>>>>> 'ant dist' should be done in the swift directory.
>>>>>>>>>
>>>>>>>>>> My 'svn info' gives me r1740.
>>>>>>>>>>
>>>>>>>>>> On Sep 17, 2007, at 8:55 AM, Mihael Hategan wrote:
>>>>>>>>>>
>>>>>>>>>>> Did you update cog?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
>>>>>>>>>>>> No, I've tried with r1740, it still hung (timed out).
>>>>>>>>>>>> The log is on
>>>>>>>>>>>> viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
>>>>>>>>>>>>
>>>>>>>>>>>> Nika
>>>>>>>>>>>>
>>>>>>>>>>>> On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
>>>>>>>>>>>>>> On Fri, 14 Sep 2007, Mihael Hategan wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
>>>>>>>>>>>>>>>> Ok, so there's something in.
>>>>>>>>>>>>>>> That something was throttling a bit too much (not just jobs,
>>>>>>>>>>>>>>> but all tasks on that site). I need to take a second look at it.
>>>>>>>>>>>>>> Is that fixed by cog r1740? It looks like that commit is
>>>>>>>>>>>>>> intended to.
>>>>>>>>>>>>> It's an attempt to fix it, but it needs to be confirmed by
>>>>>>>>>>>>> Nika.
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>>>>>>>>>>
>>>
>>
>
>
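
As a rough local sanity check for the port-range setting discussed in this thread, the sketch below parses a `tcp.port.range=begin,end` property line and looks for the first port in that range that can be bound on the submit host. The function names (`parse_port_range`, `first_bindable_port`) are hypothetical and are not part of CoG or Swift. It only tests local binding; it does not show whether the firewall lets the compute host connect back on that port, so it does not replace the `globus-job-run` test suggested above.

```python
import socket


def parse_port_range(line):
    """Parse a 'tcp.port.range=begin,end' property line into (begin, end)."""
    key, _, value = line.partition("=")
    if key.strip() != "tcp.port.range":
        raise ValueError("not a tcp.port.range property: %r" % line)
    begin_text, _, end_text = value.partition(",")
    begin, end = int(begin_text), int(end_text)
    if not (0 < begin <= end <= 65535):
        raise ValueError("invalid port range: %d,%d" % (begin, end))
    return begin, end


def first_bindable_port(begin, end, host=""):
    """Return the first port in [begin, end] that binds locally, or None."""
    for port in range(begin, end + 1):
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.bind((host, port))
                return port
        except OSError:
            # Port is in use or binding is not permitted; try the next one.
            continue
    return None


if __name__ == "__main__":
    # The value Nika reported setting in the Swift properties.
    begin, end = parse_port_range("tcp.port.range=50000,60000")
    print(first_bindable_port(begin, end))
```

If this prints `None`, no port in the configured range could be bound locally, which would point at a host-side problem rather than the firewall rules.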