[Swift-devel] Re: the persistence of the persistent coaster service.
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Nov 17 15:35:40 CST 2010
When the client connects, the following gets registered in the service log:
...
...
Plan time: 1
Plan time: 1
GSSSChannel-null(0)[1175215772: {}]: Disabling heartbeats (config is null)
(1) Scheduling GSSSChannel-null(12)[1175215772: {}] for addition
nullChannel started
Channel id: u-20ccd0f-12c5bc25c45--8000-u-28c73091-12c5b774ab1--7ff5
MetaChannel: 682820082[1175215772: {}] -> null: Disabling heartbeats (disabled in config)
MetaChannel: 682820082[1175215772: {}] -> null.bind -> GSSSChannel-null(12)[1175215772: {}]
Plan time: 1
Congestion queue size: 0
runTime: 0, sleepTime: 10049
Plan time: 1
...
...
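
As a side note on the client-side trace quoted below: the timestamps in the
CHANNELCONFIG timeout message put the timeout about 120 seconds after the
command was sent. A minimal Python sketch of that arithmetic follows. It
assumes the stamps use the layout yymmdd-HHMMSS.mmm, which is inferred from
how they look in the log rather than confirmed from the CoG source, and
`elapsed_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed layout of the stamps in the log: yymmdd-HHMMSS.mmm
STAMP_FORMAT = "%y%m%d-%H%M%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two log stamps such as '101117-152717.211'."""
    t0 = datetime.strptime(start, STAMP_FORMAT)
    t1 = datetime.strptime(end, STAMP_FORMAT)
    return (t1 - t0).total_seconds()


# sendTime and now from the 2010/11/17 run
print(elapsed_seconds("101117-152717.211", "101117-152917.232"))
# sendTime and now from the 2010/10/21 run
print(elapsed_seconds("101021-174124.471", "101021-174324.492"))
```

Both runs come out at roughly 120 seconds, the same number the client prints
in keepalive(120). Whether the reply timeout and the keepalive setting are
actually linked is not established by these logs.
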
2010/11/17 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Bumping the thread. In an attempt to isolate the bug, I made this workflow:
>
> app (external o) sleep(int time) {
> sleep time;
> }
>
>
> /* Main program */
> external rups[];
>
> int t = 300;
> int a[];
>
> iterate ix {
> a[ix] = ix;
> } until (ix == 1300);
>
> foreach ai,i in a {
> rups[i] = sleep(t);
> }
>
>
> <config>
> <pool handle="localhost">
> <execution provider="coaster-persistent"
> url="https://communicado.ci.uchicago.edu:61999"
> jobmanager="local:local" />
>
> <profile namespace="globus" key="workerManager">passive</profile>
>
> <gridftp url="local://localhost"/>
> <workdirectory>/gpfs/pads/swift/aespinosa/swift-runs</workdirectory>
> </pool>
>
>
> </config>
>
> localhost sleep /bin/sleep INSTALLED INTEL32::LINUX null
>
> and still get the same type of error message:
> RunID: 20101117-1527-ui6i2lra
> Progress:
> Find: https://communicado.ci.uchicago.edu:61999
> Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress: Selecting site:1 Submitting:294
> Progress: Selecting site:3 Submitting:367
> Progress: Selecting site:3 Submitting:367
> Progress: Selecting site:3 Submitting:367
> Progress: Selecting site:3 Submitting:367
> Command(1, CHANNELCONFIG): handling reply timeout;
> sendReqTime=101117-152717.209, sendTime=101117-152717.211, now=101117-152917.232
> Progress: Selecting site:3 Submitting:366 Submitted:1
> Command(1, CHANNELCONFIG)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
> at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
> at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> Progress: Selecting site:3 Submitting:366 Failed but can retry:1
> Progress: Selecting site:3 Submitting:366 Failed but can retry:1
>
>
> 2010/10/21 Allan Espinosa <aespinosa at cs.uchicago.edu>:
>> Hi,
>>
>> When I reuse the coaster service in the next Swift session, I
>> get reply timeouts in the CHANNELCONFIG command:
>>
>>
>> swift-r3685 cog-r2913
>>
>> RunID: extract
>> Progress:
>> Progress: uninitialized:2 Finished in previous run:2
>> Progress: uninitialized:2 Finished in previous run:2
>> Progress: Stage in:99 Submitting:1 Finished in previous run:102
>> Find: https://communicado.ci.uchicago.edu:61999
>> Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
>> Progress: Stage in:92 Submitting:8 Finished in previous run:102
>> Passive queue processor initialized. Callback URI is http://128.135.125.17:60999
>> Progress: Stage in:71 Submitting:2 Submitted:27 Finished in previous run:102
>> Progress: Stage in:29 Submitting:1 Submitted:70 Finished in previous run:102
>>
>> **Abort** (Ctrl-C)
>> ** rerun/resume workflow **
>> swift-r3685 cog-r2913
>>
>> RunID: extract
>> Progress:
>> Progress: uninitialized:3 Finished in previous run:2
>> Progress: Stage in:99 Submitting:1 Finished in previous run:102
>> Find: https://communicado.ci.uchicago.edu:61999
>> Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
>> Progress: Stage in:92 Submitting:8 Finished in previous run:102
>> Progress: Stage in:92 Submitting:8 Finished in previous run:102
>> Progress: Stage in:92 Submitting:8 Finished in previous run:102
>> Progress: Stage in:92 Submitting:8 Finished in previous run:102
>> Command(1, CHANNELCONFIG): handling reply timeout;
>> sendReqTime=101021-174124.460, sendTime=101021-174124.471, now=101021-174324.492
>> Command(1, CHANNELCONFIG)fault was: Reply timeout
>> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>> at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
>> at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
>> at java.util.TimerThread.mainLoop(Timer.java:512)
>> at java.util.TimerThread.run(Timer.java:462)
>> Progress: Stage in:92 Submitting:7 Submitted:1 Finished in previous run:102
>>
>> My sites.xml sets the persistent service to work in passive mode.
>>
>>
>> thanks,
>> -Allan
>>
>> --
>> Allan M. Espinosa <http://amespinosa.wordpress.com>
>> PhD student, Computer Science
>> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
>>
--
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>