[Swift-devel] Re: the persistence of the persistent coaster service.

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Nov 17 15:35:40 CST 2010


When the client connects, the service log records the following:

...
...
Plan time: 1
Plan time: 1
GSSSChannel-null(0)[1175215772: {}]: Disabling heartbeats (config is null)
(1) Scheduling GSSSChannel-null(12)[1175215772: {}] for addition
nullChannel started
Channel id: u-20ccd0f-12c5bc25c45--8000-u-28c73091-12c5b774ab1--7ff5
MetaChannel: 682820082[1175215772: {}] -> null: Disabling heartbeats (disabled in config)
MetaChannel: 682820082[1175215772: {}] -> null.bind -> GSSSChannel-null(12)[1175215772: {}]
Plan time: 1
Congestion queue size: 0
runTime: 0, sleepTime: 10049
Plan time: 1
...
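
A side note on the client-side traces quoted below: in both runs the
CHANNELCONFIG command gives up roughly 120 seconds after it was sent. That
matches the keepalive(120) value the client prints, though I have not
confirmed the two are related. Here is a minimal sketch for pulling that
interval out of a trace. It assumes the stamps are yyMMdd-HHmmss.SSS (inferred
from the run dates, not checked against the CoG source), and the function name
is my own.

```python
import re
from datetime import datetime

# Assumption: stamps look like 101117-152717.209, i.e. yyMMdd-HHmmss.SSS.
STAMP = re.compile(r"(sendReqTime|now)=(\d{6}-\d{6}\.\d{3})")


def reply_wait_seconds(log_text):
    """Return seconds between sendReqTime and now in a reply-timeout line."""
    stamps = {
        key: datetime.strptime(value, "%y%m%d-%H%M%S.%f")
        for key, value in STAMP.findall(log_text)
    }
    return (stamps["now"] - stamps["sendReqTime"]).total_seconds()


if __name__ == "__main__":
    line = ("Command(1, CHANNELCONFIG): handling reply timeout; "
            "sendReqTime=101117-152717.209, sendTime=101117-152717.211, "
            "now=101117-152917.232")
    print(reply_wait_seconds(line))
```

The sendTime field is skipped on purpose, since only the request time and the
time the timeout fired are needed to get the wait.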
...

2010/11/17 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Bumping the thread.  In an attempt to isolate the bug, I made this workflow:
>
> app (external o) sleep(int time) {
>  sleep time;
> }
>
>
> /* Main program */
> external rups[];
>
> int t = 300;
> int a[];
>
> iterate ix {
>  a[ix] = ix;
> } until (ix == 1300);
>
> foreach ai,i in a {
>  rups[i] = sleep(t);
> }
>
>
> <config>
>  <pool handle="localhost">
>    <execution provider="coaster-persistent"
>        url="https://communicado.ci.uchicago.edu:61999"
>        jobmanager="local:local" />
>
>    <profile namespace="globus" key="workerManager">passive</profile>
>
>    <gridftp  url="local://localhost"/>
>    <workdirectory>/gpfs/pads/swift/aespinosa/swift-runs</workdirectory>
>  </pool>
>
>
> </config>
>
> localhost  sleep          /bin/sleep INSTALLED INTEL32::LINUX null
>
> and still get the same type of error message:
> RunID: 20101117-1527-ui6i2lra
> Progress:
> Find: https://communicado.ci.uchicago.edu:61999
> Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress:  Selecting site:1  Submitting:294
> Progress:  Selecting site:3  Submitting:367
> Progress:  Selecting site:3  Submitting:367
> Progress:  Selecting site:3  Submitting:367
> Progress:  Selecting site:3  Submitting:367
> Command(1, CHANNELCONFIG): handling reply timeout;
> sendReqTime=101117-152717.209, sendTime=101117-152717.211,
> now=101117-152917.232
> Progress:  Selecting site:3  Submitting:366  Submitted:1
> Command(1, CHANNELCONFIG)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
>        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
>        at java.util.TimerThread.mainLoop(Timer.java:512)
>        at java.util.TimerThread.run(Timer.java:462)
> Progress:  Selecting site:3  Submitting:366 Failed but can retry:1
> Progress:  Selecting site:3  Submitting:366 Failed but can retry:1
>
>
> 2010/10/21 Allan Espinosa <aespinosa at cs.uchicago.edu>:
>> Hi,
>>
>> When I reuse the coaster service in the next Swift session, I
>> get reply timeouts in the CHANNELCONFIG command:
>>
>>
>> swift-r3685 cog-r2913
>>
>> RunID: extract
>> Progress:
>> Progress:  uninitialized:2  Finished in previous run:2
>> Progress:  uninitialized:2  Finished in previous run:2
>> Progress:  Stage in:99  Submitting:1  Finished in previous run:102
>> Find: https://communicado.ci.uchicago.edu:61999
>> Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
>> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
>> Passive queue processor initialized. Callback URI is http://128.135.125.17:60999
>> Progress:  Stage in:71  Submitting:2  Submitted:27  Finished in previous run:102
>> Progress:  Stage in:29  Submitting:1  Submitted:70  Finished in previous run:102
>>
>> **Abort** (Ctrl-C)
>> ** rerun/resume workflow **
>> swift-r3685 cog-r2913
>>
>> RunID: extract
>> Progress:
>> Progress:  uninitialized:3  Finished in previous run:2
>> Progress:  Stage in:99  Submitting:1  Finished in previous run:102
>> Find: https://communicado.ci.uchicago.edu:61999
>> Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
>> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
>> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
>> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
>> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
>> Command(1, CHANNELCONFIG): handling reply timeout;
>> sendReqTime=101021-174124.460, sendTime=101021-174124.471,
>> now=101021-174324.492
>> Command(1, CHANNELCONFIG)fault was: Reply timeout
>> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>>        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
>>        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
>>        at java.util.TimerThread.mainLoop(Timer.java:512)
>>        at java.util.TimerThread.run(Timer.java:462)
>> Progress:  Stage in:92  Submitting:7  Submitted:1  Finished in previous run:102
>>
>> My sites.xml sets the persistent service to work in passive mode.
>>
>>
>> thanks,
>> -Allan
>>
>> --
>> Allan M. Espinosa <http://amespinosa.wordpress.com>
>> PhD student, Computer Science
>> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
>>



-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>