[Swift-devel] Re: the persistence of the persistent coaster service.

Allan Espinosa aespinosa at cs.uchicago.edu
Wed Nov 17 15:32:32 CST 2010


Bumping the thread.  In an attempt to isolate the bug, I made this workflow:

app (external o) sleep(int time) {
  sleep time;
}


/* Main program */
external rups[];

int t = 300;
int a[];

iterate ix {
  a[ix] = ix;
} until (ix == 1300);

foreach ai,i in a {
  rups[i] = sleep(t);
}


<config>
  <pool handle="localhost">
    <execution provider="coaster-persistent"
        url="https://communicado.ci.uchicago.edu:61999"
        jobmanager="local:local" />

    <profile namespace="globus" key="workerManager">passive</profile>

    <gridftp  url="local://localhost"/>
    <workdirectory>/gpfs/pads/swift/aespinosa/swift-runs</workdirectory>
  </pool>


</config>

localhost  sleep          /bin/sleep INSTALLED INTEL32::LINUX null

With this workflow, sites.xml and tc.data entry, I still get the same type of error message:
RunID: 20101117-1527-ui6i2lra
Progress:
Find: https://communicado.ci.uchicago.edu:61999
Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
Progress:  Selecting site:1  Submitting:294
Progress:  Selecting site:3  Submitting:367
Progress:  Selecting site:3  Submitting:367
Progress:  Selecting site:3  Submitting:367
Progress:  Selecting site:3  Submitting:367
Command(1, CHANNELCONFIG): handling reply timeout;
sendReqTime=101117-152717.209, sendTime=101117-152717.211,
now=101117-152917.232
Progress:  Selecting site:3  Submitting:366  Submitted:1
Command(1, CHANNELCONFIG)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
        at java.util.TimerThread.mainLoop(Timer.java:512)
        at java.util.TimerThread.run(Timer.java:462)
Progress:  Selecting site:3  Submitting:366 Failed but can retry:1
Progress:  Selecting site:3  Submitting:366 Failed but can retry:1
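
A quick check on the timestamps in the timeout message: the gap between sendReqTime and now is almost exactly two minutes, both in this run and in the quoted October run below. The Python sketch here computes it, assuming the stamps are laid out as yyMMdd-HHmmss.SSS (an assumption based on how the log looks, not confirmed against the CoG source). The name elapsed_seconds is only illustrative.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from the log output: yyMMdd-HHmmss.SSS
STAMP_FORMAT = "%y%m%d-%H%M%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, STAMP_FORMAT)
    t1 = datetime.strptime(end, STAMP_FORMAT)
    return (t1 - t0).total_seconds()


# sendReqTime and now, taken from the 17 Nov run and the quoted 21 Oct run
print(f"{elapsed_seconds('101117-152717.209', '101117-152917.232'):.3f}")  # 120.023
print(f"{elapsed_seconds('101021-174124.460', '101021-174324.492'):.3f}")  # 120.032
```

Both gaps come out at roughly 120 s, the same number that shows up in the keepalive(120) line. Whether the reply timeout and the keepalive interval are actually related is only a guess at this point.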


2010/10/21 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Hi,
>
> When I reuse the coaster service in the next Swift session, I
> get reply timeouts in the CHANNELCONFIG command:
>
>
> swift-r3685 cog-r2913
>
> RunID: extract
> Progress:
> Progress:  uninitialized:2  Finished in previous run:2
> Progress:  uninitialized:2  Finished in previous run:2
> Progress:  Stage in:99  Submitting:1  Finished in previous run:102
> Find: https://communicado.ci.uchicago.edu:61999
> Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
> Passive queue processor initialized. Callback URI is http://128.135.125.17:60999
> Progress:  Stage in:71  Submitting:2  Submitted:27  Finished in previous run:102
> Progress:  Stage in:29  Submitting:1  Submitted:70  Finished in previous run:102
>
> **Abort** (Ctrl-C)
> ** rerun/resume workflow **
> swift-r3685 cog-r2913
>
> RunID: extract
> Progress:
> Progress:  uninitialized:3  Finished in previous run:2
> Progress:  Stage in:99  Submitting:1  Finished in previous run:102
> Find: https://communicado.ci.uchicago.edu:61999
> Find:  keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
> Progress:  Stage in:92  Submitting:8  Finished in previous run:102
> Command(1, CHANNELCONFIG): handling reply timeout;
> sendReqTime=101021-174124.460, sendTime=101021-174124.471,
> now=101021-174324.492
> Command(1, CHANNELCONFIG)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
>        at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
>        at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
>        at java.util.TimerThread.mainLoop(Timer.java:512)
>        at java.util.TimerThread.run(Timer.java:462)
> Progress:  Stage in:92  Submitting:7  Submitted:1  Finished in previous run:102
>
> My sites.xml sets the persistent service to work in passive mode.
>
>
> thanks,
> -Allan
>
> --
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
>



-- 
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>


