[Swift-devel] Re: the persistence of the persistent coaster service.
Allan Espinosa
aespinosa at cs.uchicago.edu
Wed Nov 17 15:32:32 CST 2010
Bumping the thread. In an attempt to isolate the bug, I made this workflow:
app (external o) sleep(int time) {
    sleep time;
}

/* Main program */
external rups[];
int t = 300;
int a[];
iterate ix {
    a[ix] = ix;
} until (ix == 1300);
foreach ai,i in a {
    rups[i] = sleep(t);
}
<config>
  <pool handle="localhost">
    <execution provider="coaster-persistent"
               url="https://communicado.ci.uchicago.edu:61999"
               jobmanager="local:local" />
    <profile namespace="globus" key="workerManager">passive</profile>
    <gridftp url="local://localhost"/>
    <workdirectory>/gpfs/pads/swift/aespinosa/swift-runs</workdirectory>
  </pool>
</config>
localhost sleep /bin/sleep INSTALLED INTEL32::LINUX null
and still get the same type of error message:
RunID: 20101117-1527-ui6i2lra
Progress:
Find: https://communicado.ci.uchicago.edu:61999
Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
Progress: Selecting site:1 Submitting:294
Progress: Selecting site:3 Submitting:367
Progress: Selecting site:3 Submitting:367
Progress: Selecting site:3 Submitting:367
Progress: Selecting site:3 Submitting:367
Command(1, CHANNELCONFIG): handling reply timeout;
sendReqTime=101117-152717.209, sendTime=101117-152717.211, now=101117-152917.232
Progress: Selecting site:3 Submitting:366 Submitted:1
Command(1, CHANNELCONFIG)fault was: Reply timeout
org.globus.cog.karajan.workflow.service.ReplyTimeoutException
at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
at java.util.TimerThread.mainLoop(Timer.java:512)
at java.util.TimerThread.run(Timer.java:462)
Progress: Selecting site:3 Submitting:366 Failed but can retry:1
Progress: Selecting site:3 Submitting:366 Failed but can retry:1
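As a quick sanity check on the timestamps in the timeout messages, here is a minimal Python sketch. The timestamp layout (yyMMdd-HHmmss.SSS) is an assumption read off the log lines, and elapsed_seconds is a hypothetical helper, not part of Swift or CoG. In both this run and the earlier one quoted below, the timeout fires roughly 120 seconds after the request was sent. That is the same figure as in the keepalive(120) line, though nothing in the logs confirms that the two are related.

```python
from datetime import datetime

# Assumed layout of the log timestamps: yyMMdd-HHmmss.SSS
FMT = "%y%m%d-%H%M%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# sendReqTime and now from this run (2010-11-17)
print(elapsed_seconds("101117-152717.209", "101117-152917.232"))
# sendReqTime and now from the earlier run (2010-10-21)
print(elapsed_seconds("101021-174124.460", "101021-174324.492"))
```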
2010/10/21 Allan Espinosa <aespinosa at cs.uchicago.edu>:
> Hi,
>
> When I reuse the coaster service in the next Swift session, I
> get reply timeouts in the CHANNELCONFIG command:
>
>
> swift-r3685 cog-r2913
>
> RunID: extract
> Progress:
> Progress: uninitialized:2 Finished in previous run:2
> Progress: uninitialized:2 Finished in previous run:2
> Progress: Stage in:99 Submitting:1 Finished in previous run:102
> Find: https://communicado.ci.uchicago.edu:61999
> Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress: Stage in:92 Submitting:8 Finished in previous run:102
> Passive queue processor initialized. Callback URI is http://128.135.125.17:60999
> Progress: Stage in:71 Submitting:2 Submitted:27 Finished in previous run:102
> Progress: Stage in:29 Submitting:1 Submitted:70 Finished in previous run:102
>
> **Abort** (Ctrl-C)
> ** rerun/resume workflow **
> swift-r3685 cog-r2913
>
> RunID: extract
> Progress:
> Progress: uninitialized:3 Finished in previous run:2
> Progress: Stage in:99 Submitting:1 Finished in previous run:102
> Find: https://communicado.ci.uchicago.edu:61999
> Find: keepalive(120), reconnect - https://communicado.ci.uchicago.edu:61999
> Progress: Stage in:92 Submitting:8 Finished in previous run:102
> Progress: Stage in:92 Submitting:8 Finished in previous run:102
> Progress: Stage in:92 Submitting:8 Finished in previous run:102
> Progress: Stage in:92 Submitting:8 Finished in previous run:102
> Command(1, CHANNELCONFIG): handling reply timeout;
> sendReqTime=101021-174124.460, sendTime=101021-174124.471,
> now=101021-174324.492
> Command(1, CHANNELCONFIG)fault was: Reply timeout
> org.globus.cog.karajan.workflow.service.ReplyTimeoutException
> at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:280)
> at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:285)
> at java.util.TimerThread.mainLoop(Timer.java:512)
> at java.util.TimerThread.run(Timer.java:462)
> Progress: Stage in:92 Submitting:7 Submitted:1 Finished in previous run:102
>
> My sites.xml sets the persistent service to work in passive mode.
>
>
> thanks,
> -Allan
>
> --
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>
>
--
Allan M. Espinosa <http://amespinosa.wordpress.com>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>