[Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102
Michael Wilde
wilde at mcs.anl.gov
Wed Dec 5 17:53:52 CST 2012
Hi Lorenzo,
The "513" error comes from the coaster worker, and means that the app() call exceeded its maxwalltime. This maxwalltime is specified first in the sites file (where it applies to all apps on the site) and can be overridden for a given app by specifying maxwalltime on the app's entry in the tc file.
So, yes, if you specified maxwalltime as 16:40:00 in the sites file and did *not* override that in the tc file, then that is how long the app() call should be allowed to run on the compute node before the coaster worker kills it and returns a 513 error.
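To illustrate that precedence, here is a rough sketch in Python. It is not Swift's actual code, and the function and parameter names are made up for illustration; it only shows the rule that a tc-file value, when present, wins over the sites-file value.

```python
def effective_maxwalltime(site_value, tc_value=None):
    """Return the walltime limit that would apply to an app() call.

    site_value: maxwalltime from the sites file (applies to all apps).
    tc_value:   maxwalltime from the app's tc entry, or None if not set.
    Both names are hypothetical, chosen only for this sketch.
    """
    # The per-app tc value overrides the site-wide default when given.
    return tc_value if tc_value is not None else site_value


# No override in the tc file: the sites-file value applies.
print(effective_maxwalltime("16:40:00"))
# An override in the tc file takes precedence.
print(effective_maxwalltime("16:40:00", "00:20:00"))
```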
From your logs, assuming the app started shortly after Swift started:
Started at: Tue, 04 Dec 2012 15:05:14
Got 513 at: Wed, 05 Dec 2012 07:44:53
  24:00 - 15:05 =  8:55   (remainder of 04 Dec)
                +  7:44   (elapsed on 05 Dec)
                  15:99 = 16:39
which is what you asked for, minus the 1 min that Swift subtracts for overhead.
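If you want to check that arithmetic to the second, a small Python sketch like the one below would do it. The timestamps and the 16:40:00 value are taken from your log and sites file; the 1-minute overhead is the figure mentioned above, and the variable and helper names are only for illustration.

```python
from datetime import datetime, timedelta

# Timestamps taken from the Progress lines in the log.
started = datetime(2012, 12, 4, 15, 5, 14)
failed = datetime(2012, 12, 5, 7, 44, 53)
elapsed = failed - started


def parse_hms(text):
    """Convert an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


maxwalltime = parse_hms("16:40:00")   # value from the sites file
overhead = timedelta(minutes=1)       # the 1 min subtracted for overhead
limit = maxwalltime - overhead

print("elapsed:", elapsed)   # elapsed: 16:39:39
print("limit:  ", limit)     # limit:   16:39:00
# The failure falls between the adjusted limit and the full maxwalltime.
print(limit <= elapsed <= maxwalltime)
```

The elapsed time lands just past the adjusted limit, which is consistent with the worker killing the app once maxwalltime less the overhead had passed.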
- Mike
----- Original Message -----
> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> To: swift-user at ci.uchicago.edu
> Cc: "Jason J. Pitt" <pittjj at uchicago.edu>
> Sent: Wednesday, December 5, 2012 9:18:39 AM
> Subject: [Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102
> From the logs:
>
>
> Swift trunk swift-r6003 cog-r3497
> RunID: 20121204-1504-llt6yhhb
> Progress: time: Tue, 04 Dec 2012 15:04:44 +0000
> Progress: time: Tue, 04 Dec 2012 15:04:53 +0000 Submitted:1 Active:1
> Progress: time: Tue, 04 Dec 2012 15:05:14 +0000 Active:2
>
>
> <snip>
>
>
>
> Progress: time: Wed, 05 Dec 2012 07:44:53 +0000 Active:1 Failed but
> can retry:1
> Execution failed:
> Walltime exceeded
> org.globus.cog.abstraction.impl.common.execution.JobException:
> Walltime exceeded (exit code: 513)
> at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:38)
> at org.globus.cog.karajan.workflow.service.handlers.RequestHandler.receiveCompleted(RequestHandler.java:91)
> at org.globus.cog.karajan.workflow.service.channels.AbstractKarajanChannel.handleRequest(AbstractKarajanChannel.java:505)
> at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.stepNIO(AbstractStreamKarajanChannel.java:238)
> at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> at org.globus.cog.karajan.workflow.service.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> (exit code: 513)
>
>
>
>
> By my reckoning it is 15 hours or 900 minutes or 54000 seconds
>
>
> From my sites file:
>
>
>
> <profile namespace="globus" key="maxTime">172800</profile>
>
> <profile namespace="globus" key="maxwalltime">16:40:00</profile>
>
>
> Isn't maxwalltime the time expected from a single run app as opposed
> to the time allocated to the coaster?
>
>
> Has something changed in the way swift works?
>
>
> Lorenzo
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory