[Swift-devel] trunk-cobalt block task ended prematurely

Mihael Hategan hategan at mcs.anl.gov
Tue Mar 3 23:17:29 CST 2015


You are using coasters, so what gets queued is the block, not the job.

You should specify execution.options.maxJobTime = "00:59:00".

Then you can probably set the app's maxWallTime to about "00:50:00". But
7 minutes vs. 5 minutes isn't much of a difference.
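
A minimal sketch of the arithmetic behind these numbers, in Python. The
helper names (to_seconds, check_walltimes) are hypothetical and not part of
Swift or coasters. The rule that the app walltime must be strictly shorter
than the block walltime is an assumption made for illustration; the values
are the ones mentioned in this thread.

```python
def to_seconds(hms):
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def check_walltimes(app_walltime, block_walltime, queue_limit):
    """Return a list of problems; an empty list means the settings are consistent.

    Assumption (not taken from the Swift documentation): the block must fit
    within the queue limit, and the app walltime must be strictly shorter
    than the block walltime.
    """
    problems = []
    if to_seconds(block_walltime) > to_seconds(queue_limit):
        problems.append("block walltime exceeds the queue limit")
    if to_seconds(app_walltime) >= to_seconds(block_walltime):
        problems.append("app walltime does not fit inside the block")
    return problems


# Values from this thread: a 60-minute limit on Cetus, a 59-minute block
# (maxJobTime), and a 50-minute app walltime (maxWallTime).
print(check_walltimes("00:50:00", "00:59:00", "01:00:00"))
```

With these values the list of problems is empty. Raising the block
walltime above "01:00:00" would report the queue-limit problem, which
matches the submission failure described below.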

Mihael

On Tue, 2015-03-03 at 22:28 -0600, Ketan Maheshwari wrote:
> Attached is a log for maxWallTime set to 7 minutes. Beyond that, the job
> does not get submitted because of the 1-hour walltime limit on Cetus.
> --Ketan
> 
> On Tue, Mar 3, 2015 at 10:15 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
> 
> > When I check the queue with qstat, I see the job is submitted for 40
> > minutes. When I try to increase maxWallTime, the workflow does not get
> > submitted because the maximum allowed walltime on Cetus is 60 minutes. --Ketan
> >
> > On Tue, Mar 3, 2015 at 10:03 PM, Hategan-Marandiuc, Philip M. <
> > hategan at mcs.anl.gov> wrote:
> >
> >> Hi,
> >>
> >> Looks like almost exactly 5 minutes to me:
> >>
> >> 2015-03-04 01:45:43,943+0000 INFO  Execute TASK_STATUS_CHANGE
> >> taskid=urn:R-3-0-2-1425432781969 status=2
> >> workerid=0304-3301040-000000:000000
> >> 2015-03-04 01:50:44,676+0000 INFO  Execute TASK_STATUS_CHANGE
> >> taskid=urn:R-3-0-2-1425432781969 status=5 Walltime exceeded
> >>
> >> Which is what the config file is asking for:
> >>
> >> app.bgsh {
> >>   env.SUBBLOCK_SIZE: "16"                                 # [R] line 27
> >>   executable: "/home/ketan/SwiftApps/subjobs/bg.sh"       # [R] line 25
> >>   maxWallTime: "00:05:00"                                 # [R] line 26
> >> }
> >>
> >> Again, the wrapper log shows the app as still running. Last line is:
> >> Progress  2015-03-04 01:45:43.971393118+0000  EXECUTE
> >>
> >> Please do me a favor and increase the walltime to one hour and let's see
> >> what happens then.
> >>
> >> If it still doesn't finish after one hour, we could try to strace it and
> >> see what is happening there.
> >>
> >> Mihael
> >>
> >> On Tue, 2015-03-03 at 19:53 -0600, Ketan Maheshwari wrote:
> >> > Please find the log attached. --Ketan
> >> >
> >> > On Tue, Mar 3, 2015 at 7:03 PM, Hategan-Marandiuc, Philip M. <
> >> > hategan at mcs.anl.gov> wrote:
> >> >
> >> > > On Tue, 2015-03-03 at 15:42 -0600, Ketan Maheshwari wrote:
> >> > > > Slow network looks unlikely to be a cause:
> >> > >
> >> > > It's the only variable obvious, so I wouldn't say that.
> >>
> >> I meant "only obvious variable" there.
> >>
> >>
> >>
> >




