[Swift-devel] trunk-cobalt block task ended prematurely

Ketan Maheshwari ketan at mcs.anl.gov
Wed Mar 4 08:50:38 CST 2015


I added stdin="/dev/null" to the app invocation line, and it works now.
--Ketan
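
The thread does not say why redirecting stdin helped. One plausible reading, which is an assumption and not something confirmed above, is that the app was waiting on a stdin that never reached end-of-file. The Python sketch below illustrates that effect only; `cat` is a hypothetical stand-in for the real app, and `run_with_null_stdin` is a name made up for this example.

```python
import subprocess


def run_with_null_stdin(cmd, timeout=10):
    """Run cmd with stdin connected to /dev/null so any read sees EOF at once."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # equivalent of `< /dev/null` in a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,            # safety net in case the command still hangs
        check=False,
    )


# `cat` with no arguments copies stdin to stdout until EOF. It stands in for
# the real app here (assumption). With /dev/null as stdin it has nothing to
# wait for and exits right away.
result = run_with_null_stdin(["cat"])
print(result.returncode, result.stdout)
```

Without the redirect, a process like this would keep waiting for as long as its inherited stdin stayed open, which would look like an app that never finishes.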

On Wed, Mar 4, 2015 at 8:44 AM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:

> Please find one with 59 minutes attached. --Ketan
>
> On Tue, Mar 3, 2015 at 11:17 PM, Mihael Hategan <hategan at mcs.anl.gov>
> wrote:
>
>> You are using coasters, so what gets queued is the block, not the job.
>>
>> You should specify execution.options.maxJobTime = "00:59:00".
>>
>> Then you can probably do a walltime of about "00:50:00". But 7 minutes
>> vs. 5 minutes isn't much of a difference.
>>
>> Mihael
>>
>> On Tue, 2015-03-03 at 22:28 -0600, Ketan Maheshwari wrote:
>> > Attached is a log for maxWallTime set to 7 minutes, beyond which the job
>> > does not get submitted because of the 1-hour walltime limit on Cetus.
>> > --Ketan
>> >
>> > On Tue, Mar 3, 2015 at 10:15 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
>> >
>> > > When I check the queue with qstat, I see the job is submitted for 40
>> > > minutes. When I try to increase maxWallTime, the workflow does not get
>> > > submitted because the maximum allowed walltime on Cetus is 60 minutes.
>> > > --Ketan
>> > >
>> > > On Tue, Mar 3, 2015 at 10:03 PM, Hategan-Marandiuc, Philip M. <
>> > > hategan at mcs.anl.gov> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> Looks like almost exactly 5 minutes to me:
>> > >>
>> > >> 2015-03-04 01:45:43,943+0000 INFO  Execute TASK_STATUS_CHANGE
>> > >> taskid=urn:R-3-0-2-1425432781969 status=2
>> > >> workerid=0304-3301040-000000:000000
>> > >> 2015-03-04 01:50:44,676+0000 INFO  Execute TASK_STATUS_CHANGE
>> > >> taskid=urn:R-3-0-2-1425432781969 status=5 Walltime exceeded
>> > >>
>> > >> Which is what the config file is asking for:
>> > >>
>> > >> app.bgsh {
>> > >>   env.SUBBLOCK_SIZE: "16"                                 # [R] line 27
>> > >>   executable: "/home/ketan/SwiftApps/subjobs/bg.sh"       # [R] line 25
>> > >>   maxWallTime: "00:05:00"                                 # [R] line 26
>> > >> }
>> > >>
>> > >> Again, the wrapper log shows the app as still running. Last line is:
>> > >> Progress  2015-03-04 01:45:43.971393118+0000  EXECUTE
>> > >>
>> > >> Please do me a favor and increase the walltime to one hour and let's see
>> > >> what happens then.
>> > >>
>> > >> If it still doesn't finish after one hour, we could try to strace it and
>> > >> see what is happening there.
>> > >>
>> > >> Mihael
>> > >>
>> > >> On Tue, 2015-03-03 at 19:53 -0600, Ketan Maheshwari wrote:
>> > >> > Please find the log attached. --Ketan
>> > >> >
>> > >> > On Tue, Mar 3, 2015 at 7:03 PM, Hategan-Marandiuc, Philip M. <
>> > >> > hategan at mcs.anl.gov> wrote:
>> > >> >
>> > >> > > On Tue, 2015-03-03 at 15:42 -0600, Ketan Maheshwari wrote:
>> > >> > > > Slow network looks unlikely to be a cause:
>> > >> > >
>> > >> > > It's the only variable obvious, so I wouldn't say that.
>> > >>
>> > >> I meant "only obvious variable" there.
>> > >>
>> > >>
>> > >>
>> > >
>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>
>
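
The "almost exactly 5 minutes" observation quoted above can be checked with a little arithmetic. This is a minimal sketch: the two timestamps and the "00:05:00" limit are copied from the quoted log and config, while the helper names and the format string are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamps in the quoted log lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f%z"


def parse_log_time(stamp):
    """Parse a timestamp such as '2015-03-04 01:45:43,943+0000'."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def parse_walltime(text):
    """Turn an 'HH:MM:SS' walltime string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# The two TASK_STATUS_CHANGE entries quoted in the thread.
first_event = parse_log_time("2015-03-04 01:45:43,943+0000")
walltime_exceeded = parse_log_time("2015-03-04 01:50:44,676+0000")

elapsed = walltime_exceeded - first_event
limit = parse_walltime("00:05:00")      # maxWallTime from the quoted config
overshoot = elapsed - limit

print("elapsed:", elapsed)
print("over the limit by:", overshoot)
```

The gap between the two entries exceeds the configured limit by well under a second, which matches the reading that the task was ended by its own maxWallTime and not by the scheduler.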