[Swift-devel] trunk-cobalt block task ended prematurely

Ketan Maheshwari ketan at mcs.anl.gov
Wed Mar 4 15:22:56 CST 2015


Hi Mihael,

The code in _swiftwrap.staging branches on whether STDIN is set. See the
snippet below:

if [ "$STDIN" == "" ]; then
    if [ "$SWIFT_GEN_SCRIPTS" != "" ]; then
        echo "#!/bin/bash" > run.sh
        echo "\"$EXEC\" \"${CMDARGS[@]}\" 1>\"$STDOUT\" 2>\"$STDERR\"" >> run.sh
        chmod +x run.sh
    fi
    "$EXEC" "${CMDARGS[@]}" 1>"$STDOUT" 2>"$STDERR"
else
    if [ "$SWIFT_GEN_SCRIPTS" != "" ]; then
        echo "#!/bin/bash" > run.sh
        echo "\"$EXEC\" \"${CMDARGS[@]}\" 1>\"$STDOUT\" 2>\"$STDERR\" <\"$STDIN\"" >> run.sh
        chmod +x run.sh
    fi
    "$EXEC" "${CMDARGS[@]}" 1>"$STDOUT" 2>"$STDERR" <"$STDIN"
fi

When "stdin=" is not provided the code takes the first branch and hangs. It
works otherwise.

It is possible that it hangs because of the MPICH bug Mike mentioned.

I agree we should add a </dev/null there.
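
A minimal sketch of that change, assuming the variable names from the snippet
above (EXEC, CMDARGS, STDOUT, STDERR, STDIN). The run_app function and the
placeholder values are hypothetical, for illustration only, and are not part
of _swiftwrap.staging:

```shell
#!/bin/bash
# Sketch only: default the app's stdin to /dev/null when no stdin= was given.
# run_app and the values assigned below are illustrative placeholders.

run_app() {
    # ${STDIN:-/dev/null} expands to /dev/null when STDIN is unset or empty,
    # so the app never inherits the wrapper's own stdin.
    "$EXEC" "${CMDARGS[@]}" 1>"$STDOUT" 2>"$STDERR" <"${STDIN:-/dev/null}"
}

EXEC=cat            # stand-in for the real app; reads stdin until EOF
CMDARGS=()
STDOUT=$(mktemp)
STDERR=$(mktemp)
STDIN=""            # simulates an app invocation without stdin=

run_app
echo "exit status: $?"
```

With a single redirect like this, the two branches would collapse into one, and
an app that reads stdin sees EOF right away instead of waiting on whatever the
worker happens to have open.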

--Ketan


On Wed, Mar 4, 2015 at 3:12 PM, Hategan-Marandiuc, Philip M. <
hategan at mcs.anl.gov> wrote:

> I'm still confused. I don't see any difference in stdin handling between
> _swiftwrap and _swiftwrap.staging (which is used for direct staging).
>
> Maybe we should always feed the app a /dev/null if there is no stdin=
> specified.
>
> Mihael
>
> On Wed, 2015-03-04 at 08:50 -0600, Ketan Maheshwari wrote:
> > I added stdin="/dev/null" to app invocation line and it has worked now.
> > --Ketan
> >
> > On Wed, Mar 4, 2015 at 8:44 AM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
> >
> > > Please find one with 59 minutes attached. --Ketan
> > >
> > > On Tue, Mar 3, 2015 at 11:17 PM, Mihael Hategan <hategan at mcs.anl.gov>
> > > wrote:
> > >
> > >> You are using coasters, so what gets queued is the block, not the job.
> > >>
> > >> You should specify execution.options.maxJobTime = "00:59:00".
> > >>
> > >> Then you can probably do a walltime of about "00:50:00". But 7 minutes
> > >> vs. 5 minutes isn't much of a difference.
> > >>
> > >> Mihael
> > >>
> > >> On Tue, 2015-03-03 at 22:28 -0600, Ketan Maheshwari wrote:
> > >> > Attached is a log for maxWalltime set to 7 minutes beyond which the job
> > >> > does not get submitted because of the 1 hour walltime limit of Cetus.
> > >> > --Ketan
> > >> >
> > >> > On Tue, Mar 3, 2015 at 10:15 PM, Ketan Maheshwari <ketan at mcs.anl.gov> wrote:
> > >> >
> > >> > > When I check the queue with qstat, I see the job is submitted for 40 minutes.
> > >> > > When I try to increase maxWallTime the workflow does not get submitted
> > >> > > because the maximum allowed walltime on Cetus is 60 minutes. --Ketan
> > >> > >
> > >> > > On Tue, Mar 3, 2015 at 10:03 PM, Hategan-Marandiuc, Philip M. <
> > >> > > hategan at mcs.anl.gov> wrote:
> > >> > >
> > >> > >> Hi,
> > >> > >>
> > >> > >> Looks like almost exactly 5 minutes to me:
> > >> > >>
> > >> > >> 2015-03-04 01:45:43,943+0000 INFO  Execute TASK_STATUS_CHANGE
> > >> > >> taskid=urn:R-3-0-2-1425432781969 status=2
> > >> > >> workerid=0304-3301040-000000:000000
> > >> > >> 2015-03-04 01:50:44,676+0000 INFO  Execute TASK_STATUS_CHANGE
> > >> > >> taskid=urn:R-3-0-2-1425432781969 status=5 Walltime exceeded
> > >> > >>
> > >> > >> Which is what the config file is asking for:
> > >> > >>
> > >> > >> app.bgsh {
> > >> > >>   env.SUBBLOCK_SIZE: "16"                                 # [R] line 27
> > >> > >>   executable: "/home/ketan/SwiftApps/subjobs/bg.sh"       # [R] line 25
> > >> > >>   maxWallTime: "00:05:00"                                 # [R] line 26
> > >> > >> }
> > >> > >>
> > >> > >> Again, the wrapper log shows the app as still running. Last line is:
> > >> > >> Progress  2015-03-04 01:45:43.971393118+0000  EXECUTE
> > >> > >>
> > >> > >> Please do me a favor and increase the walltime to one hour and let's see
> > >> > >> what happens then.
> > >> > >>
> > >> > >> If it still doesn't finish after one hour, we could try to strace it and
> > >> > >> see what is happening there.
> > >> > >>
> > >> > >> Mihael
> > >> > >>
> > >> > >> On Tue, 2015-03-03 at 19:53 -0600, Ketan Maheshwari wrote:
> > >> > >> > Please find the log attached. --Ketan
> > >> > >> >
> > >> > >> > On Tue, Mar 3, 2015 at 7:03 PM, Hategan-Marandiuc, Philip M. <
> > >> > >> > hategan at mcs.anl.gov> wrote:
> > >> > >> >
> > >> > >> > > On Tue, 2015-03-03 at 15:42 -0600, Ketan Maheshwari wrote:
> > >> > >> > > > Slow network looks unlikely to be a cause:
> > >> > >> > >
> > >> > >> > > It's the only variable obvious, so I wouldn't say that.
> > >> > >>
> > >> > >> I meant "only obvious variable" there.
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >
> > >>
> > >>
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >>
> > >
> > >
>
>
>