[Swift-user] trunk-cobalt block task ended prematurely

Ketan Maheshwari ketan at mcs.anl.gov
Mon Mar 2 18:55:22 CST 2015


I do not see any logs in ~/.globus/coasters; yes, /home is mounted on
service nodes and is writable from there.
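A quick way to confirm whether any worker logs were written at all is to list the coaster log directory. The following is a minimal Python sketch. It assumes the default ~/.globus/coasters location discussed in this thread and makes no assumption about worker log file names. The helper name list_worker_logs is hypothetical.

```python
from pathlib import Path
from typing import List


def list_worker_logs(log_dir: Path) -> List[str]:
    """Return the sorted names of regular files in log_dir.

    Returns an empty list if log_dir does not exist. No naming pattern
    for worker log files is assumed; every regular file is listed.
    """
    if not log_dir.is_dir():
        return []
    return sorted(p.name for p in log_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumed default location from this thread; adjust if logs go elsewhere.
    coasters = Path.home() / ".globus" / "coasters"
    names = list_worker_logs(coasters)
    if names:
        print(f"{len(names)} file(s) in {coasters}:")
        for name in names:
            print("  " + name)
    else:
        # Per the advice in this thread, no worker logs suggests worker.pl never started.
        print(f"No files in {coasters}; worker.pl may not be starting.")
```

An empty result is consistent with worker.pl never being launched on the compute nodes.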

I added "--mode script" as a default arg to qsub in the provider code, but I
am still getting the same error. Attached is the new log.
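Since the qsub command line is recorded in the Swift log, one way to check whether a setting such as "--mode script" or WORKER_LOGGING_LEVEL actually reached qsub is to pull those command lines out of the log. The following is a minimal Python sketch. It assumes each submission is logged on a single line containing "qsub " followed by its arguments, and that each environment variable appears as its own NAME=value token, as in the command quoted further down this thread. The helper names qsub_lines, env_value and has_mode_script are hypothetical.

```python
import shlex
import sys
from typing import Iterable, List, Optional


def qsub_lines(log_lines: Iterable[str]) -> List[str]:
    """Return each log line's text from 'qsub ' onward.

    Assumption: the submit command is logged on a single line.
    """
    found = []
    for line in log_lines:
        idx = line.find("qsub ")
        if idx != -1:
            found.append(line[idx:].strip())
    return found


def env_value(cmd: str, name: str) -> Optional[str]:
    """Return the value of the first NAME=value token in cmd, or None."""
    prefix = name + "="
    try:
        tokens = shlex.split(cmd)
    except ValueError:  # e.g. unbalanced quotes in a truncated log line
        tokens = cmd.split()
    for tok in tokens:
        if tok.startswith(prefix):
            return tok[len(prefix):]
    return None


def has_mode_script(cmd: str) -> bool:
    """True if '--mode' is immediately followed by 'script'."""
    tokens = cmd.split()
    return any(a == "--mode" and b == "script"
               for a, b in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_qsub.py SWIFT_LOG")
    else:
        with open(sys.argv[1]) as log:
            for cmd in qsub_lines(log):
                print(cmd)
                print("  WORKER_LOGGING_LEVEL:", env_value(cmd, "WORKER_LOGGING_LEVEL"))
                print("  --mode script present:", has_mode_script(cmd))
```

The script name check_qsub.py is arbitrary. If the reported level is still NONE after changing the configuration, the setting is not reaching the provider.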

About the manual option: would we also need the coaster service to be running,
or would just invoking the worker suffice (for troubleshooting purposes)?

--Ketan

On Mon, Mar 2, 2015 at 6:25 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> On Mon, 2015-03-02 at 18:11 -0600, Ketan Maheshwari wrote:
> > I tried this option, but it did not seem to work. Attached is the log.
>
> Check /home/ketan/.globus/coasters for worker logs. If there aren't any,
> it means that worker.pl isn't being started (I'm assuming that /home is
> mounted on compute/service nodes).
>
> If that's the case, I would suggest troubleshooting by manually running
> the qsub command and seeing why the worker doesn't start.
>
> Mihael
>
> >
> > On Mon, Mar 2, 2015 at 5:27 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> >
> > > It would really be much more useful if you posted the full log.
> > >
> > > Anyway, I believe that what you need to do is:
> > > site.cluster.execution.options.workerLoggingLevel = "DEBUG"
> > >
> > > Mihael
> > >
> > > On Mon, 2015-03-02 at 16:37 -0600, Ketan Maheshwari wrote:
> > > > The qsub command from the log says:
> > > >
> > > > qsub -e WORKER_LOGGING_LEVEL=NONE --proccount 32 -n 32 -t 40 --cwd ...
> > > >
> > > > So, the env variable in swift.conf does not seem to take effect.
> > > >
> > > > On Mon, Mar 2, 2015 at 4:33 PM, Hategan-Marandiuc, Philip M. <hategan at mcs.anl.gov> wrote:
> > > >
> > > > > Well, we need to figure out why. Since the qsub command line is in the
> > > > > swift log, and the qsub command line should reflect the setting, it
> > > > > would be useful if you posted the swift log.
> > > > >
> > > > > Mihael
> > > > >
> > > > > On Mon, 2015-03-02 at 16:27 -0600, Ketan Maheshwari wrote:
> > > > > > For worker logs, I am trying:
> > > > > >
> > > > > >  app.bgsh {
> > > > > >         executable: "/home/ketan/SwiftApps/subjobs/bg.sh"
> > > > > >         maxWallTime: "00:04:00"
> > > > > >         env.ENABLE_WORKER_LOGGING="TRUE"
> > > > > >         env.WORKER_LOGGING_LEVEL="DEBUG"
> > > > > >         env.WORKER_LOG_DIR="/home/ketan/workerlogs"
> > > > > >     }
> > > > > >
> > > > > > This does not seem to trigger logging.
> > > > > >
> > > > > > Thanks,
> > > > > > Ketan
> > > > > >
> > > > > > On Mon, Mar 2, 2015 at 4:07 PM, Hategan-Marandiuc, Philip M. <hategan at mcs.anl.gov> wrote:
> > > > > >
> > > > > > > I would recommend enabling worker logging to see if we get any info
> > > > > > > from the worker process. It could be something simple, like the wrong
> > > > > > > IP address.
> > > > > > >
> > > > > > > Mihael
> > > > > > >
> > > > > > > On Mon, 2015-03-02 at 15:47 -0600, Ketan Maheshwari wrote:
> > > > > > > > I am trying to run on BG/Q with local:cobalt with trunk, but Swift
> > > > > > > > crashes with the following error:
> > > > > > > >
> > > > > > > > Caused by: Exception in bgsh:
> > > > > > > >     Arguments: [/home/ketan/SwiftApps/subjobs/mpicatsnsleep/mpicatnap,
> > > > > > > >         /gpfs/mira-home/ketan/SwiftApps/subjobs/mpicatsnsleep/./data.txt,
> > > > > > > >         /gpfs/mira-home/ketan/SwiftApps/subjobs/mpicatsnsleep/./outdir/f.0002.out,
> > > > > > > >         1]
> > > > > > > >     Host: cluster
> > > > > > > >     Directory: catsnsleepmpi-run001/jobs/b/bgsh-3nq3uc5m
> > > > > > > > exception @ swift-int-staging.k, line: 165
> > > > > > > > Caused by:
> > > > > > > > exception @ swift-int-staging.k, line: 160
> > > > > > > > Caused by: Block task failed: 0302-2109420-000000 Block task ended
> > > > > > > > prematurely
> > > > > > > >
> > > > > > > > In the log, I see the qsub call being made and a job ID being returned.
> > > > > > > > However, I could not figure out what is causing the task to fail.
> > > > > > > >
> > > > > > > > One more thing I noticed when translating from the old sites conf to the
> > > > > > > > new one is that the new conf did not accept the property
> > > > > > > > "globus:mode = script".
> > > > > > > >
> > > > > > > > A full run log is attached. Thanks for any suggestions.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Ketan
> > > > > > > > _______________________________________________
> > > > > > > > Swift-user mailing list
> > > > > > > > Swift-user at ci.uchicago.edu
> > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > >
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run007.tgz
Type: application/x-gzip
Size: 9545 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20150302/b317ce33/attachment.bin>

