[Swift-devel] Walltime exceeded error
Michael Wilde
wilde at mcs.anl.gov
Wed Feb 22 15:56:24 CST 2012
Hi Jon, I think Mihael is pretty swamped with school commitments on Mondays.
The only other thing I can think of grabbing is worker logs, but I doubt that any provision was made to request worker logging for this run.
I'd go ahead and terminate the run.
- Mike
----- Original Message -----
> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> To: "Mihael Hategan" <hategan at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, February 22, 2012 3:45:53 PM
> Subject: Re: [Swift-devel] Walltime exceeded error
> Mihael,
> I have a hung Java process showing this error right now; 2 jobs are
> stuck in the initializing state. I have the jstack -l <pid> output for
> this hung Java process. Is there anything else you need before I kill
> it? Do you need any other probing information from this process other
> than this jstack output?
>
> On Feb 20, 2012, at 4:27 PM, Jonathan Monette wrote:
>
> > Correction, Beagle does have jstack. Do not know why I thought it
> > did not have it.
> >
> > On Feb 20, 2012, at 4:26 PM, Jonathan Monette wrote:
> >
> >> No. This was a run Ketan did a while back. I have been using this
> >> as a reference when trying to re-create the issue with a simple
> >> catsnsleep job.
> >>
> >> This run was also done on Beagle using the pre-installed java
> >> package, which does not have jstack.
> >>
> >> On Feb 20, 2012, at 4:24 PM, Mihael Hategan wrote:
> >>
> >>> I'm not sure if I asked this, but did you happen to get a jstack
> >>> of the hanging swift?
> >>>
> >>> On Mon, 2012-02-20 at 16:19 -0600, Jonathan Monette wrote:
> >>>> No. The last run was run using Beagle. That is the more
> >>>> interesting one. That shows jobs failed but the "Failed but can
> >>>> retry" count was not printed very often. You can see that in the
> >>>> swift.out file. Eventually the workflow just hung and the hang
> >>>> checker kicked in. You can also see that Swift got stuck in the
> >>>> initializing state with a count of 61.
> >>>>
> >>>> On Feb 20, 2012, at 4:16 PM, Mihael Hategan wrote:
> >>>>
> >>>>> On Mon, 2012-02-20 at 16:14 -0600, Jonathan Monette wrote:
> >>>>>> /gpfs/pads/swift/jonmon/Swift/tests/catsnsleep <----- on /gpfs/pads
> >>>>>> /home/jonmon/public_html/Swift/bugs/SciColSim/run002 <----- on any CI machine
> >>>>>
> >>>>> Ok. Sorry. I thought the last one was on beagle.
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory