[Swift-devel] Walltime exceeded error
Jonathan Monette
jonmon at mcs.anl.gov
Wed Feb 22 16:00:34 CST 2012
Ok. I shall kill it.
> Hi Jon, I think Mihael is pretty swamped with school commitments on Mondays.
>
> The only other thing I can think of grabbing is worker logs, but I doubt that any provision was made to request worker logging for this run.
>
> I'd go ahead and terminate the run.
>
> - Mike
>
> ----- Original Message -----
>> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
>> To: "Mihael Hategan" <hategan at mcs.anl.gov>
>> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
>> Sent: Wednesday, February 22, 2012 3:45:53 PM
>> Subject: Re: [Swift-devel] Walltime exceeded error
>> Mihael,
>> I have a hung Java process showing this error right now; 2 jobs are
>> stuck in the initializing state. I have a jstack -l <pid> of this hung
>> Java process. Is there anything else you need before I kill it? Do you
>> need any other probing information from this process besides this
>> jstack output?
>>
>> On Feb 20, 2012, at 4:27 PM, Jonathan Monette wrote:
>>
>>> Correction: Beagle does have jstack. I don't know why I thought it
>>> did not.
>>>
>>> On Feb 20, 2012, at 4:26 PM, Jonathan Monette wrote:
>>>
>>>> No. This was a run Ketan did a while back. I have been using this
>>>> as a reference when trying to re-create the issue with a simple
>>>> catsnsleep job.
>>>>
>>>> This run was also done on Beagle using the pre-installed Java
>>>> package, which does not have jstack.
>>>>
>>>> On Feb 20, 2012, at 4:24 PM, Mihael Hategan wrote:
>>>>
>>>>> I'm not sure if I asked this, but did you happen to get a jstack
>>>>> of the hanging Swift?
>>>>>
>>>>> On Mon, 2012-02-20 at 16:19 -0600, Jonathan Monette wrote:
>>>>>> No. The last run was done on Beagle; that is the more interesting
>>>>>> one. It shows that jobs failed, but the "Failed but can retry"
>>>>>> count was not printed very often. You can see that in the
>>>>>> swift.out file. Eventually the workflow just hung and the hang
>>>>>> checker kicked in. You can also see that Swift got stuck in the
>>>>>> initializing state with a count of 61.
>>>>>>
>>>>>> On Feb 20, 2012, at 4:16 PM, Mihael Hategan wrote:
>>>>>>
>>>>>>> On Mon, 2012-02-20 at 16:14 -0600, Jonathan Monette wrote:
>>>>>>>> /gpfs/pads/swift/jonmon/Swift/tests/catsnsleep <----- on /gpfs/pads
>>>>>>>> /home/jonmon/public_html/Swift/bugs/SciColSim/run002 <----- on any CI machine
>>>>>>>
>>>>>>> Ok. Sorry. I thought the last one was on Beagle.
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>
>>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>