[Swift-devel] Walltime exceeded error

Jonathan Monette jonmon at mcs.anl.gov
Wed Feb 22 16:00:34 CST 2012


Ok.  I shall kill it.

> Hi Jon, I think Mihael is pretty swamped with school commitments on Mondays.
> 
> The only other thing I can think of grabbing is worker logs, but I doubt that any provision was made to request worker logging for this run.
> 
> I'd go ahead and terminate the run.
> 
> - Mike
> 
> ----- Original Message -----
>> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
>> To: "Mihael Hategan" <hategan at mcs.anl.gov>
>> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
>> Sent: Wednesday, February 22, 2012 3:45:53 PM
>> Subject: Re: [Swift-devel] Walltime exceeded error
>> Mihael,
>> I have a hung Java process showing this error right now; 2 jobs are
>> stuck in the initializing state. I have a jstack -l <pid> of this hung
>> Java process. Is there anything else you need before I kill it? Do you
>> need any other probing information from this process other than this
>> jstack output?
>> 
>> On Feb 20, 2012, at 4:27 PM, Jonathan Monette wrote:
>> 
>>> Correction, Beagle does have jstack. Do not know why I thought it
>>> did not have it.
>>> 
>>> On Feb 20, 2012, at 4:26 PM, Jonathan Monette wrote:
>>> 
>>>> No. This was a run Ketan did a while back. I have been using this
>>>> as a reference when trying to re-create the issue with a simple
>>>> catsnsleep job.
>>>> 
>>>> This run was also done on Beagle using the pre-installed java
>>>> package, which does not have jstack.
>>>> 
>>>> On Feb 20, 2012, at 4:24 PM, Mihael Hategan wrote:
>>>> 
>>>>> I'm not sure if I asked this, but did you happen to get a jstack
>>>>> of the hanging swift?
>>>>> 
>>>>> On Mon, 2012-02-20 at 16:19 -0600, Jonathan Monette wrote:
>>>>>> No. The last run was run using Beagle. That is the more
>>>>>> interesting one. That shows jobs failed but the "Failed but can
>>>>>> retry" count was not printed very often. You can see that in the
>>>>>> swift.out file. Eventually the workflow just hung and the hang
>>>>>> checker kicked in. You can also see that Swift got stuck in the
>>>>>> initializing state with a count of 61.
>>>>>> 
>>>>>> On Feb 20, 2012, at 4:16 PM, Mihael Hategan wrote:
>>>>>> 
>>>>>>> On Mon, 2012-02-20 at 16:14 -0600, Jonathan Monette wrote:
>>>>>>>> /gpfs/pads/swift/jonmon/Swift/tests/catsnsleep <----- on /gpfs/pads
>>>>>>>> /home/jonmon/public_html/Swift/bugs/SciColSim/run002 <----- on any CI machine
>>>>>>> 
>>>>>>> Ok. Sorry. I thought the last one was on beagle.
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>> 
>> 
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 



