[Swift-devel] Walltime exceeded error

Jonathan Monette jonmon at mcs.anl.gov
Sun Feb 26 17:12:18 CST 2012


I have again updated the bug: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=720
It now includes steps to reproduce the problem with a small test from the application we are running.  The steps set up the application to run on whatever machine you are testing on.

This turns out to be a Swift bug, not a coaster bug.  The test in /gpfs/pads/swift/jonmon/Swift/bugs/SciColSim/run014 is a local test: it did not use coasters at all, and the hang checker still kicked in.

On Feb 24, 2012, at 4:09 PM, Jonathan Monette wrote:

> I have updated the Bugzilla bug with the directories below: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=720
> 
> I have also added another directory showing the same behavior
> /gpfs/pads/swift/jonmon/Swift/bugs/SciColSim/run054
> 
> There is a jstack.log file in that directory.  All three of the run directories show that jobs get stuck in the initialized state and the hang checker kicks in.
> 
> On Feb 22, 2012, at 5:05 PM, Jonathan Monette wrote:
> 
>> This has been done.  I have also moved the run that Ketan had produced to PADS.
>> 
>> /gpfs/pads/swift/jonmon/Swift/bugs/SciColSim/run002    <-----Ketan's run
>> /gpfs/pads/swift/jonmon/Swift/bugs/SciColSim/run047    <-----My run (more recent; has a jstack.log file)
>> 
>> On Feb 22, 2012, at 4:33 PM, Jonathan Monette wrote:
>> 
>>> Ok.  I have killed the process and am now copying the run directory from the Lustre file system on Beagle to /gpfs/pads.
>>> 
>>> On Feb 22, 2012, at 4:28 PM, Mihael Hategan wrote:
>>> 
>>>> On Wed, 2012-02-22 at 15:45 -0600, Jonathan Monette wrote:
>>>>> Mihael,
>>>>> I have a hung Java process showing this error right now; 2 jobs are
>>>>> stuck in the initializing state.  I have the jstack -l <pid> output for
>>>>> this hung Java process.  Is there anything else you need before I kill it?
>>>>> Do you need any other probing information from this process other than
>>>>> this jstack output?
>>>> 
>>>> I don't think so.
>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
