[Swift-devel] Index out of bounds
Jonathan Monette
jonmon at mcs.anl.gov
Mon Aug 22 21:48:17 CDT 2011
Yes, as a matter of fact. The output files of the app that was failing are both 2.6GB.
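
For readers of the archive: the thread does not confirm the root cause, but the question about files over 2G suggests a 32-bit limit somewhere in the file-handling path. The sketch below only illustrates that general failure mode in Java. The class and method names are hypothetical and are not taken from the Swift or CoG sources, and it uses Java 8 features that postdate this thread.

```java
class LargeFileOffset {
    // Hypothetical helper: narrows a 64-bit file size to a 32-bit int
    // with a plain cast. For sizes above Integer.MAX_VALUE
    // (2^31 - 1 bytes, just under 2 GiB) the value silently wraps.
    static int unsafeNarrow(long size) {
        return (int) size;
    }

    // Checked variant: Math.toIntExact throws ArithmeticException
    // instead of wrapping when the value does not fit in an int.
    static int checkedNarrow(long size) {
        return Math.toIntExact(size);
    }

    public static void main(String[] args) {
        long size = 2_600_000_000L; // roughly the 2.6GB reported above

        // The wrapped value is negative; using it as an array or buffer
        // index is one way to end up with an index-out-of-bounds error.
        System.out.println("unsafe cast: " + unsafeNarrow(size));

        try {
            checkedNarrow(size);
        } catch (ArithmeticException e) {
            System.out.println("checked cast rejected the size: " + e.getMessage());
        }
    }
}
```

If something like this is the cause, keeping sizes and offsets as `long` end to end avoids the wraparound entirely.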
On Aug 22, 2011, at 9:47 PM, Mihael Hategan wrote:
> On Mon, 2011-08-22 at 21:37 -0500, Jonathan Monette wrote:
>> Ok. I have run the test case after updating and rebuilding the
>> 0.93 release. I am not sure why the IndexOutOfBounds error was
>> appearing, but now it is not.
>
> Ok, then I might know. Were any of your files over 2G in size?
>
>> I have run my scripts around 10 times and the error has not appeared.
>> I am not really sure what happened but I cannot reproduce the error.
>> I am not sure why it was appearing in the first place.
>>
>> On Aug 20, 2011, at 9:03 PM, Michael Wilde wrote:
>>
>>> Jon, the list you want for Beagle issue notifications is
>>> beagle-users. You can subscribe via the link:
>>>
>>>
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
>>>
>>>
>>> - Mike
>>>
>>>
>>> ----- Forwarded Message -----
>>> From: "Greg Cross" <grog at ci.uchicago.edu>
>>> To: beagle-users at ci.uchicago.edu
>>> Sent: Saturday, August 20, 2011 2:12:45 PM
>>> Subject: [beagle-users] Outage update
>>>
>>>
>>> Lustre is mounting properly but there is a communication failure
>>> between the Moab and ALPS scheduler components. This issue is under
>>> investigation and has been escalated to Cray.
>>>
>>>
>>> As a reminder, please DO NOT attempt to log into the system during
>>> this or any other maintenance period. While logins should be denied
>>> at this time, any user processes found running on login or sandbox
>>> nodes will be terminated without warning. Users who do not respect
>>> this may be contacted individually.
>>>
>>>
>>> Definitive notification will be sent to this mailing list when the
>>> system is available for use.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> beagle-users mailing list
>>> beagle-users at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
>>>
>>>
>>>
>>>
>>> ____________________________________________________________________
>>> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>> To: "Daniel S. Katz" <dsk at ci.uchicago.edu>
>>> Cc: swift-devel at ci.uchicago.edu
>>> Sent: Saturday, August 20, 2011 4:20:35 PM
>>> Subject: Re: [Swift-devel] Index out of bounds
>>>
>>> Thanks. In the meantime could someone let me know when
>>> beagle is back in production so I can check my run?
>>>
>>> ----- Reply message -----
>>> From: "Daniel S. Katz" <dsk at ci.uchicago.edu>
>>> Date: Sat, Aug 20, 2011 3:14 pm
>>> Subject: [Swift-devel] Index out of bounds
>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
>>> "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>
>>>
>>>
>>>
>>> Yes, write to beagle-support.
>>>
>>> On Aug 20, 2011, at 14:52, "Jonathan Monette"
>>> <jonmon at mcs.anl.gov> wrote:
>>>
>>>
>>>
>>> Ok thanks. It seems that I was not added to the
>>> beagle-notify list. Could someone point me to a link
>>> I can subscribe to? Or do I subscribe by sending
>>> mail to beagle-support?
>>>
>>> ----- Reply message -----
>>> From: "Ketan Maheshwari"
>>> <ketancmaheshwari at gmail.com>
>>> Date: Sat, Aug 20, 2011 7:45 am
>>> Subject: [Swift-devel] Index out of bounds
>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>> Cc: <swift-devel at ci.uchicago.edu>
>>>
>>>
>>>
>>> Yes, Beagle went down yesterday. There was a notice.
>>>
>>>
>>> Current status as of Aug 19, 5.30PM:
>>>
>>>
>>> ==
>>> At this time, Lustre is not starting properly on
>>> Beagle. This may be related to a configuration
>>> change that was made during the last outage. The
>>> effort to restore system availability is still in
>>> active progress.
>>> ==
>>>
>>>
>>>
>>>
>>> Ketan
>>>
>>> On Sat, Aug 20, 2011 at 12:03 AM, Jonathan
>>> Monette <jonmon at mcs.anl.gov> wrote:
>>> I updated and rebuilt and added that line to
>>> my log4j properties. Does anyone know if
>>> Beagle is down? showq says there is no
>>> service listening to sdb:<number>. qstat
>>> shows that I have a job sitting in the queue
>>> but it doesn't look like jobs are running.
>>>
>>> I am using both PADS and Beagle for this
>>> execution. In this case where jobs are not
>>> executing on Beagle shouldn't Swift start
>>> submitting jobs to PADS? I do not see that
>>> behavior.
>>>
>>> This run is still executing. But if you
>>> would like to look at the log it is at
>>> www.ci.uchicago.edu/~jonmon/logs/montage-2.log. Only 23 tasks
>>> finished before the run stalled, waiting for Beagle.
>>>
>>> On Aug 19, 2011, at 2:46 PM, Jonathan Monette wrote:
>>>
>>>> Sure can. Do I add that line to the log4j file or to a
>>>> different properties file?
>>>>
>>>> ----- Reply message -----
>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>>>> Date: Fri, Aug 19, 2011 2:03 pm
>>>> Subject: Index out of bounds
>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>> Cc: <swift-devel at ci.uchicago.edu>
>>>>
>>>>
>>>> Hmm. So I can't see how this manages to happen.
>>>>
>>>> I added some checks and debugging statements. Can you update, set
>>>> log level of org.globus.cog.abstraction.impl.file.local to DEBUG,
>>>> re-run and then post the log when the exception pops up?
>>>>
>>>> Mihael
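
For readers of the archive: the thread never quotes the exact line to add. Assuming the installation uses a log4j 1.x properties file (an assumption, not stated in the thread), raising the log level for that package would typically look like this:

```properties
# Assumed log4j 1.x properties syntax; the package name is the one
# given in the message above.
log4j.logger.org.globus.cog.abstraction.impl.file.local=DEBUG
```

The DEBUG output still has to pass any threshold set on the appender that writes the log file, so that setting may need checking as well.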
>>>>
>>>> On Thu, 2011-08-18 at 23:14 -0500, Jonathan Monette wrote:
>>>>> Ok. The log is at
>>>>> www.ci.uchicago.edu/~jonmon/logs/montage-1.log
>>>>> On Aug 18, 2011, at 5:56 PM, Mihael Hategan wrote:
>>>>>
>>>>>> It's probably a good idea to post the stack trace of that
>>>>>> exception now rather than later.
>>>>>>
>>>>>> On Thu, 2011-08-18 at 13:09 -0500, Jonathan Monette wrote:
>>>>>>> Hello,
>>>>>>> I was running 0.93 with a relatively small run, a 350-task run.
>>>>>>> The run failed on one of the final tasks. I checked the log file
>>>>>>> and saw some index out of bounds errors. I tried with a smaller
>>>>>>> run and didn't see the error.
>>>>>>>
>>>>>>> This run was using beagle, pads, and communicado. I was also
>>>>>>> using cdm. I will post the log in a bit. I am seeing if I can
>>>>>>> replicate it without using cdm and with a smaller site pool.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>>
>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Ketan
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Michael Wilde
>>> Computation Institute, University of Chicago
>>> Mathematics and Computer Science Division
>>> Argonne National Laboratory
>>>
>>>
>>
>>
>
>