[Swift-devel] Index out of bounds

Jonathan Monette jonmon at mcs.anl.gov
Mon Aug 22 21:48:17 CDT 2011


Yes, as a matter of fact.  The output files of the app that was failing are both 2.6GB.
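
For anyone reading this later, here is a minimal sketch of why a file over 2G could produce an index out of bounds error in Java. It assumes a file size or offset gets narrowed to an int somewhere; the thread does not confirm that this is what the 0.93 code did. The class and method names are hypothetical and are not taken from Swift or CoG.

```java
// Hypothetical illustration only; not code from Swift or the CoG kit.
class LargeFileIndex {
    // Narrowing a long byte count to int keeps only the low 32 bits,
    // so anything above Integer.MAX_VALUE (2^31 - 1) wraps around.
    static int toIntUnchecked(long size) {
        return (int) size;
    }

    // Checked variant: throws ArithmeticException instead of wrapping.
    static int toIntChecked(long size) {
        return Math.toIntExact(size);
    }

    public static void main(String[] args) {
        long size = 2_600_000_000L; // roughly 2.6GB, an assumed example value
        int wrapped = toIntUnchecked(size);
        System.out.println(wrapped); // prints -1694967296

        byte[] buffer = new byte[16];
        try {
            // A negative index like this is one way to end up with an
            // index out of bounds exception.
            buffer[wrapped] = 0;
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of bounds: " + wrapped);
        }
    }
}
```

Keeping sizes and offsets as long, or converting with Math.toIntExact, makes this kind of problem show up at the conversion rather than later as a bad index.
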
On Aug 22, 2011, at 9:47 PM, Mihael Hategan wrote:

> On Mon, 2011-08-22 at 21:37 -0500, Jonathan Monette wrote:
>> Ok.  I have run the test case after updating and rebuilding the
>> 0.93 release.  I am not sure why the IndexOutOfBounds error was
>> appearing but now it is not. 
> 
> Ok, then I might know. Were any of your files over 2G in size?
> 
>> I have run my scripts around 10 times and the error has not appeared.
>> I am not really sure what happened but I cannot reproduce the error.
>> I am not sure why it was appearing in the first place.
>> 
>> On Aug 20, 2011, at 9:03 PM, Michael Wilde wrote:
>> 
>>> Jon,  the list you want for Beagle issue notifications is
>>> beagle-users. You can subscribe via the link:
>>> 
>>> 
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
>>> 
>>> 
>>> - Mike
>>> 
>>> 
>>> ----- Forwarded Message -----
>>> From: "Greg Cross" <grog at ci.uchicago.edu>
>>> To: beagle-users at ci.uchicago.edu
>>> Sent: Saturday, August 20, 2011 2:12:45 PM
>>> Subject: [beagle-users] Outage update
>>> 
>>> 
>>> Lustre is mounting properly but there is a communication failure
>>> between the Moab and ALPS scheduler components.  This issue is under
>>> investigation and has been escalated to Cray.
>>> 
>>> 
>>> As a reminder, please DO NOT attempt to log into the system during
>>> this or any other maintenance period.  While logins should be denied
>>> at this time, any user processes found running on login or sandbox
>>> nodes will be terminated without warning.  Users who do not respect
>>> this may be contacted individually.
>>> 
>>> 
>>> Definitive notification will be sent to this mailing list when the
>>> system is available for use.
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> beagle-users mailing list
>>> beagle-users at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
>>> 
>>> 
>>> 
>>> 
>>> ____________________________________________________________________
>>>        From: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>        To: "Daniel S. Katz" <dsk at ci.uchicago.edu>
>>>        Cc: swift-devel at ci.uchicago.edu
>>>        Sent: Saturday, August 20, 2011 4:20:35 PM
>>>        Subject: Re: [Swift-devel] Index out of bounds
>>> 
>>>        Thanks. In the meantime could someone let me know when
>>>        beagle is back in production so I can check my run? 
>>> 
>>>        ----- Reply message -----
>>>        From: "Daniel S. Katz" <dsk at ci.uchicago.edu>
>>>        Date: Sat, Aug 20, 2011 3:14 pm
>>>        Subject: [Swift-devel] Index out of bounds
>>>        To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>        Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
>>>        "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>
>>> 
>>> 
>>> 
>>>        Yes, write to beagle-support. 
>>> 
>>>        On Aug 20, 2011, at 14:52, "Jonathan Monette"
>>>        <jonmon at mcs.anl.gov> wrote:
>>> 
>>> 
>>> 
>>>                Ok thanks. It seems that I was not added to the
>>>                beagle-notify list. Could someone point me to a link
>>>                I can subscribe to? Or do I subscribe by sending
>>>                mail to beagle-support?
>>> 
>>>                ----- Reply message -----
>>>                From: "Ketan Maheshwari"
>>>                <ketancmaheshwari at gmail.com>
>>>                Date: Sat, Aug 20, 2011 7:45 am
>>>                Subject: [Swift-devel] Index out of bounds
>>>                To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>                Cc: <swift-devel at ci.uchicago.edu>
>>> 
>>> 
>>> 
>>>                Yes, Beagle went down yesterday. There was a notice.
>>> 
>>> 
>>>                Current status as of Aug 19, 5.30PM:
>>> 
>>> 
>>>                ==
>>>                At this time, Lustre is not starting properly on
>>>                Beagle.  This may be related to a configuration
>>>                change that was made during the last outage.  The
>>>                effort to restore system availability is still in
>>>                active progress.
>>>                ==
>>> 
>>> 
>>> 
>>> 
>>>                Ketan
>>> 
>>>                On Sat, Aug 20, 2011 at 12:03 AM, Jonathan
>>>                Monette <jonmon at mcs.anl.gov> wrote:
>>>                        I updated and rebuilt and added that line to
>>>                        my log4j properties.  Does anyone know if
>>>                        Beagle is down?  showq says there is no
>>>                        service listening to sdb:<number>.  qstat
>>>                        shows that I have a job sitting in the queue
>>>                        but it doesn't look like jobs are running.
>>> 
>>>                        I am using both PADS and Beagle for this
>>>                        execution.  In this case where jobs are not
>>>                        executing on Beagle shouldn't Swift start
>>>                        submitting jobs to PADS?  I do not see that
>>>                        behavior.
>>> 
>>>                        This run is still executing.  But if you
>>>                        would like to look at the log it is at
>>>                        www.ci.uchicago.edu/~jonmon/logs/montage-2.log.
>>>                        Only 23 tasks have finished, and now it just
>>>                        sits there waiting for Beagle to run.
>>> 
>>>                        On Aug 19, 2011, at 2:46 PM, Jonathan
>>>                        Monette wrote:
>>> 
>>>> Sure can. Do I add that line to the log4j file or to a
>>>> different properties file?
>>>> 
>>>> ----- Reply message -----
>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>>>> Date: Fri, Aug 19, 2011 2:03 pm
>>>> Subject: Index out of bounds
>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>> Cc: <swift-devel at ci.uchicago.edu>
>>>> 
>>>> 
>>>> Hmm. So I can't see how this manages to happen.
>>>> 
>>>> I added some checks and debugging statements. Can you update, set
>>>> log level of org.globus.cog.abstraction.impl.file.local to DEBUG,
>>>> re-run and then post the log when the exception pops up?
>>>> 
>>>> Mihael
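
The request above comes down to one line of logging configuration. A sketch, assuming log4j 1.x properties syntax and assuming the file is etc/log4j.properties under the Swift install (the thread only says "log4j properties", so the path is a guess):

```properties
# Assumed location: etc/log4j.properties in the Swift distribution.
# Turns on DEBUG output for the local file provider package named above.
log4j.logger.org.globus.cog.abstraction.impl.file.local=DEBUG
```

After changing it, re-run and the extra messages should show up in the run log.
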
>>>> 
>>>> On Thu, 2011-08-18 at 23:14 -0500, Jonathan Monette wrote:
>>>>> Ok.  The log is at
>>>>> www.ci.uchicago.edu/~jonmon/logs/montage-1.log
>>>>> On Aug 18, 2011, at 5:56 PM, Mihael Hategan wrote:
>>>>> 
>>>>>> It's probably a good idea to post the stack trace of that
>>>>>> exception now rather than later.
>>>>>> 
>>>>>> On Thu, 2011-08-18 at 13:09 -0500, Jonathan Monette wrote:
>>>>>>> Hello,
>>>>>>> I was running 0.93 with a relatively small run, a 350 task run.
>>>>>>> The run failed on one of the final tasks. I checked the log file
>>>>>>> and saw some index out of bounds errors. I tried with a smaller
>>>>>>> run and didn't see the error.
>>>>>>> 
>>>>>>> This run was using beagle, pads, and communicado. I was also
>>>>>>> using cdm. I will post the log in a bit. I am seeing if I can
>>>>>>> replicate it without using cdm and with a smaller site pool.
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>>> 
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>                -- 
>>>                Ketan
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Michael Wilde
>>> Computation Institute, University of Chicago
>>> Mathematics and Computer Science Division
>>> Argonne National Laboratory
>>> 
>>> 
>> 
>> 
> 
> 



