[Swift-devel] Index out of bounds
Mihael Hategan
hategan at mcs.anl.gov
Mon Aug 22 21:47:05 CDT 2011
On Mon, 2011-08-22 at 21:37 -0500, Jonathan Monette wrote:
> Ok. I have run the test case after updating and rebuilding the
> 0.93 release. I am not sure why the IndexOutOfBounds error was
> appearing, but now it is not.
Ok, then I might know. Were any of your files over 2G in size?
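To illustrate why the 2G question matters: a minimal sketch, assuming the
file length gets narrowed from long to int somewhere along the way. This is
not the actual CoG/Swift code; the class and method names below are
hypothetical and only show the general failure mode. A length of 2^31 bytes
or more wraps to a negative int, and a negative length or offset is the kind
of value that can later surface as an index out of bounds error.

```java
public class FileSizeOverflow {
    // Hypothetical helper: narrows a file length to int the unsafe way.
    // For sizes >= 2^31 the cast silently wraps to a negative number.
    public static int naiveLength(long fileSize) {
        return (int) fileSize;
    }

    // Hypothetical helper: caps each read at a fixed chunk size, so the
    // value narrowed to int is always within range.
    public static int chunkLength(long remaining, int chunkSize) {
        return (int) Math.min(remaining, (long) chunkSize);
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        // Wrapped, negative value: unusable as a buffer length or index.
        System.out.println("naive:   " + naiveLength(threeGiB));
        // Bounded value: safe to use as a buffer length.
        System.out.println("chunked: " + chunkLength(threeGiB, 65536));
    }
}
```

If something like this is what happened, the error would only show up in
runs that stage a file past the 2G mark, which would also explain why it is
hard to reproduce.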
> I have run my scripts around 10 times and the error has not appeared.
> I am not really sure what happened but I cannot reproduce the error.
> I am not sure why it was appearing in the first place.
>
> On Aug 20, 2011, at 9:03 PM, Michael Wilde wrote:
>
> > Jon, the list you want for Beagle issue notifications is
> > beagle-users. You can subscribe via the link:
> >
> >
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
> >
> >
> > - Mike
> >
> >
> > ----- Forwarded Message -----
> > From: "Greg Cross" <grog at ci.uchicago.edu>
> > To: beagle-users at ci.uchicago.edu
> > Sent: Saturday, August 20, 2011 2:12:45 PM
> > Subject: [beagle-users] Outage update
> >
> >
> > Lustre is mounting properly but there is a communication failure
> > between the Moab and ALPS scheduler components. This issue is under
> > investigation and has been escalated to Cray.
> >
> >
> > As a reminder, please DO NOT attempt to log into the system during
> > this or any other maintenance period. While logins should be denied
> > at this time, any user processes found running on login or sandbox
> > nodes will be terminated without warning. Users who do not respect
> > this may be contacted individually.
> >
> >
> > Definitive notification will be sent to this mailing list when the
> > system is available for use.
> >
> >
> >
> >
> > _______________________________________________
> > beagle-users mailing list
> > beagle-users at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/beagle-users
> >
> >
> >
> >
> > ____________________________________________________________________
> > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > To: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> > Cc: swift-devel at ci.uchicago.edu
> > Sent: Saturday, August 20, 2011 4:20:35 PM
> > Subject: Re: [Swift-devel] Index out of bounds
> >
> > Thanks. In the meantime could someone let me know when
> > beagle is back in production so I can check my run?
> >
> > ----- Reply message -----
> > From: "Daniel S. Katz" <dsk at ci.uchicago.edu>
> > Date: Sat, Aug 20, 2011 3:14 pm
> > Subject: [Swift-devel] Index out of bounds
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>
> >
> >
> >
> > Yes, write to beagle-support.
> >
> > On Aug 20, 2011, at 14:52, "Jonathan Monette"
> > <jonmon at mcs.anl.gov> wrote:
> >
> >
> >
> > Ok thanks. It seems that I was not added to the
> > beagle-notify list. Could someone point me to a link
> > I can subscribe to? Or do I subscribe by sending
> > mail to beagle-support?
> >
> > ----- Reply message -----
> > From: "Ketan Maheshwari"
> > <ketancmaheshwari at gmail.com>
> > Date: Sat, Aug 20, 2011 7:45 am
> > Subject: [Swift-devel] Index out of bounds
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: <swift-devel at ci.uchicago.edu>
> >
> >
> >
> > Yes, Beagle went down yesterday. There was a notice.
> >
> >
> > Current status as of Aug 19, 5.30PM:
> >
> >
> > ==
> > At this time, Lustre is not starting properly on
> > Beagle. This may be related to a configuration
> > change that was made during the last outage. The
> > effort to restore system availability is still in
> > active progress.
> > ==
> >
> >
> >
> >
> > Ketan
> >
> > On Sat, Aug 20, 2011 at 12:03 AM, Jonathan
> > Monette <jonmon at mcs.anl.gov> wrote:
> > I updated and rebuilt and added that line to
> > my log4j properties. Does anyone know if
> > Beagle is down? showq says there is no
> > service listening to sdb:<number>. qstat
> > shows that I have a job sitting in the queue
> > but it doesn't look like jobs are running.
> >
> > I am using both PADS and Beagle for this
> > execution. In this case where jobs are not
> > executing on Beagle shouldn't Swift start
> > submitting jobs to PADS? I do not see that
> > behavior.
> >
> > This run is still executing. But if you
> > would like to look at the log, it is at
> > www.ci.uchicago.edu/~jonmon/logs/montage-2.log.
> > Only 23 tasks have finished; it now just sits
> > there waiting for Beagle to run.
> >
> > On Aug 19, 2011, at 2:46 PM, Jonathan
> > Monette wrote:
> >
> > > Sure can. Do I add that line to the log4j file or to a different
> > > properties file?
> > >
> > > ----- Reply message -----
> > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > Date: Fri, Aug 19, 2011 2:03 pm
> > > Subject: Index out of bounds
> > > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > Cc: <swift-devel at ci.uchicago.edu>
> > >
> > >
> > > Hmm. So I can't see how this manages to happen.
> > >
> > > I added some checks and debugging statements. Can you update, set the
> > > log level of org.globus.cog.abstraction.impl.file.local to DEBUG,
> > > re-run, and then post the log when the exception pops up?
> > >
> > > Mihael
> > >
> > > On Thu, 2011-08-18 at 23:14 -0500, Jonathan Monette wrote:
> > > > Ok. The log is at
> > > > www.ci.uchicago.edu/~jonmon/logs/montage-1.log
> > > >
> > > > On Aug 18, 2011, at 5:56 PM, Mihael Hategan wrote:
> > > >
> > > > > It's probably a good idea to post the stack trace of that exception
> > > > > now rather than later.
> > > > >
> > > > > On Thu, 2011-08-18 at 13:09 -0500, Jonathan Monette wrote:
> > > > > >> Hello,
> > > > > >> I was running 0.93 with a relatively small run, a 350 task run.
> > > > > >> The run failed on one of the final tasks. I checked the log file and
> > > > > >> saw some index out of bounds errors. I tried with a smaller run and
> > > > > >> didn't see the error.
> > > > > >>
> > > > > >> This run was using beagle, pads, and communicado. I was also using
> > > > > >> cdm. I will post the log in a bit. I am seeing if I can replicate it
> > > > > >> without using cdm and with a smaller site pool.
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
> >
> > >
> > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> >
> >
> >
> >
> >
> > --
> > Ketan
> >
> >
> >
> >
> >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> >
>
>