[Swift-devel] Re: Persistent coaster service fails after several runs

Michael Wilde wilde at mcs.anl.gov
Sun Nov 28 00:47:56 CST 2010


Will test this one tomorrow. I deleted logs and other junk as I was way over quota. Sorry, I forgot I had pointed you to these.

- Mike


----- Original Message -----
> I think some of the logs in /home/wilde/swift/lab are gone. Nonetheless,
> I believe that the lockup was caused by the following issue:
> 
> - when something bad happened on a channel, some method would be called
> to allow the channel implementation to handle that error.
> - an existing problem (which I thought I fixed, but it turns out I had
> not committed it) caused that method to throw an exception
> - that would in turn (because it was not in a try/catch block) kill the
> thread used to send messages on behalf of all channels of a given type.
> 
> This was fixed as follows:
> 1. I committed what I should have a while ago such that the triggering
> problem is gone
> 2. The handling of channel exceptions is now properly isolated
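
A minimal Java sketch of the isolation described in point 2, for readers who want to see the failure mode concretely. It is an illustration under stated assumptions, not the actual coaster code: the Channel interface, the class names, and sendToAll are all hypothetical. The idea is that one thread sends on behalf of several channels, so a channel's error handler is wrapped in its own try/catch and a handler that throws cannot kill the shared thread.

```java
import java.util.ArrayList;
import java.util.List;

public class IsolatedSenderSketch {
    /** Hypothetical stand-in for a channel; not the real coaster API. */
    interface Channel {
        void send(String msg) throws Exception;
        void handleError(Exception e); // may itself throw an unchecked exception
    }

    /** A channel that delivers messages into a shared list. */
    static class GoodChannel implements Channel {
        private final String name;
        private final List<String> delivered;
        GoodChannel(String name, List<String> delivered) {
            this.name = name;
            this.delivered = delivered;
        }
        public void send(String msg) { delivered.add(name + ":" + msg); }
        public void handleError(Exception e) { /* nothing to do */ }
    }

    /** A channel whose send fails and whose error handler also throws. */
    static class BrokenChannel implements Channel {
        public void send(String msg) throws Exception {
            throw new Exception("send failed");
        }
        public void handleError(Exception e) {
            throw new IllegalStateException("handler failed too");
        }
    }

    /** One pass of the shared sender loop over all channels. */
    static void sendToAll(List<Channel> channels, String msg) {
        for (Channel c : channels) {
            try {
                c.send(msg);
            } catch (Exception sendFailure) {
                try {
                    // Let the channel implementation react to the error.
                    c.handleError(sendFailure);
                } catch (RuntimeException handlerFailure) {
                    // Isolation: a failing handler affects only this channel;
                    // the loop (and the thread running it) keeps going.
                    System.err.println("handler failed: " + handlerFailure.getMessage());
                }
            }
        }
    }

    /** Runs one pass with a broken channel between two good ones. */
    public static List<String> runDemo() {
        List<String> delivered = new ArrayList<>();
        List<Channel> channels = new ArrayList<>();
        channels.add(new GoodChannel("a", delivered));
        channels.add(new BrokenChannel());
        channels.add(new GoodChannel("b", delivered));
        sendToAll(channels, "ping");
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Without the inner try/catch, the IllegalStateException from the broken channel would propagate out of the loop, and channel "b" would never be serviced, which matches the lockup described above.
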
> 
> Mihael
> 
> On Sun, 2010-11-21 at 21:00 -0600, Michael Wilde wrote:
> > subject was: Re: [Swift-devel] misassignment of jobs
> >
> > Re the service-side timeout, OK, will do.
> >
> > I've just re-created bug1, but it's a little different than I thought.
> >
> > Swift runs to the persistent coaster server lock up (i.e., fail to
> > progress) and then get errors, not after a delay, but seemingly at
> > random. That's likely why I was misled into thinking it was delay
> > related.

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



