[Swift-devel] Re: Persistent coaster service fails after several runs

Mihael Hategan hategan at mcs.anl.gov
Sat Nov 27 23:58:24 CST 2010


I think some of the logs in /home/wilde/swift/lab are gone. Nonetheless,
I believe that the lockup was caused by the following issue:

- When something went wrong on a channel, a method was called to let
the channel implementation handle the error.
- An existing problem (which I thought I had fixed, but it turns out I
had never committed the fix) caused that method to throw an exception.
- Because the call was not in a try/catch block, that exception killed
the thread used to send messages on behalf of all channels of a given
type.

This was fixed as follows:
1. I committed the fix I should have committed a while ago, so the
triggering problem is gone.
2. The handling of channel exceptions is now properly isolated, so a
failing error handler should no longer take down the sender thread.
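
To illustrate the second point, here is a minimal sketch of what isolating
the error handler can look like. It is not the actual coaster code: the
names ChannelSender, Channel, send, handleError and sendAll are all
hypothetical, and the real sender thread is certainly more involved. The
only point is that the handler call sits in its own try/catch, so a
throwing handler affects one channel rather than the shared loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the coaster service.
final class ChannelSender {

    /** Stand-in for a channel implementation (assumed interface). */
    interface Channel {
        void send(String msg) throws Exception;

        /** Lets the channel react to a failure; may itself throw at runtime. */
        void handleError(Exception e);
    }

    /**
     * Sends msg on every channel from a single shared loop and returns the
     * number of channels on which the send succeeded.
     */
    static int sendAll(List<Channel> channels, String msg) {
        int delivered = 0;
        for (Channel c : channels) {
            try {
                c.send(msg);
                delivered++;
            } catch (Exception sendFailure) {
                try {
                    c.handleError(sendFailure);
                } catch (RuntimeException handlerFailure) {
                    // Isolation: a broken handler is reported and skipped
                    // instead of propagating out of the shared loop.
                    System.err.println("error handler failed: " + handlerFailure);
                }
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // A channel whose send fails and whose error handler also throws.
        Channel broken = new Channel() {
            public void send(String msg) throws Exception {
                throw new Exception("link down");
            }
            public void handleError(Exception e) {
                throw new IllegalStateException("handler bug");
            }
        };
        // A well-behaved channel that comes after the broken one.
        Channel healthy = new Channel() {
            public void send(String msg) {
                System.out.println("sent: " + msg);
            }
            public void handleError(Exception e) {
            }
        };
        List<Channel> channels = new ArrayList<>();
        channels.add(broken);
        channels.add(healthy);
        int delivered = sendAll(channels, "heartbeat");
        System.out.println(delivered + " of " + channels.size() + " channels delivered");
    }
}
```

Without the inner try/catch, the exception thrown by the broken channel's
handler would escape sendAll, and the healthy channel would never be
reached, which is the same shape as the lockup described above.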

Mihael

On Sun, 2010-11-21 at 21:00 -0600, Michael Wilde wrote:
> subject was: Re: [Swift-devel] misassignment of jobs
> 
> Re the service-side timeout, OK, will do.
> 
> I've just re-created bug1, but it's a little different than I thought.
> 
> Swift runs against the persistent coaster server lock up (i.e. fail to
> progress) and then get errors, not after a delay, but seemingly
> randomly. That's likely why I was misled into thinking it was
> delay-related.




