[Swift-devel] awf2 errors

Mihael Hategan hategan at mcs.anl.gov
Sat Oct 27 23:05:05 CDT 2007


On Sat, 2007-10-27 at 17:52 -0500, Michael Wilde wrote:
> 
> On 10/27/07 5:05 PM, Mihael Hategan wrote:
> > On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote:
> >> On Sat, 27 Oct 2007, Michael Wilde wrote:
> >>
> >>> One additional unexplained item is that in the run you analyzed with a 4-wide
> >>> transfer throttle, I was still getting a lot of I/O errors in the log, which I
> >>> don't think have been explained yet.
> >> Can you paste one? I don't immediately see them. I see lots of
> >> APPLICATION_EXCEPTIONS but without much detail about the cause.
> > 
> > Whenever a job fails, Swift will attempt to transfer the stdout and
> > stderr of that job. There is no guarantee that those files are created
> > by the job (i.e. they only get created when at least one character is
> > written to them). Hence the transfer of these may fail. It is not an
> > error at the Swift level. Again, it's a pattern of the following kind:
> > 
> 
> That's what it was.  There were 12 APPLICATION_EXCEPTION errors out of
> 1000 jobs, and 48 failures to get stderr. I didn't correlate these
> because I had found the I/O errors via grep, and there were 400+ lines
> with error/failure strings. And I didn't catch that they all pertained to
> stderr.
> 
> I'm guessing (but need to check) that those 48 represent retries of some 
> sort on the 12 failed jobs.
> 
> So you're right, Ben, the slow data return rate is more likely due to 
> throttling or contention.
> 
> I think we should try to indicate top-level Swift-detected errors with a 
> distinct code to separate them from all the lower-level error details 
> that each incident produces.

It's an interesting point, but I don't quite see how that would be done
in theory :). Superficially, there would need to be a parameter that
goes all the way down from maybe() to whatever piece of software
implements what's under it for a specific call, and says "This call is
special, so log it with some stuff before it". Proponents of information
hiding would shout "No! It gives you access to implementation details."
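
A minimal sketch of what such a parameter might look like, in Java. All
names here (Importance, transfer, maybeTransfer, LOG) are made up for
illustration; none of them are Swift or CoG APIs, and the in-memory list
just stands in for the real log:

```java
import java.util.ArrayList;
import java.util.List;

public class OptionalCallLogging {
    // Hypothetical marker that has to be passed down every layer.
    enum Importance { REQUIRED, OPTIONAL }

    // Stand-in for the log file so the result can be inspected.
    static final List<String> LOG = new ArrayList<>();

    // Lowest layer: the only place that knows the failure detail,
    // and now also the place that has to know how the caller feels about it.
    static void transfer(String file, boolean exists, Importance importance)
            throws Exception {
        if (!exists) {
            String prefix = (importance == Importance.OPTIONAL) ? "OPTIONAL " : "ERROR ";
            LOG.add(prefix + "transfer failed: " + file);
            throw new Exception("no such file: " + file);
        }
        LOG.add("INFO transferred " + file);
    }

    // Top layer: the operation does not have to succeed.
    static void maybeTransfer(String file, boolean exists) {
        try {
            transfer(file, exists, Importance.OPTIONAL);
        } catch (Exception e) {
            // Already logged below with the OPTIONAL prefix; nothing more to do.
        }
    }

    public static void main(String[] args) throws Exception {
        maybeTransfer("stderr.txt", false);                 // e.g. stderr of a failed job
        transfer("output.dat", true, Importance.REQUIRED);  // a transfer that must work
        for (String line : LOG) {
            System.out.println(line);
        }
    }
}
```

With something like this, a grep for "ERROR" would skip the optional
stderr transfers, but every intermediate layer has to carry the extra
argument, which is the information hiding objection.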

>  I realize that in practice this may not be
> easy, as the details may get logged before the error propagates up to
> the "top" level. I wonder what the Globus developers have concluded
> about error logging strategy. (Can discuss this later on relevant
> bugzilla bugs - don't want to sidetrack discussion now).
> 
> 
> > try {
> >   optionalOperationWhichDoesNotHaveToSucceed();
> > }
> > catch (Exception e) {
> >   log(e);
> > }
> > 
> > 
> > 
> 



