[Swift-devel] awf2 errors
Mihael Hategan
hategan at mcs.anl.gov
Sat Oct 27 23:05:05 CDT 2007
On Sat, 2007-10-27 at 17:52 -0500, Michael Wilde wrote:
>
> On 10/27/07 5:05 PM, Mihael Hategan wrote:
> > On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote:
> >> On Sat, 27 Oct 2007, Michael Wilde wrote:
> >>
> >>> One additional unexplained item is that in the run you analyzed with a 4-wide
> >>> transfer throttle, I was still getting a lot of I/O errors in the log, which I
> >> don't think have been explained yet.
> >> Can you paste one? I don't immediately see them. I see lots of
> >> APPLICATION_EXCEPTIONS but with not much detail about the cause.
> >
> > Whenever a job fails, Swift will attempt to transfer the stdout and
> > stderr of that job. There is no guarantee that those files are created
> > by the job (i.e. they only get created when at least one character is
> > written to them). Hence the transfer of these may fail. It is not an
> > error at the Swift level. Again, it's a pattern of the following kind:
> >
>
> That's what it was. There were 12 APPLICATION_EXCEPTION errors out of
> 1000 jobs, and 48 failures to get stderr. I didn't correlate these
> because I had found the I/O errors via grep, and there were 400+ lines
> with error/failure strings. And I didn't catch that they all pertained to
> stderr.
>
> I'm guessing (but need to check) that those 48 represent retries of some
> sort on the 12 failed jobs.
>
> So you're right, Ben, the slow data return rate is more likely due to
> throttling or contention.
>
> I think we should try to indicate top-level Swift-detected errors with a
> distinct code to separate them from all the lower-level error details
> that each incident produces.
It's an interesting point, but I don't quite see how that would be done,
even in theory :). Superficially, there would need to be a parameter that
goes all the way down from maybe() to whatever piece of software
implements the operation underneath it and says, for a specific call,
"This call is special, so log it with a distinguishing prefix".
Proponents of information hiding would shout "No! It gives you access to
implementation details."
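For illustration only, the parameter-threading approach described above might look roughly like the following sketch. Everything in it is hypothetical (the `TaggedLogging` class, the `transfer` method, the `critical` flag, and the `SWIFT-ERROR` tag are not Swift's actual code), and the transfer is simulated. The caller decides whether a call matters, and that flag is passed down so the lowest layer can tag its own log line:

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedLogging {
    // Hypothetical in-memory log so the example is self-contained.
    static final List<String> LOG = new ArrayList<>();

    // Lowest layer: a simulated file transfer. The 'critical' flag is threaded
    // down from the top-level caller purely so this method knows how to tag
    // its log line; the transfer logic itself does not need it.
    static boolean transfer(String file, boolean fileExists, boolean critical) {
        if (!fileExists) {
            String tag = critical ? "SWIFT-ERROR" : "detail";
            LOG.add(tag + ": could not transfer " + file);
            return false;
        }
        LOG.add("detail: transferred " + file);
        return true;
    }

    public static void main(String[] args) {
        // Optional: stderr may legitimately never have been created.
        transfer("job1.stderr", false, false);
        // Required: a missing output file is a real, top-level failure.
        transfer("job1.out", false, true);
        LOG.forEach(System.out::println);
    }
}
```

Grepping for the tag would then separate top-level failures from lower-level detail, but the transfer layer now carries a distinction that only matters to its caller, which is exactly the information-hiding objection.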
> I realize in practice that this may not be
> easy, as the details may get logged before the error propagates up to
> the "top" level. I wonder what the Globus developers have concluded
> about error logging strategy. (We can discuss this later on the relevant
> Bugzilla bugs; I don't want to sidetrack the discussion now.)
>
>
> > try {
> >     optionalOperationWhichDoesNotHaveToSucceed();
> > }
> > catch (Exception e) {
> >     log(e);
> > }
> >
>
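A minimal sketch of the best-effort pattern quoted above, applied to the stdout/stderr case: after a job fails, each stream file is fetched if present, and a missing file is logged and swallowed rather than reported as an error. This assumes local files stand in for the remote transfer; the `OptionalStdio` class, `fetchOptional` method, and file names are hypothetical, not Swift's implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class OptionalStdio {
    // Hypothetical log sink standing in for the run log.
    static final List<String> LOG = new ArrayList<>();

    // Best-effort read of a job's stdout/stderr file. The file only exists if
    // the job wrote at least one character to that stream, so any I/O failure
    // here is logged and swallowed instead of being propagated.
    static String fetchOptional(Path file) {
        try {
            return Files.readString(file);
        } catch (IOException e) {
            LOG.add("optional fetch failed: " + file.getFileName()
                    + " (" + e.getClass().getSimpleName() + ")");
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job");
        Files.writeString(dir.resolve("stdout.txt"), "partial output\n");
        // The job never wrote to stderr, so stderr.txt was never created.
        String out = fetchOptional(dir.resolve("stdout.txt"));
        String err = fetchOptional(dir.resolve("stderr.txt"));
        System.out.println("stdout: " + out.trim());
        System.out.println("stderr: " + err);
        LOG.forEach(System.out::println);
    }
}
```

The log line for the missing stderr is exactly the kind of entry that shows up when grepping a log for error/failure strings, even though the only real error is the job's own failure.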