[Swift-devel] awf2 errors
Michael Wilde
wilde at mcs.anl.gov
Sat Oct 27 17:52:08 CDT 2007
On 10/27/07 5:05 PM, Mihael Hategan wrote:
> On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote:
>> On Sat, 27 Oct 2007, Michael Wilde wrote:
>>
>>> One additional unexplained item is that in the run you analyzed with a 4-wide
>>> transfer throttle, I was still getting a lot of I/O errors in the log, which I
>>> don't think have been explained yet.
>> Can you paste one? I don't immediately see them. I see lots of
>> APPLICATION_EXCEPTIONS but with not much detail about the cause.
>
> Whenever a job fails, Swift will attempt to transfer the stdout and
> stderr of that job. There is no guarantee that those files are created
> by the job (i.e. they only get created when at least one character is
> written to them). Hence the transfer of these may fail. It is not an
> error at the Swift level. Again, it's a pattern of the following kind:
>
That's what it was. There were 12 APPLICATION_EXCEPTION errors out of
1000 jobs, and 48 failures to get stderr. I didn't correlate these
because I had found the I/O errors via grep, and there were 400+ lines
with error/failure strings, and I didn't catch that they all pertained
to stderr.
I'm guessing (but need to check) that those 48 represent retries of some
sort on the 12 failed jobs.
So you're right, Ben, the slow data return rate is more likely due to
throttling or contention.
I think we should try to indicate top-level Swift-detected errors with a
distinct code to separate them from all the lower-level error details
that each incident produces. I realize in practice that this may not be
easy, as the details may get logged before the error propagates up to
the "top" level. I wonder what the Globus developers have concluded
about error logging strategy. (We can discuss this later on the relevant
Bugzilla bugs; I don't want to sidetrack the discussion now.)
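
As a rough illustration of the distinct-code idea, here is a minimal Java
sketch. The marker string SWIFT_TOP_LEVEL_ERROR, the class name, and the
log line layout are all hypothetical and not taken from Swift. The point
is only that one fixed token per top-level failure would let a grep count
failed jobs without also matching the lower-level detail lines.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the Swift code base.
final class TopLevelErrors {
    // Assumed marker that only top-level, Swift-detected errors would carry.
    static final String MARKER = "SWIFT_TOP_LEVEL_ERROR";

    // Format one top-level error line for a failed job.
    static String format(String jobId, Throwable cause) {
        return MARKER + " job=" + jobId
                + " cause=" + cause.getClass().getSimpleName()
                + ": " + cause.getMessage();
    }

    // Count lines carrying the marker, ignoring lower-level detail lines.
    static long countTopLevel(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(MARKER))
                .count();
    }
}
```

With something like this, the stderr transfer failures would still be
logged, but they would not be counted as job failures.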
> try {
>     optionalOperationWhichDoesNotHaveToSucceed();
> }
> catch (Exception e) {
>     log(e);
> }
>
>
>
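
The quoted pattern can be made concrete along these lines. This is only
a sketch: fetchOptional, the class name, and the use of local
java.nio.file paths are stand-ins for illustration, not how Swift
actually stages stdout/stderr back from a site.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of an optional operation whose failure is logged
// but not propagated to the caller.
final class OptionalFetch {
    private static final Logger LOG =
            Logger.getLogger(OptionalFetch.class.getName());

    // Try to read a job's stderr file. A missing file is expected when the
    // job wrote nothing to stderr, so the failure is logged at a low level
    // and an empty result is returned instead of rethrowing.
    static Optional<String> fetchOptional(Path stderrFile) {
        try {
            return Optional.of(Files.readString(stderrFile));
        } catch (IOException e) {
            LOG.log(Level.FINE, "optional file not available: " + stderrFile, e);
            return Optional.empty();
        }
    }
}
```

Logging the failure at a low level keeps the detail available for
debugging while keeping it out of a grep for real errors.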