[Swift-devel] awf2 errors

Michael Wilde wilde at mcs.anl.gov
Sat Oct 27 17:52:08 CDT 2007

On 10/27/07 5:05 PM, Mihael Hategan wrote:
> On Sat, 2007-10-27 at 21:57 +0000, Ben Clifford wrote:
>> On Sat, 27 Oct 2007, Michael Wilde wrote:
>>> One additional unexplained item is that in the run you analyzed with a 4-wide
>>> transfer throttle, I was still getting a lot of I/O errors in the log, which I
>>> don't think have been explained yet.
>> Can you paste one? I don't immediately see them. I see lots of 
>> APPLICATION_EXCEPTIONS but with not much detail about the cause.
> Whenever a job fails, Swift will attempt to transfer the stdout and
> stderr of that job. There is no guarantee that those files are created
> by the job (i.e. they only get created when at least one character is
> written to them). Hence the transfer of these may fail. It is not an
> error at the Swift level. Again, it's a pattern of the following kind:

That's what it was.  There were 12 APPLICATION_EXCEPTION errors out of 
1000 jobs, and 48 failures to get stderr. I didn't correlate these 
because I had found the I/O errors via grep, and there were 400+ lines 
with error/failure strings. And I didn't catch that they all pertained to 

I'm guessing (but need to check) that those 48 represent retries of some 
sort on the 12 failed jobs.

So you're right, Ben, the slow data return rate is more likely due to 
throttling or contention.

I think we should try to indicate top-level Swift-detected errors with a 
distinct code to separate them from all the lower-level error details 
that each incident produces. I realize that in practice this may not be 
easy, as the details may get logged before the error propagates up to 
the "top" level. I wonder what the Globus developers have concluded 
about error logging strategy. (We can discuss this later on the relevant 
Bugzilla bugs - I don't want to sidetrack the discussion now).
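
To make the idea of a distinct top-level code concrete, here is a minimal sketch. It is not Swift's logging code: the class TaggedLog, the markers SWIFT_TOP_ERROR and DETAIL, and the job= field are all hypothetical names chosen for illustration. The assumption is that every line carries a job id, so details logged early can still be matched to the top-level error that is logged later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Swift itself.
class TaggedLog {
    // Assumed marker for an error reported at the top (workflow) level.
    static final String TOP_LEVEL = "SWIFT_TOP_ERROR";
    // Assumed marker for lower-level detail belonging to an incident.
    static final String DETAIL = "DETAIL";

    private final List<String> lines = new ArrayList<>();

    // Record a lower-level detail as soon as it happens.
    void detail(String jobId, String message) {
        lines.add(DETAIL + " job=" + jobId + " " + message);
    }

    // Record one top-level error once the failure has propagated up.
    void topLevel(String jobId, String message) {
        lines.add(TOP_LEVEL + " job=" + jobId + " " + message);
    }

    // Count only top-level errors: the equivalent of grepping for the marker.
    long countTopLevel() {
        return lines.stream()
                    .filter(l -> l.startsWith(TOP_LEVEL + " "))
                    .count();
    }

    // Expose all recorded lines, details included.
    List<String> lines() {
        return lines;
    }
}
```

With something like this, a search for the top-level marker would count failed jobs directly, while the stderr transfer failures would show up only as DETAIL lines for the same job id.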

> try {
>   optionalOperationWhichDoesNotHaveToSucceed();
> }
> catch (Exception e) {
>   log(e);
> }
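
For reference, a runnable version of the quoted pattern might look like the sketch below. It assumes the optional operation is reading a stderr file that may never have been created; OptionalFetch and readIfPresent are hypothetical names, not part of Swift.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch of an operation that is allowed to fail.
class OptionalFetch {
    // Try to read a job's stderr file. A missing file is expected
    // (the job may have written nothing), so it is logged and ignored.
    static Optional<String> readIfPresent(Path file) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return Optional.of(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Log the detail, but do not treat it as a job-level error.
            System.err.println("optional read failed (ignored): " + e);
            return Optional.empty();
        }
    }
}
```

The caller gets an empty Optional instead of an exception, so the failure to fetch stderr never masks or multiplies the original job failure; it only leaves a line in the log.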
