[Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script

Mihael Hategan hategan at mcs.anl.gov
Thu Aug 30 14:13:55 CDT 2007


On Thu, 2007-08-30 at 14:03 -0500, Mihael Hategan wrote:
> On Thu, 2007-08-30 at 13:23 -0500, Michael Wilde wrote:
> > 
> > I went back to the log/out-err files and I think I see where I was 
> > confused: the indication of nonzero exit codes comes out much later in 
> > the log; it seems like the earlier jobs failed on output file retrieval 
> > long before there was any indication of a non-zero job exit code.
> 
> The exit code is checked first. So exit code errors and missing file
> errors for a given job are mutually exclusive. Normally these are only
> reported at the end of the workflow. Anyway, I'll try to put in the
> stamp file, to distinguish between application failures and filesystem
> failures.
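
A minimal sketch of the check order described above, in Python rather than Swift's own code. The names `classify_job`, `Outcome`, and the stamp-file semantics (a marker written once the job has finished) are assumptions made for illustration; the thread does not specify them.

```python
import os
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    APPLICATION_FAILURE = "application failure"
    FILESYSTEM_FAILURE = "filesystem failure"


def classify_job(exit_code, stamp_path, output_paths):
    """Classify one job result (hypothetical helper, not Swift's API)."""
    # The exit code is checked first, so a non-zero code is reported
    # and missing-file checks never run for that job.
    if exit_code != 0:
        return Outcome.APPLICATION_FAILURE
    # Assumption: the job writes the stamp file when it finishes. If the
    # job claims success but the stamp is not visible here, suspect the
    # filesystem rather than the application.
    if not os.path.exists(stamp_path):
        return Outcome.FILESYSTEM_FAILURE
    # Stamp is visible but an expected output is not. This blames the
    # application, which is only valid if stamp visibility implies
    # output visibility on this filesystem.
    if any(not os.path.exists(p) for p in output_paths):
        return Outcome.APPLICATION_FAILURE
    return Outcome.SUCCESS
```

With this ordering, an exit code error and a missing file error can never both be reported for the same job.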

That makes me think, though: could this be a shared-filesystem (sfs)
synchronization problem? I know Globus waits for a similar stamp file
from a job to become visible on the head node. But does that guarantee
that all files produced by the job will be visible and their contents up
to date? Can we assume that changes observed through the sfs on one node
appear in the same order as the writes that caused them on another
node?
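
If the answer is no, one conservative option is to wait for every expected file directly instead of inferring its visibility from the stamp. The following is a rough Python sketch of that idea; `wait_for_files`, the timeout, and the polling interval are hypothetical, and existence alone still says nothing about whether the contents are up to date.

```python
import os
import time


def wait_for_files(paths, timeout=5.0, interval=0.1):
    """Poll until every path exists or the timeout expires.

    Returns the sorted list of paths that are still missing
    (empty if all of them became visible in time).
    """
    deadline = time.monotonic() + timeout
    pending = set(paths)
    while True:
        # Re-check only the files not yet seen; each file is tested on
        # its own, so no ordering between files is assumed.
        pending = {p for p in pending if not os.path.exists(p)}
        if not pending or time.monotonic() >= deadline:
            return sorted(pending)
        time.sleep(interval)
```

A caller could pass the stamp file and all output files together, and treat a non-empty result as a possible filesystem visibility problem rather than an application failure.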
