[Swift-user] Data transfer error
Mihael Hategan
hategan at mcs.anl.gov
Fri May 23 15:23:13 CDT 2014
On Fri, 2014-05-23 at 19:32 +0000, Bronevetsky, Greg wrote:
> I've now had a little more experience with this and have gotten a
> partial workaround. Whatever the underlying cause, it seems to happen
> a lot less when I disable my mechanisms to avoid re-executing tasks
> that I've already completed. Right now my guess for the root cause is
> that I'm hitting the Lustre meta-data servers too hard and they're
> throwing back occasional errors.
That sounds plausible.
> Specifically, I just got yelled at by our admins about performing
> thousands of file openings per second.
:)
>
> I just did a small run and got some failures. e.g.:
> Progress: time: Fri, 23 May 2014 12:25:54 -0700 Selecting site:2723 Submitted:216 Active:119 Stage out:16 Finished successfully:58 Failed but can retry:144
>
> However, when I looked at the log files generated when I set
> workerLoggingLevel to DEBUG as well as the stdout and stderr of the
> SLURM scripts I didn't find any failures or errors. What should I be
> looking for?
Those are probably Swift-level errors, and the details would be in the
Swift log (or on stdout once the run has finished).
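
While digging through the log, it can help to track how the "Failed but
can retry" count changes over a run. Below is a minimal Python sketch for
that. It assumes the progress lines look exactly like the one quoted
above (state name, colon, integer). parse_progress is a hypothetical
helper written for illustration, not something Swift provides.

```python
import re

# A state name (letters and spaces, starting with a letter), then a colon
# immediately followed by an integer count. The "Progress:" and "time:"
# labels are followed by a space, and the timestamp fields start with
# digits, so neither matches this pattern.
STATE_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Map each state name in a 'Progress:' line to its integer count."""
    return {name: int(count) for name, count in STATE_RE.findall(line)}


# The progress line quoted earlier in this thread.
sample = (
    "Progress: time: Fri, 23 May 2014 12:25:54 -0700 Selecting site:2723 "
    "Submitted:216 Active:119 Stage out:16 Finished successfully:58 "
    "Failed but can retry:144"
)

counts = parse_progress(sample)
print(counts["Failed but can retry"], "retryable failures so far")
```

Running this over every "Progress:" line of a run gives a rough picture of
when the failures start piling up, which narrows down the part of the
Swift log worth reading.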
Mihael