[Swift-user] Re: Errors in 13-site OSG run: lazy error question

Glen Hocky hockyg at uchicago.edu
Thu Aug 26 23:54:22 CDT 2010


The log is on engage-submit:
/home/hockyg/swift_logs/glassRunCavities-20100826-1718-7gi0dzs1.log

On Fri, Aug 27, 2010 at 12:35 AM, Glen Hocky <hockyg at gmail.com> wrote:

> Yes, nominally the same error, but for some reason it now occurs in the
> middle of the run rather than at the beginning. I think it's a mis-stated
> error message. I'll attach the log soon.
>
> On Aug 27, 2010, at 12:11 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>
> > Glen, I wonder if what's happening here is that Swift will retry and
> > lazily run past *job* errors, but the error below (a mapping error) is maybe
> > being treated as an error in Swift's interpretation of the script itself,
> > and this causes an immediate halt to execution?
> >
> > Can anyone confirm that this is what's happening, and if it is the
> > expected behavior?
> >
> > Also, Glen, 2 questions:
> >
> > 1) Isn't the error below the one that was fixed by Mihael in a recent
> > revision - the same one I looked at earlier in the week?
> >
> > 2) Do you know what errors the "Failed but can retry:8" message is
> > referring to?
> >
> > Where is the log/run directory for this run?  How long did it take to get
> > the 589 jobs finished?  It would be good to start plotting these large
> > multi-site runs to get a sense of how the scheduler is doing.
> >
> > - Mike
> >
> >
> > ----- "Glen Hocky" <hockyg at uchicago.edu> wrote:
> >
> >> here's the result of my 13 site run that ran while i was out this
> >> evening. It did pretty well!
> >> but seems to have that problem of not quite lazy errors
> >> ........
> >> Progress: Submitting:3 Submitted:262 Active:147 Checking status:3
> >> Stage out:1 Finished successfully:586
> >> Progress: Submitting:3 Submitted:262 Active:144 Checking status:4
> >> Stage out:2 Finished successfully:587
> >> Progress: Submitting:3 Submitted:262 Active:142 Stage out:2 Finished
> >> successfully:587 Failed but can retry:6
> >> Progress: Submitting:3 Submitted:262 Active:140 Finished
> >> successfully:589 Failed but can retry:8
> >> Failed to transfer wrapper log from
> >> glassRunCavities-20100826-1718-7gi0dzs1/info/5 on
> >> UCHC_CBG_vdgateway.vcell.uchc.edu
> >> Execution failed:
> >> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (..logfile)
> >> for org.griphyn.vdl.mapping.DataNode identifier
> >> tag:benc at ci.uchicago.edu,2008:swift:dataset:20100826-1718-sznq1qr2:720000002968 type GlassOut
> >> with no value at dataset=modelOut path=[3][1][11] (not closed)
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
>