[Swift-user] Re: Errors in 13-site OSG run: lazy error question

Michael Wilde wilde at mcs.anl.gov
Fri Aug 27 10:06:03 CDT 2010


Glen, as I recall, in the previous incident of this error we re-created it with a simpler script, using only the "cat" app(), correct?

Is it possible to re-create this error in a similar test script?

Mihael, any thoughts on whether it's likely that the prior fix did not address all cases?

Thanks,

- Mike


----- "Glen Hocky" <hockyg at gmail.com> wrote:

> Yes, nominally the same error, but it's not at the beginning but in the
> middle now for some reason. I think it's a mis-stated error message.
> I'll attach the log soon.
> 
> On Aug 27, 2010, at 12:11 AM, Michael Wilde <wilde at mcs.anl.gov>
> wrote:
> 
> > Glen, I wonder if what's happening here is that Swift will retry and
> > lazily run past *job* errors, but the error below (a mapping error) is
> > maybe being treated as an error in Swift's interpretation of the
> > script itself, and this causes an immediate halt to execution?
> >
> > Can anyone confirm that this is what's happening, and if it is the
> > expected behavior?
> >
> > Also, Glen, 2 questions:
> >
> > 1) Isn't the error below the one that was fixed by Mihael in a
> > recent revision - the same one I looked at earlier in the week?
> >
> > 2) Do you know what errors the "Failed but can retry:8" message is
> > referring to?
> >
> > Where is the log/run directory for this run?  How long did it take
> > to get the 589 jobs finished?  It would be good to start plotting
> > these large multi-site runs to get a sense of how the scheduler is
> > doing.
> >
> > - Mike
> >
> >
> > ----- "Glen Hocky" <hockyg at uchicago.edu> wrote:
> >
> >> Here's the result of my 13-site run that ran while I was out this
> >> evening. It did pretty well,
> >> but it seems to have that problem of not-quite-lazy errors:
> >> ........
> >> Progress: Submitting:3 Submitted:262 Active:147 Checking status:3 Stage out:1 Finished successfully:586
> >> Progress: Submitting:3 Submitted:262 Active:144 Checking status:4 Stage out:2 Finished successfully:587
> >> Progress: Submitting:3 Submitted:262 Active:142 Stage out:2 Finished successfully:587 Failed but can retry:6
> >> Progress: Submitting:3 Submitted:262 Active:140 Finished successfully:589 Failed but can retry:8
> >> Failed to transfer wrapper log from glassRunCavities-20100826-1718-7gi0dzs1/info/5 on UCHC_CBG_vdgateway.vcell.uchc.edu
> >> Execution failed:
> >> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (..logfile) for org.griphyn.vdl.mapping.DataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20100826-1718-sznq1qr2:720000002968 type GlassOut with no value at dataset=modelOut path=[3][1][11] (not closed)
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
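
On the plotting question in the quoted message: a minimal sketch, in Python, of how the state counts could be pulled out of the "Progress:" lines before plotting them. It assumes each progress line sits on one unwrapped line, as in the console output, with no quote prefixes, and that state names contain only letters and spaces. The function name parse_progress is hypothetical and not part of Swift.

```python
import re

# Matches "State name:count" pairs such as "Finished successfully:589".
# Assumption: state names are made of letters and spaces only.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")

def parse_progress(line):
    """Return {state: count} for one 'Progress:' line, or None otherwise."""
    prefix = "Progress:"
    if not line.startswith(prefix):
        return None
    body = line[len(prefix):]
    return {name.strip(): int(count) for name, count in PAIR.findall(body)}
```

Applying this to every line of the console output and keeping the non-None results would give one dictionary per progress tick, which could then be plotted per state.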



