[Swift-user] Errors in 13-site OSG run: lazy error question
Michael Wilde
wilde at mcs.anl.gov
Thu Aug 26 23:11:55 CDT 2010
Glen, I wonder whether Swift retries and lazily runs past *job* errors, but treats the error below (a mapping error) as an error in its interpretation of the script itself, which causes an immediate halt to execution.
Can anyone confirm that this is what's happening and, if so, whether it is the expected behavior?
Also, Glen, 2 questions:
1) Isn't the error below the one that was fixed by Mihael in a recent revision - the same one I looked at earlier in the week?
2) Do you know what errors the "Failed but can retry:8" message is referring to?
Where is the log/run directory for this run? How long did it take to get the 589 jobs finished? It would be good to start plotting these large multi-site runs to get a sense of how the scheduler is doing.
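As a rough starting point for that plotting, here is a minimal Python sketch that turns the console "Progress:" lines into per-state counts. It assumes each progress report sits on a single unwrapped line of the form "Progress: State:N State:N ..." (the quoted output below is wrapped by the mail client); the function name parse_progress is just illustrative and not part of Swift.

```python
import re

# Matches "State name:count" pairs; state names may contain spaces,
# e.g. "Finished successfully:589" or "Failed but can retry:8".
PAIR = re.compile(r"\s*([A-Za-z][A-Za-z ]*?):(\d+)")

def parse_progress(line):
    """Return {state: count} for one 'Progress:' line, or None otherwise.

    Assumption: the whole progress report is on one line, as on the console.
    """
    prefix = "Progress:"
    if not line.startswith(prefix):
        return None
    rest = line[len(prefix):]
    return {m.group(1): int(m.group(2)) for m in PAIR.finditer(rest)}

if __name__ == "__main__":
    sample = ("Progress: Submitting:3 Submitted:262 Active:140 "
              "Finished successfully:589 Failed but can retry:8")
    print(parse_progress(sample))
```

Collecting one such dictionary per progress line gives a series per state (Active, Finished successfully, and so on) that can be fed to any plotting tool.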
- Mike
----- "Glen Hocky" <hockyg at uchicago.edu> wrote:
> here's the result of my 13 site run that ran while i was out this
> evening. It did pretty well!
> but seems to have that problem of not quite lazy errors
> ........
> Progress: Submitting:3 Submitted:262 Active:147 Checking status:3
> Stage out:1 Finished successfully:586
> Progress: Submitting:3 Submitted:262 Active:144 Checking status:4
> Stage out:2 Finished successfully:587
> Progress: Submitting:3 Submitted:262 Active:142 Stage out:2 Finished
> successfully:587 Failed but can retry:6
> Progress: Submitting:3 Submitted:262 Active:140 Finished
> successfully:589 Failed but can retry:8
> Failed to transfer wrapper log from
> glassRunCavities-20100826-1718-7gi0dzs1/info/5 on
> UCHC_CBG_vdgateway.vcell.uchc.edu
> Execution failed:
> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (..logfile)
> for org.griphyn.vdl.mapping.DataNode identifier
> tag:benc at ci.uchicago.edu
> ,2008:swift:dataset:20100826-1718-sznq1qr2:720000002968 type GlassOut
> with no value at dataset=modelOut path=[3][1][11] (not closed)
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory