[Swift-user] Re: Errors in 13-site OSG run: lazy error question

Glen Hocky hockyg at gmail.com
Thu Aug 26 23:36:23 CDT 2010


Yes, nominally the same error, but for some reason it now occurs in the
middle of the run rather than at the beginning. I think it's a
mis-stated error message. I'll attach the log soon.

On Aug 27, 2010, at 12:11 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> Glen, I wonder if what's happening here is that Swift will retry and lazily run past *job* errors, but the error below (a mapping error) is being treated as an error in Swift's interpretation of the script itself, which causes an immediate halt to execution?
>
> Can anyone confirm that this is what's happening and, if so, whether it is the expected behavior?
>
> Also, Glen, 2 questions:
>
> 1) Isn't the error below the one that was fixed by Mihael in a recent revision - the same one I looked at earlier in the week?
>
> 2) Do you know what errors the "Failed but can retry:8" message is referring to?
>
> Where is the log/run directory for this run?  How long did it take to get the 589 jobs finished?  It would be good to start plotting these large multi-site runs to get a sense of how the scheduler is doing.
>
> - Mike
>
>
> ----- "Glen Hocky" <hockyg at uchicago.edu> wrote:
>
>> Here's the result of my 13-site run, which ran while I was out this
>> evening. It did pretty well,
>> but it seems to have that problem of not-quite-lazy errors:
>> ........
>> Progress: Submitting:3 Submitted:262 Active:147 Checking status:3 Stage out:1 Finished successfully:586
>> Progress: Submitting:3 Submitted:262 Active:144 Checking status:4 Stage out:2 Finished successfully:587
>> Progress: Submitting:3 Submitted:262 Active:142 Stage out:2 Finished successfully:587 Failed but can retry:6
>> Progress: Submitting:3 Submitted:262 Active:140 Finished successfully:589 Failed but can retry:8
>> Failed to transfer wrapper log from glassRunCavities-20100826-1718-7gi0dzs1/info/5 on UCHC_CBG_vdgateway.vcell.uchc.edu
>> Execution failed:
>> org.griphyn.vdl.mapping.InvalidPathException: Invalid path (..logfile) for org.griphyn.vdl.mapping.DataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20100826-1718-sznq1qr2:720000002968 type GlassOut with no value at dataset=modelOut path=[3][1][11] (not closed)
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
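
For plotting runs like this one, the "Progress:" lines quoted above could
be turned into per-state counts first. What follows is only a rough sketch
in Python, not anything Swift ships: parse_progress is a hypothetical
helper name, and it assumes each progress record has already been joined
onto a single line (the mail client wrapped them) with any ">" quote
prefixes stripped.

```python
import re

# Matches "State name:count" pairs such as "Finished successfully:589".
# State names may contain spaces, so the name part is matched lazily
# up to the first colon that is followed by digits.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return {state: count} for one unwrapped 'Progress:' line.

    Hypothetical helper for illustration; returns None for lines
    that are not progress records (e.g. error messages).
    """
    prefix = "Progress:"
    line = line.strip()
    if not line.startswith(prefix):
        return None
    body = line[len(prefix):]
    return {name.strip(): int(count) for name, count in PAIR.findall(body)}


if __name__ == "__main__":
    # Last progress line from the run quoted above, joined onto one line.
    sample = ("Progress: Submitting:3 Submitted:262 Active:140 "
              "Finished successfully:589 Failed but can retry:8")
    print(parse_progress(sample))
```

Applying this to every progress line in a log would give one dictionary
per record, which could then be fed to whatever plotting tool is at hand.
States that are absent from a line (for example "Stage out" in the last
record) are simply missing from that dictionary rather than reported as 0.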


