[Swift-devel] execution.retries

lixi at uchicago.edu
Tue Jun 10 11:20:39 CDT 2008


Hi,

I just ran a workflow in which the site "OSG_LIGO_MIT" had a 
brief GridFTP error right after it was selected for a Swift 
job. As a result, the message "Could not initialize shared 
directory on OSG_LIGO_MIT" was issued and the whole workflow 
exited. The log file is on CI: 
/home/lixi/newswift/latest/score/3500/workflowtest-20080610-1045-58kc7p6f.log

After investigating the log file, I found that the failed 
job produced an execute event with id 0-1-307. While it was 
staging files, a temporary GridFTP error occurred on 
OSG_LIGO_MIT, so 0-1-307 never resulted in an execute2 
event, and the whole workflow failed. My understanding is 
that in Swift, execution.retries only sets the number of 
retries for execute2 events. Is that right? If so, how can 
this kind of error currently be avoided or handled? Is there 
a way to deal with it in Swift now?

Thanks, 

Xi


