[Swift-user] consultation about error messages, coaster usage

Glen Hocky hockyg at uchicago.edu
Mon Apr 6 21:42:08 CDT 2009

Hi Guys,
I just ran (and killed) two big runs w/ swift, one on ranger, one on 
abe. I stopped them because in each case there were many "Failed but can 
retry" jobs and several "Failed to transfer wrapper log" errors, and at the 
point where I stopped them, many more CPUs were allocated than there were 
"Active" jobs. E.g. on ranger there were 14 running jobs in the queue w/ 
over an hour left (so 224 CPUs) but only 76 "Active" jobs.

Could someone take a look at the logs and tell me if things are working 
properly? It's a little hard to tell from the user end...
On a CI home machine, all run-related files for abe are in
> /home/hockyg/oops/swift/output/abeoutdir.5/
and for ranger in
> /home/hockyg/oops/swift/output/rangeroutdir.5/
In those directories, there will be a file $site.out.5, which has the stdout,
and xout.XXXXX, which has a log of all the commands run, including the 
swift invocation.
The tc.data file used is $site.data and the sites.xml file is $site.xml.


