[Swift-user] consultation about error messages, coaster usage
hockyg at uchicago.edu
Mon Apr 6 21:42:08 CDT 2009
I just ran (and killed) two big runs w/ swift, one on ranger, one on
abe. I stopped them because in each case there were many "Failed but can
retry" jobs, several "Failed to transfer wrapper log" errors and at the
point where I stopped them, many more CPUs allocated than "Active"
jobs. E.g. on ranger there were 14 running jobs in the queue w/ over an
hour left (so 224 cpus) but only 76 "Active" jobs.
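To make the ranger numbers concrete, here is a rough sketch of the arithmetic. The 16 CPUs per queued job is inferred from 224 / 14, and treating each "Active" Swift job as occupying exactly one CPU is an assumption, not something the logs confirm:

```python
# Figures quoted above for the ranger run.
queued_jobs = 14          # running jobs in the queue
cpus_per_job = 16         # inferred: 224 CPUs / 14 jobs
active_swift_jobs = 76    # jobs Swift reported as "Active"

# Assumption: one "Active" Swift job keeps one CPU busy.
allocated_cpus = queued_jobs * cpus_per_job
idle_cpus = allocated_cpus - active_swift_jobs
utilization = active_swift_jobs / allocated_cpus

print(f"allocated={allocated_cpus} idle={idle_cpus} utilization={utilization:.1%}")
```

Under those assumptions, roughly two thirds of the allocated CPUs had no active job at the point where I stopped the run.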
Could someone take a look at the logs and tell me if things are working
properly? It's a little hard to tell from the user end...
On a ci home machine,
All run related files for abe are in
and for ranger
In those directories, there will be a file $site.out.5 which has the stdout
and xout.XXXXX which has a log of all the commands run including the
The tc.data file used is $site.data and the sites.xml file is $site.xml.