[Swift-devel] angle-1000 second run
Michael Wilde
wilde at mcs.anl.gov
Mon Nov 5 23:56:57 CST 2007
I just ran a second run of angle-1000, this time with clustering.
I thought I had the throttles at default values but missed one.
I killed the run after a few hundred data files were produced because it
was running too slowly and seemed to have reached a steady state.
The logs are in wilde/run154.
Here's what seemed wrong with this run:
1. At most 4 jobs ran at a time (as seen by qstat over many spot
checks).
2. Only ONE data file came back before I killed the run, yet hundreds
were produced (as seen on the server side). Surely these should have
started trickling in by now?
3. The cluster sizes were extremely small, about 4; they should have been
10-20 by my calculations.
4. I still got over a dozen "PBS job aborted" messages.
--
I'm going to start another run and let this one go till it finishes.
I'll use totally default throttles and increase my cluster params (but I
don't understand why the current values didn't work).
One more note: this run uses the executable script angle4.fast.sh, which
has a "sleep 3" as its main action. It logs miscellaneous information to
its 2 output files, but otherwise takes the same args as the real angle4.sh.
It's running out of ~wilde/angle/data on tg-login1.
- Mike