[Swift-devel] angle-1000 second run

Michael Wilde wilde at mcs.anl.gov
Mon Nov 5 23:56:57 CST 2007


I just ran a second run of angle-1000, this time with clustering.
I thought I had the throttles at default values but missed one.

I killed the run after a few hundred data files were produced because it 
was running too slowly and seemed to have reached a steady state.

The logs are in wilde/run154.

Here's what I noticed that seemed wrong with this run:

1. At most 4 jobs ran at a time (as seen with qstat over many spot 
checks).

2. Only ONE data file came back before I killed the run, yet hundreds 
were produced (as seen on the server side). Surely these should have 
started trickling in by now?

3. The cluster sizes were extremely small, about 4. They should have 
been 10-20 by my calculations.

4. I still got over a dozen PBS "job aborted" messages.
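
The qstat spot checks in item 1 could be automated roughly as follows. This is only a sketch: it assumes the default six-column PBS qstat listing with the job state in the fifth column, and the function name and the sample rows are made up for illustration, not taken from this run.

```python
def count_jobs_by_state(qstat_text):
    """Count jobs per state letter (R, Q, ...) in default-format qstat output.

    Assumes each job row has six whitespace-separated fields:
    job id, name, user, time used, state, queue.
    """
    counts = {}
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator row, and blank lines.
        if len(fields) != 6 or line.startswith("-"):
            continue
        state = fields[4]
        counts[state] = counts.get(state, 0) + 1
    return counts


# Made-up sample listing, for illustration only.
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1001.example              job-a            someuser        00:00:01 R dque
1002.example              job-b            someuser        00:00:00 R dque
1003.example              job-c            someuser        0        Q dque
"""

print(count_jobs_by_state(SAMPLE))
```

Feeding it the text captured from qstat at intervals would show whether the running count ever rises above 4.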

--

I'm going to start another run and let that one go until it finishes.

I'll use totally default throttles and increase my cluster params (but I 
don't understand why the current values didn't work).

One more note: this run uses the executable script angle4.fast.sh, whose 
main action is a sleep 3. It logs miscellaneous information to its 2 
output files, but otherwise takes the same args as the real angle4.sh.

It's running out of ~wilde/angle/data on tg-login1.

- Mike





