[Swift-devel] coaster status report

Michael Wilde wilde at mcs.anl.gov
Sat Apr 4 16:59:44 CDT 2009


With OOPS, Glen was able to get some promising runs queued on Ranger, 
using the default properties and the sites settings from the SEM runs.

Looking great so far, and above all it was very easy to get going.

That's very exciting!

One run shows a few (3 out of 100 or so) failures that were retried 
successfully. We need to track these down and see whether it was a 
transient app failure or something in Swift, etc.

Then we turned to Abe and Queenbee. That was amazingly easy to configure 
and get running. Glen is scaling it up as we speak, trying for 2 sites x 
40 jobs x 8 cores = 640 cores between the two.

In initial small tests, though (50 parallel app() calls), it's sending 
all jobs to Abe, none to Queenbee. We checked the usual sites and tc 
things; *seems* OK there. Possibly either a bug or a scheduler anomaly?
We'll try with more jobs and see; will send logs and the sites etc. files 
if that anomaly persists at larger scales.
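
A quick way to confirm the imbalance at larger scales is to tally jobs per site. This is only a sketch in Python: it assumes the site name for each job has already been pulled out of the run log (that step depends on the log format and is not shown), and the function names are made up for illustration.

```python
from collections import Counter

def jobs_per_site(site_names):
    """Count how many jobs landed on each site."""
    return Counter(site_names)

def unused_sites(site_names, configured_sites):
    """Return the configured sites that received no jobs at all."""
    counts = Counter(site_names)
    return [site for site in configured_sites if counts[site] == 0]

# The small test described above: 50 app() calls, all sent to Abe.
observed = ["abe"] * 50
print(jobs_per_site(observed))
print(unused_sites(observed, ["abe", "queenbee"]))
```

If the second list is still non-empty with a few hundred jobs, that would point away from a small-sample effect in the scheduler.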

Seems like both these sites have WS-GRAM enabled; we'd like to try that 
as well, to expand beyond the suggested limit of 40 jobs per site. Would 
like to get 1000 cores active on this problem: 2 x 60 x 8 or so.
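
For reference, the core counts work out as below. A small Python sketch; the function names are just for illustration, and it assumes every job at every site is running at once.

```python
import math

def total_cores(sites, jobs_per_site, cores_per_job):
    """Cores in use when every job at every site is running."""
    return sites * jobs_per_site * cores_per_job

def jobs_per_site_needed(target_cores, sites, cores_per_job):
    """Smallest per-site job count that reaches target_cores."""
    return math.ceil(target_cores / (sites * cores_per_job))

print(total_cores(2, 40, 8))             # 640, the current target
print(total_cores(2, 60, 8))             # 960, "1000 or so"
print(jobs_per_site_needed(1000, 2, 8))  # 63 jobs per site for a full 1000
```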

Then will add in a few more fruitful TG sites.

Towards this end, Mihael, if you have the urge to probe at a 
setting/config that lets us start coasters in 4-8 node batches, this 
would be a great time to try that. I suspect you don't know yet if that 
will be easy, hard, or in between?

Another note on coaster boot:

- old problems on Abe with funky limitations on non-login shells seem 
to have gone away, either from the latest coaster strategy (-l issues?) 
or from Abe changes.

- on Queenbee, the initial run got this error:

	Could not start coaster service
Caused by:
	Task ended before registration was received.
STDOUT: Warning: -jar not understood. Ignoring.
Exception in thread "main" java.lang.NoClassDefFoundError: 
.tmp.bootstrap.y10420
    at gnu.gcj.runtime.FirstThread.run() (/usr/lib64/libgcj.so.5.0.0)

Turns out the default Java was 1.4.2-something (GCJ, judging by the 
libgcj frame in the trace).

We added @default to .soft to get Java 1.6.
Then coasters bootstrapped fine. It was nice to see that a simple 
workaround was all it took!
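
To catch this on other sites before a run, one could check the Java version banner up front. A minimal Python sketch, assuming the banner text (whatever `java -version` prints on the remote login) has already been captured as a string; the function names are hypothetical, and the 1.6 threshold is simply the version that worked here, not a documented requirement.

```python
import re

def parse_java_version(banner):
    """Extract (major, minor) from a banner like: java version "1.6.0_12"."""
    match = re.search(r'version "(\d+)\.(\d+)', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def new_enough(banner, minimum=(1, 6)):
    """True if the banner reports at least the given (major, minor)."""
    version = parse_java_version(banner)
    return version is not None and version >= minimum

print(new_enough('java version "1.4.2"'))     # False, as on Queenbee by default
print(new_enough('java version "1.6.0_12"'))  # True
```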

At any rate, very productive, very promising, very pleasing to use.

Nice work!

- Mike









