[Swift-devel] runaway workers on teraport coaster test of

Michael Wilde wilde at mcs.anl.gov
Sun Feb 8 23:36:31 CST 2009


I'm testing coasters with 
http://www.ci.uchicago.edu/~benc/tmp/coaster-head-elsewhere

This worked for me once, on Friday at noon, but has not worked since.

I put a .soft entry for java into the ~osg .soft file, to deal with 
issues discussed off-list.

I made a few small changes in bootstrap.sh from that patched version - 
some for logging, and one to make the X509_CERT_DIR variable conditional 
on whether that directory exists.
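
For illustration only, a conditional of that kind could look roughly like the sketch below. The function name set_cert_dir_if_present and the way the candidate directory is passed in are assumptions made for this example; this is not the actual text of the modified bootstrap.sh.

```shell
# Hypothetical helper (not from the real bootstrap.sh): export
# X509_CERT_DIR only when the candidate directory actually exists.
set_cert_dir_if_present() {
    candidate="$1"
    if [ -d "$candidate" ]; then
        X509_CERT_DIR="$candidate"
        export X509_CERT_DIR
        return 0
    fi
    # Directory is missing: leave the environment untouched.
    return 1
}
```

It would be called with whatever certificate directory the site uses, and the return status tells the caller whether the variable was set.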

The coaster service now starts, but it went into a loop spawning 
short-lived workers, and not getting anything done.

I saw dozens of workers start, with about 10-20 or so running at a time.

These logs and all files related to the run are in 
~wilde/oops/oopstest/runaway-workers.

In coasters.log I see 50+ messages "WorkerManager No suitable worker 
found. Attempting to start a new one."
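
A quick way to get an exact count of those messages is sketched below. It assumes each message sits on a single line of the log; the function name count_spawn_attempts is made up for this example.

```shell
# Hypothetical helper: count log lines containing the WorkerManager
# "No suitable worker found" message. Assumes one message per line.
count_spawn_attempts() {
    logfile="$1"
    # -F matches the text literally; -c prints the number of matching lines.
    # grep exits non-zero when there are no matches, so mask that status.
    grep -c -F -- "No suitable worker found" "$logfile" || true
}
```

For example, `count_spawn_attempts coasters.log` run in the directory holding the log prints the number of matching lines.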

Any thoughts on this?  TeraPort has been too saturated this evening to 
test any further, but it would be good to have some sense of what's 
causing this.






