[Swift-devel] Problem with coaster workers shutting down early

wilde at mcs.anl.gov wilde at mcs.anl.gov
Tue Mar 2 18:31:13 CST 2010


Mihael, I don't yet have all the evidence for this issue collected cleanly, but I want to send you what I have so you can start looking at it.

I've been trying to recreate a problem that Zhao is encountering when he tries to run >15,000 short (~1-second) jobs on PADS under coasters.

Basically, the worker jobs seem to be exiting for no reason that I can discern.

I've re-created something that looks similar using this:

cd ~wilde/swift/lab
swift -tc.file tc -sites.file pbscoast.xml cats.swift

Log is /home/wilde/swift/lab/cats-20100302-1751-8qy7m21c.log

Coaster worker logs are in ~wilde/globus.coasters

It seems to work OK when I request 1-node blocks.
With 2-node blocks, the workers seem to shut down for no apparent reason after about 2 seconds.

...more details later when I get a chance.

- Mike



