[Swift-devel] Problem with coaster workers shutting down early
wilde at mcs.anl.gov
Tue Mar 2 18:31:13 CST 2010
Mihael, I don't yet have all the evidence for this issue collected cleanly, but I want to send you what I have so you can start looking at it.
I've been trying to recreate a problem that Zhao is encountering, where he's trying to run >15,000 short (~1-second) jobs on PADS under coasters.
Basically, the worker jobs seem to be exiting for no reason that I can discern.
I've re-created something that looks similar using this:
cd ~wilde/swift/lab
swift -tc.file tc -sites.file pbscoast.xml cats.swift
Log is /home/wilde/swift/lab/cats-20100302-1751-8qy7m21c.log
Coaster worker logs are in ~wilde/globus.coasters
Seems to work OK when I request 1-node blocks.
With 2-node blocks, the workers seem to shut down for no apparent reason after about 2 seconds.
...more details later when I get a chance.
- Mike