[Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102

Michael Wilde wilde at mcs.anl.gov
Wed Dec 5 20:55:36 CST 2012


To close this issue: I discussed the problem with Lorenzo off-list and realized the confusion was that in 0.93, maxwalltime was used by the coaster provider only to fit app() invocations into coaster worker jobs. At some point after 0.93, though, coasters started enforcing maxwalltime. I think this resolves the question. Lorenzo said off-list:

> Actually, I think that now I understand it. I always assumed that it
> was an "indicative" quantity and not an actual lethal threshold.
> It kind of makes sense both ways, but it works as lethal too.
> Maybe I never hit the kill zone before so I never realized it.

Ah - I see that part of the confusion. Yes, the 513 error and the termination of coaster jobs that go over their maxwalltime *is* new since 0.93. It's been in trunk for many months, I think (since spring?), but if you've been running 0.93 until now then yes, the "kill" part is a change.

The problem with the old "advisory" semantics is that when PBS killed the coaster worker, it was much harder for Swift to recover cleanly in all cases. So this 513-kill was instituted to make things both more consistent and more reliable.
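To make the difference concrete, here is a small Python sketch of the two semantics. It is only an illustration and not Swift's or the coaster service's actual code: the names Task, fits and run are made up, and the constant 513 is used simply as a label matching the error number mentioned in this thread.

```python
from dataclasses import dataclass

# Label borrowed from the error number in this thread (illustrative only).
WALLTIME_EXCEEDED = 513


@dataclass
class Task:
    name: str
    maxwalltime_s: int  # declared limit, in seconds
    actual_s: int       # how long the task really runs, in seconds


def fits(task: Task, worker_remaining_s: int) -> bool:
    """Advisory use: can this task be packed into a worker job's remaining time?"""
    return task.maxwalltime_s <= worker_remaining_s


def run(task: Task, enforce: bool) -> tuple[int, int]:
    """Return (seconds consumed, status); status 0 means the task finished."""
    if enforce and task.actual_s > task.maxwalltime_s:
        # Enforced semantics: the task is stopped at its declared limit.
        return task.maxwalltime_s, WALLTIME_EXCEEDED
    # Advisory semantics: the task runs to completion, however long it takes.
    return task.actual_s, 0


if __name__ == "__main__":
    task = Task(name="sim", maxwalltime_s=60, actual_s=90)
    print(fits(task, worker_remaining_s=120))
    print(run(task, enforce=False))
    print(run(task, enforce=True))
```

In the advisory case the task above overruns its declared 60 seconds and eats into time the worker job may not have, which is where PBS could end up killing the whole worker. In the enforced case the overrun is caught per task, at a predictable point.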

Sorry for missing that part of the change. I've been running trunk typically, so I forgot that the 513-kill was new since 0.93.

- Mike


