I also noticed this behavior recently, so I'm glad it was explained. I have a follow-up question.<div><br></div><div>You already discussed the maxwalltime parameter as set in the sites file and in the tc file.<br>
</div><div><br></div><div>At some point, I added the following to my app() call:</div><div><br></div><div><div> profile "maxwalltime"=maxwalltime;</div></div><div><br></div><div>where maxwalltime is an argument passed to app.<br>
</div><div><br></div><div>This was supposed to allow individual app calls to have separate (expected) durations.</div><div><br></div><div>Does coasters currently take this per-call value into account, or only the values set in sites and tc?</div>
<div><br></div><div>Thanks</div><div>Glen</div><div><br></div><div><br></div><div><br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Dec 5, 2012 at 9:55 PM, Michael Wilde <span dir="ltr">&lt;<a href="mailto:wilde@mcs.anl.gov" target="_blank">wilde@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">To close this issue: I discussed the problem with Lorenzo off-list and realized the confusion was that in 0.93, maxwalltime was used by the coaster provider only to fit app() invocations into coaster worker jobs. At some point after 0.93, though, coasters started enforcing maxwalltime. I think this resolves the question. Lorenzo said off-list:<br>
<br>
> Actually, I think that now I understand it. I always assumed that it<br>
> was an "indicative" quantity and not an actual lethal threshold.<br>
> It kind of makes sense both ways, but it works as lethal too.<br>
> Maybe I never hit the kill zone before so I never realized it.<br>
<br>
Ah - I see that part of the confusion. Yes, the 513 error and the termination of coaster jobs that go over their maxwalltime *is* new since 0.93. It's been in trunk for many months I think (since Spring?) but if you've been running 0.93 till now then yes, the "kill" part is a change.<br>
<br>
The problem with the old "advisory" semantics is that when PBS killed the coaster worker, it was much harder for Swift to recover cleanly in all cases. So this 513-kill was instituted to make things both more consistent and more reliable.<br>
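<br>
For illustration only, the difference between an advisory limit and an enforced one can be sketched in Python as a runner that kills a job once its wall-time limit has passed. This is not Swift's or the coaster provider's actual code; the function name run_with_walltime and the status strings are hypothetical and exist only in this sketch.<br>
<br>
<pre>
```python
import subprocess
import sys


def run_with_walltime(cmd, limit_seconds):
    """Run cmd; kill it if it is still running after limit_seconds.

    Returns a (status, returncode) tuple where status is "ok", "failed",
    or "walltime_exceeded". All names here are hypothetical.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once the timeout elapses, so the limit is enforced, not advisory.
        done = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        return ("walltime_exceeded", None)
    status = "ok" if done.returncode == 0 else "failed"
    return (status, done.returncode)


if __name__ == "__main__":
    # A job that sleeps for 5 seconds but is only allowed 1 second.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_walltime(sleeper, limit_seconds=1))
```
</pre>
Under the old advisory semantics the equivalent runner would simply let the job finish, leaving it to the batch scheduler (PBS here) to kill the whole worker when its own limit ran out.<br>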
<br>
Sorry for missing that part of the change. I've been running trunk typically, so I forgot that the 513-kill was new since 0.93.<br>
<div class="HOEnZb"><div class="h5"><br>
- Mike<br>
_______________________________________________<br>
Swift-user mailing list<br>
<a href="mailto:Swift-user@ci.uchicago.edu">Swift-user@ci.uchicago.edu</a><br>
<a href="https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user" target="_blank">https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user</a><br>
</div></div></blockquote></div><br></div>