[Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102

Glen Hocky hockyg at gmail.com
Wed Dec 5 21:23:12 CST 2012


I also noticed this behavior recently, so I'm glad this was explained. I
have a follow-up question.

You already discussed the maxwalltime parameter as set by the sites file
and as set by the tc file.

At some point, I added the following to my app() call:

    profile "maxwalltime"=maxwalltime;

where maxwalltime is an argument passed to the app.

This was supposed to allow individual app calls to have separate
(expected) durations.
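
For illustration only, a minimal sketch of that kind of setup. The app
name "simulate", the file names, and the walltime values are hypothetical,
and passing the walltime as an "hh:mm:ss" string is an assumption; only
the profile line itself is taken from the script described above.

    type file;

    // Hypothetical app: the per-call walltime arrives as an argument
    // and is attached to this invocation through a dynamic profile.
    app (file out) simulate (file inp, string maxwalltime) {
        profile "maxwalltime"=maxwalltime;
        simulate @inp stdout=@out;
    }

    file inp <"input.dat">;
    file shortOut <"short.out">;
    file longOut <"long.out">;

    // Two calls to the same app with different expected durations.
    shortOut = simulate(inp, "00:05:00");
    longOut = simulate(inp, "01:00:00");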

Does coasters currently take this into account, or only the values set in
the sites and tc files?

Thanks
Glen





On Wed, Dec 5, 2012 at 9:55 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> To close this issue: I discussed the problem with Lorenzo off-list and
> realized the confusion was that in 0.93, maxwalltime was just used by the
> coaster provider to fit app() invocations into coaster worker jobs. At some
> point after 0.93 though, coasters started enforcing maxwalltime. I think
> this resolves the question. Lorenzo said off-list:
>
> > Actually, I think that now I understand it. I always assumed that it
> > was an "indicative" quantity and not an actual lethal threshold.
> > It kind of makes sense both ways, but it works as lethal too.
> > Maybe I never hit the kill zone before so I never realized it.
>
> Ah - I see that part of the confusion. Yes, the 513 error and the
> terminating of coaster jobs that go over their maxwalltime *is* new since
> 0.93. It's been in trunk for many months I think (since Spring?) but if
> you've been running 0.93 till now then yes, the "kill" part is a change.
>
> The problem with the old "advisory" semantics is that when PBS killed the
> coaster worker, it was much harder for Swift to recover cleanly in all
> cases. So this 513-kill was instituted to make things both more consistent
> and more reliable.
>
> Sorry for missing that part of the change. I've been running trunk
> typically, so I forgot that the 513-kill was new since 0.93.
>
> - Mike
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>