[Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102

Michael Wilde wilde at mcs.anl.gov
Wed Dec 5 21:43:11 CST 2012


Glen,

Swift *should* be honoring profile parameters passed on the app() declaration as if they were passed on the tc or sites entries.  Profile entries passed on the app() call should override those passed on the tc, which override the site entries.

If any of that is not working, I would consider it a bug.
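The precedence described above (app() over tc over sites) works like a layered merge: the more specific layer wins on any key it sets. Below is a minimal Python sketch that assumes profile entries can be modelled as plain dicts. `resolve_profile` is a hypothetical name, not part of Swift, and the `queue` key is only an illustrative second entry.

```python
def resolve_profile(sites: dict, tc: dict, app: dict) -> dict:
    """Hypothetical merge of profile layers; later layers win on conflicts."""
    merged = {}
    for layer in (sites, tc, app):  # lowest to highest precedence
        merged.update(layer)
    return merged


if __name__ == "__main__":
    sites = {"maxwalltime": "01:00:00", "queue": "batch"}
    tc = {"maxwalltime": "00:30:00"}
    app = {"maxwalltime": "00:05:00"}  # as if set by profile "maxwalltime"=...;
    print(resolve_profile(sites, tc, app))
    # {'maxwalltime': '00:05:00', 'queue': 'batch'}
```

Keys that a higher layer does not mention (here `queue`) fall through from the lower layers unchanged.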

- Mike


----- Original Message -----
> From: "Glen Hocky" <hockyg at gmail.com>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Lorenzo Pesce" <lpesce at uchicago.edu>, "Jason J. Pitt" <pittjj at uchicago.edu>, "Swift User Discussion List"
> <swift-user at ci.uchicago.edu>
> Sent: Wednesday, December 5, 2012 9:23:12 PM
> Subject: Re: [Swift-user] Problems with PBS on Beagle for swift.0.94-2012.1102
> I also noticed this behavior recently, so I'm glad this was explained.
> I have a follow-up question.
>
> You already discussed the maxwalltime parameter as set by the sites
> file and as set by the tc file.
>
> At some point, I added the following to my app() call:
>
> profile "maxwalltime"=maxwalltime;
>
> where maxwalltime is an argument passed to app.
>
> This was supposed to allow individual app calls to have separate
> (expected) durations.
>
> Is coasters currently taking this into account, or only the values set
> in sites and tc?
>
> Thanks
> Glen
>
> On Wed, Dec 5, 2012 at 9:55 PM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> To close this issue: I discussed the problem with Lorenzo off-list and
> realized the confusion was that in 0.93, maxwalltime was just used by
> the coaster provider to fit app() invocations into coaster worker
> jobs. At some point after 0.93 though, coasters started enforcing
> maxwalltime. I think this resolves the question. Lorenzo said off-list:
> 
> > Actually, I think that now I understand it. I always assumed that it
> > was an "indicative" quantity and not an actual lethal threshold.
> > It kind of makes sense both ways, but it works as lethal too.
> > Maybe I never hit the kill zone before so I never realized it.
> 
> Ah, I see that part of the confusion. Yes, the 513 error and the
> terminating of coaster jobs that go over their maxwalltime *is* new
> since 0.93. It's been in trunk for many months I think (since Spring?),
> but if you've been running 0.93 till now, then yes, the "kill" part is a
> change.
> 
> The problem with the old "advisory" semantics is that when PBS killed
> the coaster worker, it was much harder for Swift to recover cleanly in
> all cases. So this 513-kill was instituted to make things both more
> consistent and more reliable.
> 
> Sorry for missing that part of the change. I've been running trunk
> typically, so I forgot that the 513-kill was new since 0.93.
>
> - Mike
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
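The change discussed in this thread, from advisory to enforced maxwalltime, can be roughly modelled as follows. This is a hypothetical Python sketch, not the coaster implementation. `Task`, `fits_in_worker`, and `run` are illustrative names, and times are plain seconds. Treating the "513 error" as a simple outcome label is also an assumption. In both modes the declared walltime decides whether a task fits in a worker. Only with enforcement is an overrunning task killed on its own, instead of running on until PBS kills the whole worker.

```python
from dataclasses import dataclass


@dataclass
class Task:
    maxwalltime_s: int     # declared limit (from app(), tc, or sites)
    actual_runtime_s: int  # how long the app really runs


def fits_in_worker(task: Task, worker_remaining_s: int) -> bool:
    """Both semantics: pack a task only if its declared walltime fits."""
    return task.maxwalltime_s <= worker_remaining_s


def run(task: Task, worker_remaining_s: int, enforce: bool) -> str:
    """Outcome of running an already-packed task on a coaster worker.

    enforce=False models the advisory 0.93 behavior described in the thread;
    enforce=True models the later behavior, where coasters kill the task.
    """
    if enforce and task.actual_runtime_s > task.maxwalltime_s:
        # Only this task is terminated (the thread's "513 error").
        return "task killed (513)"
    if task.actual_runtime_s > worker_remaining_s:
        # The overrun outlives the worker's batch job: PBS kills the worker,
        # the case that was harder for Swift to recover from cleanly.
        return "worker killed by PBS"
    return "ok"


if __name__ == "__main__":
    t = Task(maxwalltime_s=600, actual_runtime_s=900)
    print(fits_in_worker(t, 800))      # True
    print(run(t, 800, enforce=False))  # worker killed by PBS
    print(run(t, 800, enforce=True))   # task killed (513)
```

With enforcement, a task that passed `fits_in_worker` can no longer outlive its worker. Its runtime is effectively capped at maxwalltime, which is at most the worker's remaining time.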



