[Swift-devel] [Swift-user] gram on ranger

David Kelly davidk at ci.uchicago.edu
Sat Nov 12 01:53:09 CST 2011


Sarah,

I just submitted a fix that might help. There was an issue with the provider not always correctly detecting when the job was completed. The fix is in the 0.93 source. Can you give it a try and let me know if you still see any issues? Thanks.

David


----- Original Message -----
> From: "Sarah Kenny" <skenny at uchicago.edu>
> To: "Justin M Wozniak" <wozniak at mcs.anl.gov>
> Cc: "David Kelly" <davidk at ci.uchicago.edu>, "Swift Devel" <swift-devel at ci.uchicago.edu>, "Anjali Raja"
> <anjraja at gmail.com>
> Sent: Tuesday, November 8, 2011 4:36:42 PM
> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> thought i'd revisit this since anjali re-ran this workflow with fewer
> jobs (~85K) and perhaps the info would be useful. it showed a similar
> pattern in that it finished all jobs but one (that is, we were missing
> a single output file) and hung indefinitely on the last 'finished
> successfully...'
> 
> so this discussion seems to have turned mostly to how coasters
> requests cores. however, i have to say that *generally* in the past
> when swift/coasters has requested too many cores for the given queue
> gram complains and you see it in the gram log, which is not the case
> here.
> 
> that said, if you want em: the swift log is in /home/skenny/swift_logs
> on ci and the coaster log was too big for my home on ci (and has since
> been appended to so make sure to match the dates with the swift log),
> but if someone has access to ranger it's in /var/tmp/skenny_swift on
> login3
> 
> we're continuing to use the same swift version and sites file since
> it's at least helping us push thru much of the work (doing manual
> resumes/restarts).
> 
> ~sk
> 
> 
> On Fri, Oct 28, 2011 at 11:02 AM, Justin M Wozniak <wozniak at mcs.anl.gov> wrote:
> 
> 
> 
> I think count is the number of processes. PBSExecutor uses it, that may
> be a good place to look. In the Coasters context, I think it is the
> number of invocations of worker.pl.
> 
> 
> 
> 
> On Fri, 28 Oct 2011, David Kelly wrote:
> 
> > Just to clarify - when coasters is being used, count represents the
> > number of coaster blocks? Then to get the number of cores to request,
> > I should use count*workersPerNode?
> >
> > What about in the case where coasters is not used?
> >
> > ----- Original Message -----
> >> From: "Mihael Hategan" < hategan at mcs.anl.gov >
> >> To: "David Kelly" < davidk at ci.uchicago.edu >
> >> Cc: "Anjali Raja" < anjraja at gmail.com >, "Swift Devel" <
> >> swift-devel at ci.uchicago.edu >, "Swift User"
> >> < swift-user at ci.uchicago.edu >, "Ketan Maheshwari" <
> >> ketancmaheshwari at gmail.com >
> >> Sent: Thursday, October 20, 2011 9:08:46 PM
> >> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> >> On Thu, 2011-10-20 at 21:03 -0500, David Kelly wrote:
> >>> Yep, this is using coasters
> >>>
> >>
> >> Then no. Count is whatever the block allocation algorithm decides it
> >> should be.
> >>
> >>>>>
> >>>>> Should count=32 in the second case? Am I misunderstanding what
> >>>>> 'count' is? Is there any way to get the exact number of
> >>>>> applications?
> >>>>
> >>>> Coasters?
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> 
> --
> Justin M Wozniak
> 
> 
> 
> 
> 
> 
> --
> Sarah Kenny
> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> University of California Irvine, Dept. of Neurology ~ 773-818-8300


