[Swift-devel] [Swift-user] gram on ranger

Sarah Kenny skenny at uchicago.edu
Mon Nov 21 18:01:22 CST 2011


hi david, anjali reran the workflow and apparently got the same error. the log file is in
/home/skenny/swift_logs/corr_multisubj-20111116-1131-dqy537b3.log

~sk

On Fri, Nov 11, 2011 at 11:53 PM, David Kelly <davidk at ci.uchicago.edu> wrote:

> Sarah,
>
> I just submitted a fix that might help. There was an issue with the
> provider not always correctly detecting when the job was completed. The fix
> is in the 0.93 source. Can you give it a try and let me know if you still
> see any issues? Thanks.
>
> David
>
>
> ----- Original Message -----
> > From: "Sarah Kenny" <skenny at uchicago.edu>
> > To: "Justin M Wozniak" <wozniak at mcs.anl.gov>
> > Cc: "David Kelly" <davidk at ci.uchicago.edu>, "Swift Devel" <swift-devel at ci.uchicago.edu>, "Anjali Raja" <anjraja at gmail.com>
> > Sent: Tuesday, November 8, 2011 4:36:42 PM
> > Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> > thought i'd revisit this since anjali re-ran this workflow with fewer
> > jobs (~85K) and perhaps the info would be useful. it showed a similar
> > pattern in that it finished all jobs but one (that is, we were missing
> > a single output file) and hung indefinitely on the last 'finished
> > successfully...'
> >
> > so this discussion seems to have turned mostly to how coasters
> > requests cores. however, i have to say that *generally* in the past,
> > when swift/coasters has requested too many cores for the given queue,
> > gram complains and you see the error in the gram log. that is not the
> > case here.
> >
> > that said, if you want them: the swift log is in /home/skenny/swift_logs
> > on ci. the coaster log was too big for my home directory on ci, but if
> > someone has access to ranger it's in /var/tmp/skenny_swift on login3. it
> > has since been appended to, so make sure to match the dates with the
> > swift log.
> >
> > we're continuing to use the same swift version and sites file since
> > it's at least helping us push through much of the work (with manual
> > resumes/restarts).
> >
> > ~sk
> >
> >
> > On Fri, Oct 28, 2011 at 11:02 AM, Justin M Wozniak <wozniak at mcs.anl.gov> wrote:
> >
> > I think count is the number of processes. PBSExecutor uses it, so that
> > may be a good place to look. In the Coasters context, I think it is the
> > number of invocations of worker.pl.
> >
> > On Fri, 28 Oct 2011, David Kelly wrote:
> >
> > > Just to clarify: when coasters is being used, does count represent the
> > > number of coaster blocks? Then to get the number of cores to request, I
> > > should use count*workersPerNode?
> > >
> > > What about in the case where coasters is not used?
> > >
> > > ----- Original Message -----
> > >> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > >> To: "David Kelly" <davidk at ci.uchicago.edu>
> > >> Cc: "Anjali Raja" <anjraja at gmail.com>, "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User" <swift-user at ci.uchicago.edu>, "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > >> Sent: Thursday, October 20, 2011 9:08:46 PM
> > >> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> > >> On Thu, 2011-10-20 at 21:03 -0500, David Kelly wrote:
> > >>> Yep, this is using coasters
> > >>>
> > >>
> > >> Then no. Count is whatever the block allocation algorithm decides it
> > >> should be.
> > >>
> > >>>>>
> > >>>>> Should count=32 in the second case? Am I misunderstanding what
> > >>>>> 'count' is? Is there any way to get the exact number of
> > >>>>> applications?
> > >>>>
> > >>>> Coasters?
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> >
> > --
> > Justin M Wozniak
> >
> > --
> > Sarah Kenny
> > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > University of California Irvine, Dept. of Neurology ~ 773-818-8300
>



-- 
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300