[Swift-devel] sites.xml for ranger sge coasters

Glen Hocky hockyg at uchicago.edu
Thu Mar 8 09:34:20 CST 2012


I noticed a problem like this when I was working with Ketan in July
(I checked my email: it was 7/13, +/- 1 day), using his scripts for starting
blocks of persistent coasters on ranger. Perhaps he has records of that, or
can try to reproduce what we saw then. (I probably still have the
scripts we were using somewhere as well.)

On Thu, Mar 8, 2012 at 10:21 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> David, thanks for addressing this problem.  Does it affect any of the
> other local providers: pbs, condor, (sge), cobalt?
>
> (I need to do some cobalt runs on Eureka for a user today, so I hope that
> provider is OK).
>
> You should describe the issue and fix on swift-devel.
>
> We should start a convention where we can document known issues for
> releases, so that users don't have to discover these bugs on their own.  Can
> you make an action item to propose and start such a place (probably
> crosslinked to both Downloads and Documentation). Not urgent for today, but
> next week would be good.
>
> Thanks,
>
> - Mike
>
>
>
> ----- Original Message -----
> > From: "David Kelly" <davidk at ci.uchicago.edu>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Michael Wilde" <wilde at mcs.anl.gov>
> > Sent: Thursday, March 8, 2012 8:45:00 AM
> > Subject: Re: sites.xml for ranger sge coasters
> > 0.93 is frozen, but I committed the same change to 0.93.1 this
> > morning.
> >
> > ----- Original Message -----
> > > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > Cc: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Sent: Wednesday, March 7, 2012 6:39:09 PM
> > > Subject: Re: sites.xml for ranger sge coasters
> > > Is it committed in 0.93 too?
> > >
> > >
> > > On Wed, Mar 7, 2012 at 6:10 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
> > >
> > >
> > > I submitted a fix to trunk for the SGE provider. The submit script was
> > > wrong - it started one worker per core, rather than one worker per host.
> > > (Oddly it's been like that for years without anybody noticing). I ran a
> > > few sleep/hostname tests and it seems to be working. Can you please give
> > > it a try?
> > >
> > > Below is the sites.xml I used for my test:
> > >
> > > <config>
> > >   <pool handle="ranger">
> > >     <execution jobmanager="local:sge" provider="coaster" url="none"/>
> > >     <filesystem provider="local" url="none" />
> > >     <profile namespace="globus" key="maxWallTime">5</profile>
> > >     <profile namespace="globus" key="maxTime">600</profile>
> > >     <profile key="jobsPerNode" namespace="globus">16</profile>
> > >     <profile key="slots" namespace="globus">1</profile>
> > >     <profile key="nodeGranularity" namespace="globus">3</profile>
> > >     <profile key="pe" namespace="globus">16way</profile>
> > >     <profile key="maxNodes" namespace="globus">3</profile>
> > >     <profile key="queue" namespace="globus">development</profile>
> > >     <profile key="jobThrottle" namespace="karajan">0.4799</profile>
> > >     <profile key="initialScore" namespace="karajan">10000</profile>
> > >     <profile namespace="globus" key="project">TG-DBS080004N</profile>
> > >     <workdirectory>/share/home/01503/davidkel/swiftwork</workdirectory>
> > >   </pool>
> > > </config>
> > >
> > > Thanks,
> > > David
> > >
> > >
> > >
> > >
> > > --
> > > Ketan
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
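
The per-core versus per-host difference David describes in the quoted message can be put in rough numbers using his test sites.xml. The sketch below is only an illustration, not the provider's actual logic. It rests on several assumptions that are not stated in the thread: that a node has 16 cores (inferred from the "16way" pe value), that slots x maxNodes is an upper bound on the nodes in use, that each worker accepts up to jobsPerNode concurrent jobs, and that the job throttle allows roughly jobThrottle * 100 + 1 concurrent jobs. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# The sites.xml from the quoted message, trimmed to the profiles used below.
SITES_XML = """
<config>
  <pool handle="ranger">
    <execution jobmanager="local:sge" provider="coaster" url="none"/>
    <profile key="jobsPerNode" namespace="globus">16</profile>
    <profile key="slots" namespace="globus">1</profile>
    <profile key="maxNodes" namespace="globus">3</profile>
    <profile key="jobThrottle" namespace="karajan">0.4799</profile>
  </pool>
</config>
"""


def read_profiles(xml_text, handle):
    """Return {key: value} for every <profile> in the named pool."""
    root = ET.fromstring(xml_text)
    pool = root.find(f"pool[@handle='{handle}']")
    return {p.get("key"): p.text.strip() for p in pool.findall("profile")}


def worker_counts(profiles, cores_per_node):
    """Compare worker counts for per-host and per-core startup.

    Assumption (hypothetical model): each worker accepts up to
    jobsPerNode concurrent jobs, whichever way it was started.
    """
    nodes = int(profiles["maxNodes"]) * int(profiles["slots"])
    jobs_per_worker = int(profiles["jobsPerNode"])
    per_host = nodes                   # one worker per host
    per_core = nodes * cores_per_node  # one worker per core
    return {
        "nodes": nodes,
        "workers_per_host_mode": per_host,
        "workers_per_core_mode": per_core,
        "job_slots_per_host_mode": per_host * jobs_per_worker,
        "job_slots_per_core_mode": per_core * jobs_per_worker,
    }


def throttle_limit(profiles):
    """Assumed rule of thumb: concurrent jobs = jobThrottle * 100 + 1."""
    return int(float(profiles["jobThrottle"]) * 100 + 1)


if __name__ == "__main__":
    profiles = read_profiles(SITES_XML, "ranger")
    # cores_per_node=16 is an assumption based on the "16way" pe.
    for name, value in worker_counts(profiles, cores_per_node=16).items():
        print(f"{name}: {value}")
    print("throttle_limit:", throttle_limit(profiles))
```

Under those assumptions, one worker per host gives 3 workers and 48 job slots, which matches the throttle limit of 48 implied by jobThrottle=0.4799. One worker per core would give 48 workers and 768 nominal job slots on the same 3 nodes.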