[Swift-devel] [Swift-user] gram on ranger

Sarah Kenny skenny at uchicago.edu
Thu Oct 20 06:07:09 CDT 2011


hi all, one of our users, anjali (cc'd here), is trying to submit this ~400k-job
workflow to ranger...thought i'd see if you felt like having a look :)

log is here:
/home/skenny/swift_logs/corr_multisubj-20111018-1321-ihf8hz5g.log

sites file:

<config>
<pool handle="RANGER">
     <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
     <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
     <profile namespace="globus" key="maxtime">7200</profile>
     <profile namespace="globus" key="maxWallTime">00:20:00</profile>
     <profile namespace="globus" key="jobsPerNode">1</profile>
     <profile namespace="globus" key="nodeGranularity">64</profile>
     <profile namespace="globus" key="maxNodes">256</profile>
     <profile namespace="globus" key="queue">development</profile>
     <profile namespace="karajan" key="jobThrottle">1.28</profile>
     <profile namespace="globus" key="project">TG-DBS080004N</profile>
     <profile namespace="globus" key="pe">16way</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
     <workdirectory>/work/00926/tg459516/swiftwork</workdirectory>
</pool>
</config>
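
For a rough sense of what the settings above allow, here is a minimal Python sketch that reads a trimmed copy of the sites file and derives three numbers. It rests on assumptions that are mine and not confirmed in this thread: that the concurrent job limit is jobThrottle * 100 + 1, that maxtime is the coaster block wall time in seconds, and that maxWallTime is the per-job limit. The names `summarize` and `hms_to_seconds` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sites file above; only the keys used below are kept.
SITES_XML = """
<config>
<pool handle="RANGER">
  <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
  <profile namespace="globus" key="maxtime">7200</profile>
  <profile namespace="globus" key="maxWallTime">00:20:00</profile>
  <profile namespace="globus" key="jobsPerNode">1</profile>
  <profile namespace="globus" key="maxNodes">256</profile>
  <profile namespace="karajan" key="jobThrottle">1.28</profile>
</pool>
</config>
"""


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(xml_text):
    """Derive rough capacity numbers from the first pool in a sites file."""
    pool = ET.fromstring(xml_text).find("pool")
    prof = {p.get("key"): p.text.strip() for p in pool.findall("profile")}
    return {
        # Assumption: concurrent jobs allowed = jobThrottle * 100 + 1.
        "throttle_limit": int(round(float(prof["jobThrottle"]) * 100)) + 1,
        # Assumption: one worker slot per (node, jobsPerNode) pair.
        "worker_slots": int(prof["maxNodes"]) * int(prof["jobsPerNode"]),
        # Assumption: maxtime is the block wall time in seconds.
        "jobs_per_slot_per_block": int(prof["maxtime"])
        // hms_to_seconds(prof["maxWallTime"]),
    }


summary = summarize(SITES_XML)
print(summary)
```

If those assumptions hold, the throttle limit (129) is about half the available worker slots (256), which may be worth a look for a workflow of this size.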

On Wed, Oct 12, 2011 at 12:13 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

> On Tue, 2011-10-11 at 17:13 -0700, Sarah Kenny wrote:
> >
> >
> > On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan <hategan at mcs.anl.gov>
> > wrote:
> >         Is this with a persistent coaster service?
> >
> > admittedly i have not used persistent coaster service...should i?
>
> No. I was just trying to figure out whether it might be something
> related to the persistent version.
>
> >  i feel like it's documented *somewhere* (?)
> >
> > for now i've tried setting 'sitedir.keep=true' in the config so maybe
> > it won't try to run the cleanup job...we'll see (waiting in q)
> >
> >
> >
> >         On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote:
> >         >
> >         >
> >         > On Tue, Oct 11, 2011 at 11:49 AM, David Kelly <davidk at ci.uchicago.edu>
> >         > wrote:
> >         >
> >         >         That could be it... maybe a cleanup script is not getting the
> >         >         right parameters and failing. Do you happen to have a copy of
> >         >         the coaster log?
> >         >
> >         > just put it in /home/skenny/swift_logs
> >         >
> >         >
> >         >         Maybe there will be some clues in there.
> >         >
> >         >         ----- Original Message -----
> >         >         > From: "Sarah Kenny" <skenny at uchicago.edu>
> >         >         > To: "David Kelly" <davidk at ci.uchicago.edu>
> >         >         > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User" <swift-user at ci.uchicago.edu>, "Justin M Wozniak" <wozniak at mcs.anl.gov>
> >         >         > Sent: Tuesday, October 11, 2011 1:32:37 PM
> >         >         > Subject: Re: [Swift-user] gram on ranger
> >         >         >
> >         >         > so, this workflow completes all the jobs but then just hangs
> >         >         > indefinitely at the end...maybe a stray cleanup job?
> >         >         >
> >         >         > log is here:
> >         >         > /home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log
> >         >         >
> >         >         > just tweaked the sites file a bit from what david sent me:
> >         >         >
> >         >         > <config>
> >         >         > <pool handle="RANGER">
> >         >         > <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> >         >         > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
> >         >         > <profile namespace="globus" key="maxtime">28800</profile>
> >         >         > <profile namespace="globus" key="maxWallTime">00:15:00</profile>
> >         >         > <profile namespace="globus" key="jobsPerNode">1</profile>
> >         >         > <profile namespace="globus" key="nodeGranularity">64</profile>
> >         >         > <profile namespace="globus" key="maxNodes">256</profile>
> >         >         > <profile namespace="globus" key="queue">normal</profile>
> >         >         > <profile namespace="karajan" key="jobThrottle">1</profile>
> >         >         > <profile namespace="globus" key="project">TG-DBS080004N</profile>
> >         >         > <profile namespace="globus" key="pe">16way</profile>
> >         >         > <profile namespace="karajan" key="initialScore">10000</profile>
> >         >         > <workdirectory>/work/00043/tg457040/sidgrid_out/skenny</workdirectory>
> >         >         > </pool>
> >         >         > </config>
> >         >         >
> >         >         >
> >         >         >
> >         >         > On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny <skenny at uchicago.edu>
> >         >         > wrote:
> >         >         >
> >         >         > ok, thanks, got in the queue now...also, realized my last run may
> >         >         > have been using the old swift. apparently i had SWIFT_HOME set in
> >         >         > my env and that overrides the newer swift i had set in my PATH.
> >         >         >
> >         >         > ~sk
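
The SWIFT_HOME mix-up described above can be caught before a run with a small check. This is a minimal Python sketch, assuming the launcher is named `swift` and that a mismatch simply means the executable found on PATH does not live under SWIFT_HOME; the function name `swift_home_conflict` is hypothetical and not part of Swift.

```python
import os
import shutil


def swift_home_conflict(environ=None, which=shutil.which):
    """Return (SWIFT_HOME, swift-on-PATH) if they disagree, else None."""
    environ = os.environ if environ is None else environ
    swift_home = environ.get("SWIFT_HOME")
    swift_on_path = which("swift")
    if not swift_home or not swift_on_path:
        return None  # nothing to compare
    home = os.path.realpath(swift_home)
    exe = os.path.realpath(swift_on_path)
    # The executable is "under" SWIFT_HOME when their common path is SWIFT_HOME.
    if os.path.commonpath([home, exe]) == home:
        return None
    return (swift_home, swift_on_path)


conflict = swift_home_conflict()
if conflict:
    print("SWIFT_HOME=%s but swift on PATH is %s" % conflict)
```

The `environ` and `which` parameters exist only so the check can be exercised with made-up paths.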
> >         >         >
> >         >         >
> >         >         >
> >         >         > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly <davidk at ci.uchicago.edu>
> >         >         > wrote:
> >         >         >
> >         >         > Sarah,
> >         >         >
> >         >         > Can you give this another try with the latest 0.93? I made some
> >         >         > changes to the coaster and sge providers and was able to get it
> >         >         > working with a simple catns script. Here is the configuration
> >         >         > file I was using:
> >         >         >
> >         >         > <config>
> >         >         > <pool handle="ranger">
> >         >         > <execution provider="coaster" jobManager="gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> >         >         > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
> >         >         > <profile namespace="globus" key="maxtime">3600</profile>
> >         >         > <profile namespace="globus" key="maxWallTime">00:00:03</profile>
> >         >         > <profile namespace="globus" key="jobsPerNode">1</profile>
> >         >         > <profile namespace="globus" key="nodeGranularity">16</profile>
> >         >         > <profile namespace="globus" key="maxNodes">16</profile>
> >         >         > <profile namespace="globus" key="queue">development</profile>
> >         >         > <profile namespace="karajan" key="jobThrottle">0.9</profile>
> >         >         > <profile namespace="globus" key="project">TG-DBS080004N</profile>
> >         >         > <profile namespace="globus" key="pe">16way</profile>
> >         >         > <workdirectory>/share/home/01503/davidkel/swiftwork</workdirectory>
> >         >         > </pool>
> >         >         > </config>
> >         >         >
> >         >         > Thanks,
> >         >         >
> >         >         > David
> >         >         >
> >         >         > ----- Original Message -----
> >         >         >
> >         >         > > From: "Sarah Kenny" <skenny at uchicago.edu>
> >         >         > > To: "Justin M Wozniak" <wozniak at mcs.anl.gov>
> >         >         > > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>, "Swift User" <swift-user at ci.uchicago.edu>
> >         >         > > Sent: Friday, October 7, 2011 3:13:57 PM
> >         >         > > Subject: Re: [Swift-user] gram on ranger
> >         >         > >
> >         >         > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log
> >         >         > >
> >         >         > > on ci
> >         >         > >
> >         >         > >
> >         >         > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak <wozniak at mcs.anl.gov>
> >         >         > > wrote:
> >         >         > >
> >         >         > >
> >         >         > >
> >         >         > > Can I take a look at the log?
> >         >         > >
> >         >         > >
> >         >         > >
> >         >         > >
> >         >         > > On Thu, 6 Oct 2011, Sarah Kenny wrote:
> >         >         > >
> >         >         > >
> >         >         > >
> >         >         > > hey all, i'm trying to submit to gram on ranger using the latest
> >         >         > > swift (built from trunk). it fails like so:
> >         >         > >
> >         >         > > Cannot submit job
> >         >         > > Caused by:
> >         >         > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> >         >         > > Cannot submit job
> >         >         > > Caused by: org.globus.gram.GramException: Parameter not supported
> >         >         > > Cannot submit job
> >         >         > >
> >         >         > > the gram log was saying first that 'jobsPerNode' is not supported so
> >         >         > > i changed it to workersPerNode and then it was saying 'maxnodes' is
> >         >         > > not supported. here's my sites file:
> >         >         > >
> >         >         > > <config>
> >         >         > > <pool handle="RANGER">
> >         >         > > <profile namespace="karajan" key="initialScore">10000</profile>
> >         >         > > <profile namespace="karajan" key="jobThrottle">1</profile>
> >         >         > > <profile namespace="globus" key="maxWallTime">00:15:00</profile>
> >         >         > > <profile namespace="globus" key="maxTime">86400</profile>
> >         >         > > <profile namespace="globus" key="slots">1</profile>
> >         >         > > <profile namespace="globus" key="maxNodes">256</profile>
> >         >         > > <profile namespace="globus" key="pe">16way</profile>
> >         >         > > <profile namespace="globus" key="workersPerNode">1</profile>
> >         >         > > <profile namespace="globus" key="nodeGranularity">64</profile>
> >         >         > > <profile namespace="globus" key="queue">normal</profile>
> >         >         > > <profile namespace="globus" key="project">TG-DBS080004N</profile>
> >         >         > > <filesystem provider="gsiftp" url="gsiftp://gridftp.ranger.tacc.teragrid.org"/>
> >         >         > > <execution provider="coaster" jobManager="gt2:gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> >         >         > > <execution provider="gt2" jobManager="SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
> >         >         > > <workdirectory>/work/00043/tg457040</workdirectory>
> >         >         > > </pool>
> >         >         > > </config>
> >         >         > >
> >         >         > > thoughts? ideas?
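
Two things in the sites file quoted just above differ from the configuration David later reported working: it has two <execution> elements where his has one, and its jobManager is "gt2:gt2:SGE" where his is "gt2:SGE". Nothing in this thread confirms either as the cause of the "Parameter not supported" error, so the following is only a minimal Python sketch of a lint that flags both patterns. The function name `lint_sites` and the checks themselves are hypothetical, not part of Swift.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sites file quoted above (profiles omitted for brevity).
SITES_XML = """
<config>
<pool handle="RANGER">
  <execution provider="coaster" jobManager="gt2:gt2:SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
  <execution provider="gt2" jobManager="SGE" url="gatekeeper.ranger.tacc.teragrid.org"/>
  <workdirectory>/work/00043/tg457040</workdirectory>
</pool>
</config>
"""


def lint_sites(xml_text):
    """Return warnings for patterns worth a second look (hypothetical checks)."""
    found = []
    for pool in ET.fromstring(xml_text).iter("pool"):
        handle = pool.get("handle")
        executions = pool.findall("execution")
        # More than one execution element in a single pool.
        if len(executions) > 1:
            providers = ", ".join(e.get("provider", "?") for e in executions)
            found.append(
                f"{handle}: {len(executions)} <execution> elements ({providers})"
            )
        # Adjacent identical segments in a colon-separated jobManager value.
        for e in executions:
            manager = e.get("jobManager", "")
            parts = manager.split(":")
            if any(a == b for a, b in zip(parts, parts[1:])):
                found.append(f"{handle}: repeated segment in jobManager {manager!r}")
    return found


found = lint_sites(SITES_XML)
for line in found:
    print(line)
```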
> >         >         > >
> >         >         > > --
> >         >         > > Justin M Wozniak
> >         >         > >
> >         >         > >
> >         >         > >
> >         >         > > --
> >         >         > > Sarah Kenny
> >         >         > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> >         >         > > University of California Irvine, Dept. of Neurology ~ 773-818-8300
> >         >         > >
> >         >         > >
> >         >         > > _______________________________________________
> >         >         > > Swift-user mailing list
> >         >         > > Swift-user at ci.uchicago.edu
> >         >         > >
> >         >         > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user


-- 
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300