[Swift-devel] burnin' up ranger w/the latest coasters
skenny at uchicago.edu
Fri Oct 23 13:02:00 CDT 2009
>> however...when i use the configs here and i try to run a
>> workflow with 196,608 jobs it seems that coasters starts to
>> ramp up nicely, but maybe a little too well :) as it begins
>> requesting more cores than i'm allowed in the normal queue on
>> ranger. that is, the limit is 4096. i tried changing maxNodes
>> to 4096 which did not work.
>
>Shouldn't that be 4096/workersPerNode?
don't think i'm understanding you here...the workersPerNode
you originally suggested was 32. why would i increase that to
4096 when what i'm trying to do is request fewer total cores?
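
For anyone following along, the arithmetic behind that suggestion can be sketched as below. This is only an illustration, not coaster code: it assumes the queue limit is counted once per worker (which is what dividing by workersPerNode implies), and the helper name max_nodes_for_limit is made up for the example.

```python
def max_nodes_for_limit(core_limit: int, workers_per_node: int) -> int:
    """Largest node count whose workers stay within core_limit.

    Assumption: each worker counts as one core against the queue limit.
    """
    if workers_per_node <= 0:
        raise ValueError("workers_per_node must be positive")
    return core_limit // workers_per_node


CORE_LIMIT = 4096  # normal-queue limit quoted in this thread

# Compare the two workersPerNode settings discussed in the thread.
for workers_per_node in (32, 16):
    nodes = max_nodes_for_limit(CORE_LIMIT, workers_per_node)
    print(f"workersPerNode={workers_per_node}: maxNodes={nodes}")
```

If the scheduler instead counts physical cores per node, the divisor would be the node's core count rather than workersPerNode.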
>
>> i'm wondering if workers per node
>> should actually be 16 instead (?) but i know you've gotten it
>> to work well with the setting at 32 so i'm not sure...
>
>You could set it to 16. My reasoning for doubling it was that if the
>processes you run are slightly I/O bound, then you'd get slightly better
>performance by running two processes per core.
>
>>
>> anyway, it ramped up nicely (and was only like 8 jobs away
>> from finishing the whole thing) i just need to know how to cap
>> it off so it won't ask for more than 4096 cores.
>>
>> thanks
>> ~sk
>>
>> ---- Original message ----
>> >Date: Wed, 14 Oct 2009 15:39:15 -0500 (CDT)
>> >From: <skenny at uchicago.edu>
>> >Subject: Re: [Swift-devel] burnin' up ranger w/the latest coasters
>> >To: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu
>> >
>> >for those interested, here are the config files used for this run:
>> >
>> >swift.properties:
>> >
>> >sites.file=config/coaster_ranger.xml
>> >tc.file=/ci/projects/cnari/config/tc.data
>> >lazy.errors=false
>> >caching.algorithm=LRU
>> >pgraph=false
>> >pgraph.graph.options=splines="compound", rankdir="TB"
>> >pgraph.node.options=color="seagreen", style="filled"
>> >clustering.enabled=false
>> >clustering.queue.delay=4
>> >clustering.min.time=60
>> >kickstart.enabled=maybe
>> >kickstart.always.transfer=false
>> >wrapperlog.always.transfer=false
>> >throttle.submit=3
>> >throttle.host.submit=8
>> >throttle.score.job.factor=64
>> >throttle.transfers=16
>> >throttle.file.operations=16
>> >sitedir.keep=false
>> >execution.retries=3
>> >replication.enabled=false
>> >replication.min.queue.time=60
>> >replication.limit=3
>> >foreach.max.threads=16384
>> >
>> >coaster_ranger.xml:
>> >
>> ><config>
>> > <import file="sys.xml"/>
>> > <import file="vdl-lib.xml"/>
>> > <set name="username">
>> >  <arg><vdl:new type="string" value="user"/></arg>
>> > </set>
>> > <pool handle="RANGER">
>> >  <profile namespace="karajan" key="jobThrottle">1000.0</profile>
>> >  <filesystem provider="coaster" url="gt2://gatekeeper.ranger.tacc.teragrid.org"/>
>> >  <profile namespace="globus" key="queue">normal</profile>
>> >  <profile namespace="globus" key="workersPerNode">32</profile>
>> >  <profile namespace="globus" key="nodeGranularity">1</profile>
>> >  <profile namespace="globus" key="slots">16</profile>
>> >  <profile namespace="globus" key="maxNodes">8192</profile>
>> >  <profile namespace="globus" key="maxTime">72000</profile>
>> >  <profile namespace="globus" key="project">TG-DBS080004N</profile>
>> >  <execution provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>> >  <!-- <workdirectory>/work/00043/tg457040/sidgrid_out/{username}</workdirectory> -->
>> >  <workdirectory>/work/00926/tg459516/sidgrid_out/{username}</workdirectory>
>> > </pool>
>> ></config>
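
A rough way to sanity-check a pool definition like the one above against a queue limit is sketched here. It is a hedged illustration only: it assumes maxNodes bounds the nodes in a single coaster block and that workersPerNode workers run on each node, so their product is an upper bound on workers per block. POOL_XML is a trimmed copy of the pool element, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the pool element from coaster_ranger.xml; only the
# profile entries needed for the estimate are kept.
POOL_XML = """
<pool handle="RANGER">
  <profile namespace="globus" key="workersPerNode">32</profile>
  <profile namespace="globus" key="maxNodes">8192</profile>
</pool>
"""


def globus_profiles(pool_xml: str) -> dict:
    """Return {key: text} for every globus-namespace profile in the pool."""
    pool = ET.fromstring(pool_xml)
    return {
        p.get("key"): p.text.strip()
        for p in pool.findall("profile")
        if p.get("namespace") == "globus"
    }


def workers_ceiling(profiles: dict) -> int:
    """Assumed upper bound on workers in one block: maxNodes * workersPerNode."""
    return int(profiles["maxNodes"]) * int(profiles["workersPerNode"])


CORE_LIMIT = 4096  # normal-queue limit quoted elsewhere in this thread

profiles = globus_profiles(POOL_XML)
ceiling = workers_ceiling(profiles)
print(f"ceiling={ceiling}, limit={CORE_LIMIT}, over={ceiling > CORE_LIMIT}")
```

Under those assumptions the settings shown allow far more workers than the limit, which is consistent with the over-request described in the follow-up.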
>> >
>> >
>> >
>> >---- Original message ----
>> >>Date: Tue, 13 Oct 2009 11:14:17 -0500 (CDT)
>> >>From: <skenny at uchicago.edu>
>> >>Subject: [Swift-devel] burnin' up ranger w/the latest coasters
>> >>To: swift-devel at ci.uchicago.edu
>> >>
>> >>Final status: Finished successfully:131072
>> >>
>> >>re-running some of the workflows from our recent SEM
>> >>paper with the latest swift...sadly, queue time on ranger has
>> >>only gone up since those initial runs...but luckily coasters
>> >>has speeded things up, so it ends up evening out for time to
>> >>solution :)
>> >>
>> >>not sure i fully understand the plot:
>> >>
>> >>http://www.ci.uchicago.edu/~skenny/workflows/sem_131k/
>> >>
>> >>log is here:
>> >>
>> >>/ci/projects/cnari/logs/skenny/4reg_2cond-20091012-1607-ugidm2s2.log
>> >>_______________________________________________
>> >>Swift-devel mailing list
>> >>Swift-devel at ci.uchicago.edu
>> >>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>