[Swift-devel] Condor with coasters question

David Kelly davidk at ci.uchicago.edu
Sat Apr 28 16:54:46 CDT 2012


I adjusted the parameters a bit and tried again with this configuration:

<config>
   <pool handle="uc3">
     <execution jobmanager="local:condor" provider="coaster" url="none"/>
     <filesystem provider="local" url="none" />
     <workdirectory>_WORK_</workdirectory>
     <profile namespace="globus" key="maxNodes">1000</profile>
     <profile key="slots" namespace="globus">1000</profile>
     <profile key="maxTime" namespace="globus">3600</profile>
     <profile key="maxWalltime" namespace="globus">00:05:00</profile>
     <profile key="highOverallocation" namespace="globus">100</profile>
     <profile key="lowOverallocation" namespace="globus">100</profile>
     <profile key="nodeGranularity" namespace="globus">1</profile>
     <profile key="jobsPerNode" namespace="globus">1</profile>
     <profile namespace="karajan" key="jobThrottle">1000</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>
</config>

With this configuration, the number of active jobs peaked at 101.
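
For reference, the single-node test Mike proposes in the quoted message
below (maxNodes=1, slots=500) might look roughly like the following. This
is an untested sketch: only maxNodes and slots differ from the
configuration above, and every other value is carried over unchanged as
an assumption.

<config>
   <pool handle="uc3">
     <!-- Sketch only, not a tested configuration. -->
     <execution jobmanager="local:condor" provider="coaster" url="none"/>
     <filesystem provider="local" url="none" />
     <workdirectory>_WORK_</workdirectory>
     <!-- Assumption: one node per coaster block, up to 500 blocks. -->
     <profile namespace="globus" key="maxNodes">1</profile>
     <profile key="slots" namespace="globus">500</profile>
     <!-- Remaining values copied from the configuration above. -->
     <profile key="maxTime" namespace="globus">3600</profile>
     <profile key="maxWalltime" namespace="globus">00:05:00</profile>
     <profile key="highOverallocation" namespace="globus">100</profile>
     <profile key="lowOverallocation" namespace="globus">100</profile>
     <profile key="nodeGranularity" namespace="globus">1</profile>
     <profile key="jobsPerNode" namespace="globus">1</profile>
     <profile namespace="karajan" key="jobThrottle">1000</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>
   </pool>
</config>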

Thanks,
David

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Saturday, April 28, 2012 3:13:15 PM
> Subject: Re: [Swift-devel] Condor with coasters question
> I meant to cc this to swift-devel so am resending it.
> 
> I think multi-node jobs on Condor should work in principle but in
> practice may need to be tested and debugged.
> 
> I think we should first see if we can fill the UC3 cluster with
> maxNodes=1 slots=500.
> 
> One possible reason that only 70 jobs were issued is that your prior
> test, David, looks like it was using default values for the times
> involved, and possibly Swift "packed" the pending requests into the 70
> job slots you saw. Hence my suggestion to try the config below.
> 
> - Mike
> 
> On Sat, Apr 28, 2012 at 1:28 PM, Michael Wilde <wilde at mcs.anl.gov>
> wrote:
> > David, can you try a test that specifies:
> >
> > maxTime 3600
> > maxWalltime 00:00:10 (or as needed for your app)
> > highOverallocation and lowOverallocation 100
> >
> > I would think each coaster ( x 480 ) should get a separate submit file
> > with count 1, just as would be done for PBS.
> >
> > - Mike
> >
> > On 4/28/12, David Kelly <davidk at ci.uchicago.edu> wrote:
> >> Hello,
> >>
> >> I am trying to get Swift working well on a machine that uses Condor.
> >> It has 480 available slots. I am using a Swift script that will run
> >> 1000 tasks.
> >>
> >> sites.xml:
> >> <config>
> >>    <pool handle="uc3">
> >>      <execution jobmanager="local:condor" provider="coaster" url="none"/>
> >>      <filesystem provider="local" url="none" />
> >>      <workdirectory>/home/davidk/test/benchmark-release/run012</workdirectory>
> >>      <profile namespace="globus" key="maxNodes">480</profile>
> >>      <profile key="slots" namespace="globus">480</profile>
> >>      <profile key="nodeGranularity" namespace="globus">1</profile>
> >>      <profile key="jobsPerNode" namespace="globus">1</profile>
> >>      <profile namespace="karajan" key="jobThrottle">1000</profile>
> >>      <profile namespace="karajan" key="initialScore">10000</profile>
> >>    </pool>
> >> </config>
> >>
> >> cf:
> >> wrapperlog.always.transfer=true
> >> sitedir.keep=false
> >> execution.retries=0
> >> lazy.errors=false
> >> status.mode=provider
> >> use.provider.staging=false
> >> provider.staging.pin.swiftfiles=true
> >> foreach.max.threads=1000
> >>
> >> What I am seeing is that only ~70 tasks are active at once. When I
> >> look at condor_q, I see ~70 jobs that I have submitted, no more, and
> >> none idle. Any ideas where this limit is coming from?
> >>
> >> I thought I could get around this by setting nodeGranularity to 50.
> >> But when I do this, 50 machines seem to be allocated per worker.pl,
> >> which would make sense for an MPI job but is not what I want here.
> >> (The Condor submit script sets machine_count to 50, but only queues 1.)
> >>
> >> I can get around this for now by using the plain Condor provider, but
> >> ideally I would like to use coasters.
> >>
> >> Thanks,
> >> David
> >> _______________________________________________
> >> Swift-devel mailing list
> >> Swift-devel at ci.uchicago.edu
> >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >>
> >
> > --
> > Sent from my mobile device


