[Swift-devel] Condor with coasters question

Michael Wilde wilde at mcs.anl.gov
Sat Apr 28 15:13:15 CDT 2012


I meant to cc this to swift-devel so am resending it.

I think multi-node jobs on Condor should work in principle but in
practice may need to be tested and debugged.

I think we should first see if we can fill the UC3 cluster with
maxNodes=1 and slots=500.

One possible reason that only 70 jobs were issued is that your prior
test, David, looks like it was using default values for the times
involved, and possibly Swift "packed" the pending requests into the 70
job slots you saw.  Hence my suggestion to try the config below.
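
For reference, here is a rough sketch of how those settings might look when
merged into David's sites.xml. It assumes the globus-namespace profile keys
are spelled maxtime, maxwalltime, highOverAllocation and lowOverAllocation
(please check the spelling against your Swift release), and that slots=500
with maxNodes=1 is what we want for the first test. Everything else is
carried over unchanged from the file quoted below. Untested on UC3.

```xml
<config>
  <pool handle="uc3">
    <execution jobmanager="local:condor" provider="coaster" url="none"/>
    <filesystem provider="local" url="none"/>
    <workdirectory>/home/davidk/test/benchmark-release/run012</workdirectory>
    <!-- Assumption: one node per coaster block, so each block is its own Condor job -->
    <profile namespace="globus" key="maxNodes">1</profile>
    <profile namespace="globus" key="slots">500</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <!-- Assumption: explicit times instead of the defaults used in the earlier test -->
    <profile namespace="globus" key="maxtime">3600</profile>
    <profile namespace="globus" key="maxwalltime">00:00:10</profile>
    <profile namespace="globus" key="highOverAllocation">100</profile>
    <profile namespace="globus" key="lowOverAllocation">100</profile>
    <profile namespace="karajan" key="jobThrottle">1000</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
  </pool>
</config>
```

The maxwalltime value should be adjusted to whatever the app actually needs.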

- Mike

On Sat, Apr 28, 2012 at 1:28 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> David, can you try a test that specifies:
>
> maxtime 3600
> maxwalltime 00:00:10 (or as needed for your app)
> highOverAllocation and lowOverAllocation 100
>
> I would think each coaster ( x 480 ) should get a separate submit file
> with count 1, just as would be done for PBS.
>
> - Mike
>
> On 4/28/12, David Kelly <davidk at ci.uchicago.edu> wrote:
>> Hello,
>>
>> I am trying to get Swift working well on a machine that uses condor. It has
>> 480 available slots. I am using a swift script that will run 1000 tasks.
>>
>> sites.xml:
>> <config>
>>    <pool handle="uc3">
>>      <execution jobmanager="local:condor" provider="coaster" url="none"/>
>>      <filesystem provider="local" url="none" />
>>      <workdirectory>/home/davidk/test/benchmark-release/run012</workdirectory>
>>      <profile namespace="globus" key="maxNodes">480</profile>
>>      <profile key="slots" namespace="globus">480</profile>
>>      <profile key="nodeGranularity" namespace="globus">1</profile>
>>      <profile key="jobsPerNode" namespace="globus">1</profile>
>>      <profile namespace="karajan" key="jobThrottle">1000</profile>
>>      <profile namespace="karajan" key="initialScore">10000</profile>
>>    </pool>
>> </config>
>>
>> cf:
>> wrapperlog.always.transfer=true
>> sitedir.keep=false
>> execution.retries=0
>> lazy.errors=false
>> status.mode=provider
>> use.provider.staging=false
>> provider.staging.pin.swiftfiles=true
>> foreach.max.threads=1000
>>
>> What I am seeing is that only ~70 tasks are active at once. When I look at
>> condor_q, I see there are ~70 jobs that I have submitted, no more, none
>> idle. Any ideas where this limit is coming from?
>>
>> I thought I could get around this by setting nodeGranularity to 50. But when
>> I do this, what seems to happen is that 50 machines are allocated per
>> 1 worker.pl, which would make sense for an MPI job but is not what I want here.
>> (The condor submit script sets machine_count to 50, but only queues 1.)
>>
>> I can get around this now by using the plain condor provider, but ideally
>> would like to use coasters.
>>
>> Thanks,
>> David
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>
>
> --
> Sent from my mobile device


