[Swift-devel] coaster is resubmitting LRM job for every burst of jobs

Mihael Hategan hategan at mcs.anl.gov
Sat Sep 21 20:10:26 CDT 2013


The service shuts down unused blocks. If your throttle is such that only
two jobs make it to the service at a time, the gap between bursts may be
long enough for the blocks to shut down. Increase your throttle and let
the service have a larger pool of jobs to run if you want to avoid that.
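A minimal sketch of that change, assuming Swift's usual rule that the number of concurrent jobs sent to a site is about jobThrottle * 100 + 1. Under that rule, the 0.01 in the quoted config allows only 2 jobs, which matches the bursts described. The value 0.2 below is purely illustrative. With maxnodes=1 and jobsPerNode=2, only 2 jobs still execute at once, but the remaining jobs wait in the coaster service queue. That should keep the block from going idle and being shut down between bursts.

```xml
<pool handle="blues">
  <execution jobmanager="local:pbs" provider="coaster" url="none"/>
  <filesystem provider="local" url="none" />
  <!-- Keep the other globus profiles from sites.blues.xml unchanged here -->
  <!-- Illustrative value (assumption): 0.2 * 100 + 1 = about 21 jobs in flight -->
  <profile namespace="karajan" key="jobThrottle">0.2</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <workdirectory>/home/ketan/swift.workdir</workdirectory>
</pool>
```

For the roughly 120 jobs in this run, any throttle that keeps a steady backlog at the service should have the same effect. A larger value only makes more jobs eligible for queuing, and it does not change how many run at once on the single node.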

Mihael

On Sat, 2013-09-21 at 19:50 -0500, Ketan Maheshwari wrote:
> Hi,
> 
> I am running a local:pbs swift-coaster setup on LCRC Blues with the
> jobspernode value of 2 and walltime of about 70 minutes.
> 
> There are about 120 jobs in this application.
> 
> What I observe is that the PBS job is killed and resubmitted every time a
> new burst of 2 jobs is run.
> 
> With coasters, my assumption is that the LRM job will continue to run as
> long as there are more Swift jobs in the wings, but that does not seem to
> be happening.
> 
> Swift is trunk: swift-r7079 cog-r3778
> 
> Config files are as follows:
> $ cat sites.blues.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <config xmlns="http://www.ci.uchicago.edu/swift/SwiftSites">
> <pool handle="blues">
>   <execution jobmanager="local:pbs" provider="coaster" url="none"/>
>   <filesystem provider="local" url="none" />
>   <profile namespace="globus" key="maxtime">4300</profile>
>   <profile namespace="globus" key="maxWalltime">01:10:00</profile>
>   <profile namespace="globus" key="jobsPerNode">2</profile>
>   <profile namespace="globus" key="slots">1</profile>
>   <profile namespace="globus" key="ppn">2</profile>
>   <profile namespace="globus" key="nodeGranularity">1</profile>
>   <profile namespace="globus" key="maxnodes">1</profile>
>   <profile namespace="karajan" key="jobThrottle">0.01</profile>
>   <profile namespace="karajan" key="initialScore">10000</profile>
>   <workdirectory>/home/ketan/swift.workdir</workdirectory>
> </pool>
> </config>
> 
> $ cat cf
> wrapperlog.always.transfer=false
> sitedir.keep=true
> file.gc.enabled=false
> status.mode=provider
> execution.retries=0
> lazy.errors=false
> use.provider.staging=true
> provider.staging.pin.swiftfiles=true
> use.wrapper.staging=false
> 
> This looks related to a recent issue where just one round of jobs
> would get submitted.
> 
> Thanks,
> 
> -- 
> Ketan
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
