[Swift-user] Coaster provider is not allocating dedicated nodes
Michael Wilde
wilde at mcs.anl.gov
Wed Jan 20 09:42:51 CST 2010
The log for the run below is in:
/home/wilde/protests/run.loops.3231/psim.loops-20100120-0802-tsvkj4e7.log
- Mike
On 1/20/10 9:38 AM, Michael Wilde wrote:
> Using the sites entry below, I see that coasters is allocating 8
> *shared* nodes rather than *dedicated* nodes; hence it's running many more
> processes per node than it should, causing the jobs to run longer than
> expected and exceed their walltime.
>
> using this sites entry:
>
> <pool handle="pbs">
> <execution provider="coaster" url="none" jobManager="local:pbs"/>
>
> <profile namespace="globus" key="maxtime">7500</profile>
> <profile namespace="globus" key="workersPerNode">8</profile>
>
> <profile namespace="globus" key="slots">12</profile>
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
>
> <profile namespace="karajan" key="jobThrottle">1.27</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <filesystem provider="local"/>
> <workdirectory>$rundir</workdirectory>
> </pool>
>
> qstat (below) shows the 12 coaster jobs I requested with "slots=12", but
> between them they are using only 2 distinct nodes, c45 and c46, even
> though they are running 96 coaster workers in total. (I can see that I
> have 96 jobs active.)
>
> It seems that, between coasters and the PBS provider, Swift is not
> telling PBS that each of these jobs should get a dedicated 8-core node.
>
>
> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
> 1034.svc.pads.ci     wilde    extended null              13086     1  --     -- 02:04 R 01:26
>    c46
> 1035.svc.pads.ci     wilde    extended null              13168     1  --     -- 02:04 R 01:26
>    c46
> 1036.svc.pads.ci     wilde    extended null              13387     1  --     -- 02:04 R 01:26
>    c46
> 1037.svc.pads.ci     wilde    extended null              14060     1  --     -- 02:04 R 01:26
>    c46
> 1038.svc.pads.ci     wilde    extended null              14237     1  --     -- 02:04 R 01:26
>    c46
> 1039.svc.pads.ci     wilde    extended null              14640     1  --     -- 02:04 R 01:26
>    c46
> 1040.svc.pads.ci     wilde    extended null              15200     1  --     -- 02:04 R 01:26
>    c46
> 1041.svc.pads.ci     wilde    extended null              15753     1  --     -- 02:04 R 01:26
>    c46
> 1042.svc.pads.ci     wilde    extended null              23700     1  --     -- 02:04 R 01:26
>    c45
> 1043.svc.pads.ci     wilde    extended null              23781     1  --     -- 02:04 R 01:26
>    c45
> 1044.svc.pads.ci     wilde    extended null              24016     1  --     -- 02:04 R 01:26
>    c45
> 1045.svc.pads.ci     wilde    extended null              24796     1  --     -- 02:04 R 01:26
>    c45
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
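Tallying the qstat listing in the quoted message makes the oversubscription concrete. This is an illustrative sketch only: the job-to-node mapping is transcribed by hand from that listing, 8 workers per job comes from the workersPerNode setting in the sites entry, and 8 cores per node is the figure stated in the report. Nothing here queries PBS or Swift, and `workers_per_node` is a hypothetical helper.

```python
from collections import Counter

# Job-to-node mapping transcribed by hand from the qstat listing
# (jobs 1034-1041 ran on c46, jobs 1042-1045 on c45).
QSTAT_NODES = {f"{job}.svc.pads.ci": "c46" for job in range(1034, 1042)}
QSTAT_NODES.update({f"{job}.svc.pads.ci": "c45" for job in range(1042, 1046)})

WORKERS_PER_JOB = 8  # globus:workersPerNode in the sites entry
CORES_PER_NODE = 8   # "a dedicated node of 8 cores", per the report


def workers_per_node(job_nodes, workers_per_job):
    """Hypothetical helper: count coaster jobs per node, scale by workers per job."""
    jobs_per_node = Counter(job_nodes.values())
    return {node: count * workers_per_job for node, count in jobs_per_node.items()}


if __name__ == "__main__":
    load = workers_per_node(QSTAT_NODES, WORKERS_PER_JOB)
    print(f"jobs: {len(QSTAT_NODES)}, workers: {sum(load.values())}")
    for node, workers in sorted(load.items()):
        ratio = workers / CORES_PER_NODE
        # Prints c45: 32 workers (4x) and c46: 64 workers (8x)
        print(f"{node}: {workers} workers on {CORES_PER_NODE} cores ({ratio:.0f}x)")
```

With one dedicated node per coaster job, the same 96 workers would instead be spread over 12 nodes at 8 workers each, i.e. one worker per core.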