[Swift-user] Coaster provider is not allocating dedicated nodes

Michael Wilde wilde at mcs.anl.gov
Wed Jan 20 10:42:16 CST 2010


I ran a test of the same <pool> entry using a simple foreach/cat script 
and captured the PBS submit file. It shows:

login2$ more logs/PBS1883411659688642512.submit
#PBS -S /bin/sh
#PBS -N null
#PBS -m n
#PBS -l nodes=1
#PBS -l walltime=01:10:00
#PBS -o /home/wilde/.globus/scripts/PBS1883411659688642512.submit.stdout
#PBS -e /home/wilde/.globus/scripts/PBS1883411659688642512.submit.stderr
/usr/bin/perl /home/wilde/.globus/coasters/cscript2151716324069557151.pl http://192.5.86.6:50003 0120-331021-000010 8 /home/wilde/.globus/coasters
/bin/echo $? >/home/wilde/.globus/scripts/PBS1883411659688642512.submit.exitcode
login2$

It seems that the line "#PBS -l nodes=1" should be:
#PBS -l nodes=1:ppn=8
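
A minimal sketch of the idea, written in Python rather than taken from the PBS provider's actual code. The helper names `pbs_nodes_directive` and `requests_ppn` are hypothetical and exist only for illustration. The sketch shows how the node count and the workersPerNode value from the sites entry could be combined into the resource request, and how a captured submit file could be checked for a ppn request.

```python
def pbs_nodes_directive(nodes: int, workers_per_node: int) -> str:
    """Build a '#PBS -l nodes=...' line that reserves cores on each node.

    Hypothetical helper for illustration only; not the PBS provider's code.
    """
    if nodes < 1 or workers_per_node < 1:
        raise ValueError("nodes and workers_per_node must be positive")
    # ppn (processors per node) asks PBS to reserve that many cores on each
    # node. Assumption: one worker per core is what the sites entry intends.
    return f"#PBS -l nodes={nodes}:ppn={workers_per_node}"


def requests_ppn(submit_text: str) -> bool:
    """Return True if the script's first '#PBS -l nodes=' line includes ppn."""
    for line in submit_text.splitlines():
        line = line.strip()
        if line.startswith("#PBS -l nodes="):
            return ":ppn=" in line
    return False


# The directive from the captured submit file versus the expected one.
captured = "#PBS -S /bin/sh\n#PBS -l nodes=1\n#PBS -l walltime=01:10:00\n"
expected = pbs_nodes_directive(1, 8)
print(requests_ppn(captured), expected)
```

With nodes=1 and workersPerNode=8 this yields the line suggested above, while the captured submit file is reported as not requesting ppn.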

- Mike


On 1/20/10 9:42 AM, Michael Wilde wrote:
> The log for the run below is in:
> 
> /home/wilde/protests/run.loops.3231/psim.loops-20100120-0802-tsvkj4e7.log
> 
> - Mike
> 
> On 1/20/10 9:38 AM, Michael Wilde wrote:
>> Using the sites entry below, I see that coasters is allocating 8 
>> *shared* nodes rather than *dedicated* nodes; hence it's running many more 
>> processes per node than it should, causing the jobs to run longer than 
>> expected and exceed their walltime.
>>
>> Using this sites entry:
>>
>>    <pool handle="pbs">
>>      <execution provider="coaster" url="none" jobManager="local:pbs"/>
>>
>>      <profile namespace="globus" key="maxtime">7500</profile>
>>      <profile namespace="globus" key="workersPerNode">8</profile>
>>
>>      <profile namespace="globus" key="slots">12</profile>
>>      <profile namespace="globus" key="nodeGranularity">1</profile>
>>      <profile namespace="globus" key="maxNodes">1</profile>
>>
>>      <profile namespace="karajan" key="jobThrottle">1.27</profile>
>>      <profile namespace="karajan" key="initialScore">10000</profile>
>>      <filesystem provider="local"/>
>>      <workdirectory>$rundir</workdirectory>
>>    </pool>
>>
>> qstat (below) shows the 12 coaster jobs I requested with "slots=12", but 
>> they are only using 2 different nodes, c45 and c46, between them, even 
>> though they are running 96 total coaster workers. (I can see that I have 
>> 96 jobs active).
>>
>> It seems like between coasters and the PBS provider, Swift is not telling 
>> PBS that each of these jobs should get a dedicated node of 8 cores.
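
The numbers above can be worked through with a short sketch. The job-to-node mapping is read off the qstat listing (8 jobs on c46, 4 on c45), and the 8 cores per node comes from the message itself. The helper name `workers_by_node` is hypothetical, and the sketch assumes every coaster job really starts its full 8 workers.

```python
def workers_by_node(jobs_per_node, workers_per_job):
    """Return the number of coaster workers landing on each physical node."""
    return {node: jobs * workers_per_job for node, jobs in jobs_per_node.items()}


jobs_per_node = {"c46": 8, "c45": 4}  # from the qstat listing: 12 jobs total
workers_per_job = 8                   # globus "workersPerNode"
cores_per_node = 8                    # "a dedicated node of 8 cores"

load = workers_by_node(jobs_per_node, workers_per_job)
total = sum(load.values())
# Ratio of workers to cores on each node; 1.0 would mean one worker per core.
oversub = {node: n / cores_per_node for node, n in load.items()}

print(total)    # 96
print(load)     # {'c46': 64, 'c45': 32}
print(oversub)  # {'c46': 8.0, 'c45': 4.0}
```

Under these assumptions c46 carries 64 workers on 8 cores and c45 carries 32, which is consistent with jobs running much longer than expected and exceeding their walltime.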
>>
>>
>> Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
>> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
>> 1034.svc.pads.ci     wilde    extended null              13086     1  --     --  02:04 R 01:26
>>     c46
>> 1035.svc.pads.ci     wilde    extended null              13168     1  --     --  02:04 R 01:26
>>     c46
>> 1036.svc.pads.ci     wilde    extended null              13387     1  --     --  02:04 R 01:26
>>     c46
>> 1037.svc.pads.ci     wilde    extended null              14060     1  --     --  02:04 R 01:26
>>     c46
>> 1038.svc.pads.ci     wilde    extended null              14237     1  --     --  02:04 R 01:26
>>     c46
>> 1039.svc.pads.ci     wilde    extended null              14640     1  --     --  02:04 R 01:26
>>     c46
>> 1040.svc.pads.ci     wilde    extended null              15200     1  --     --  02:04 R 01:26
>>     c46
>> 1041.svc.pads.ci     wilde    extended null              15753     1  --     --  02:04 R 01:26
>>     c46
>> 1042.svc.pads.ci     wilde    extended null              23700     1  --     --  02:04 R 01:26
>>     c45
>> 1043.svc.pads.ci     wilde    extended null              23781     1  --     --  02:04 R 01:26
>>     c45
>> 1044.svc.pads.ci     wilde    extended null              24016     1  --     --  02:04 R 01:26
>>     c45
>> 1045.svc.pads.ci     wilde    extended null              24796     1  --     --  02:04 R 01:26
>>     c45
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user