[Swift-devel] block allocation

Zhao Zhang zhaozhang at uchicago.edu
Wed May 27 16:31:13 CDT 2009


Hi, Mihael

I did a clean run of 100 scip jobs on ranger. (scip is a new application 
from MCS).
The log is at /home/zzhang/scip/ranger-logs/coasters.log

Mihael Hategan wrote:
> On Wed, 2009-05-27 at 13:42 -0500, Zhao Zhang wrote:
>   
>> Hi, Mihael
>>
>> I have been running the language-behavior and application tests with 
>> coasters up to date from SVN.
>> Could you help double check I am using the right version of cog?
>> Swift svn swift-r2949 cog-r2406
>>     
>
> Looks right.
>
>   
>> Also, all my tests returned successful,
>>     
>
> That is a bit intriguing.
>
>   
>>   are there any run-time logs 
>> from which I could see how many workers are
>> running on each site and monitor their status?
>>     
>
> You'll find that information in the usual place:
> ~/.globus/coasters/coaster(s).log.
>
> In there, every time the schedule is (re)calculated, something like this
> is printed:
> BlockQueueProcessor Committed 0 new jobs
>
> * that's the number of new jobs that were added since the last planning
> step.
>   
This number is always 0 in the log files; is this normal?
> BlockQueueProcessor Cleaned 0 done blocks
>
> * blocks that finished since the last planning step.
>   
This value is always 0 too.
> BlockQueueProcessor Updated allocsize: 10332
>
> * that's how much time is available in all the blocks (running or to be
> started). It's the sum of all the remaining walltimes of all the
> workers.
>
> BlockQueueProcessor Queued 0 jobs to existing blocks
>
> * for how many jobs it was deemed that existing blocks have enough space
> to run them (this is really just a 2d box packing algorithm).
>
> BlockQueueProcessor allocsize = 10332, queuedsize = 9600, qsz = 16
>
> * queuedsize is the sum of walltimes of all jobs that are deemed to have
> some space in some block; qsz is the number of such jobs.
>
> BlockQueueProcessor Requeued 0 non-fitting jobs
>
> * When the plan doesn't go according to plan, some jobs that were
> thought to fit in existing blocks turn out not to fit after all. This
> shows the number of such jobs.
>
> BlockQueueProcessor Required size: 0 for 0 jobs
>
> * That tells you, for all jobs that don't have block space, the sum of
> their walltimes and their number.
>   
Based on the above values, I can tell that the coaster scheduler plans 
jobs based on their walltimes. So before we
run an application, we need an estimated running time for each job, is 
that correct? What if we don't
know the running time, or the running time varies a lot from job to job?
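To make the walltime-based planning concrete, here is a small sketch of how a planner could pack jobs into blocks by walltime and produce numbers like allocsize, queuedsize, and qsz from the log lines above. This is an illustration only, not the actual BlockQueueProcessor code; the function names and the first-fit policy are assumptions.

```python
# Sketch of walltime-based planning, loosely modeled on the log lines above.
# All names and the first-fit policy are illustrative, not real coaster code.

def plan(jobs, blocks):
    """jobs: list of job walltimes (seconds);
    blocks: list of remaining free seconds per existing block."""
    free = list(blocks)
    queued = 0
    requeued = []                              # jobs needing a new block
    for wt in sorted(jobs, reverse=True):      # pack big jobs first
        for i, f in enumerate(free):
            if f >= wt:                        # job fits in this block
                free[i] -= wt
                queued += 1
                break
        else:
            requeued.append(wt)
    allocsize = sum(blocks)                    # total time in all blocks
    queuedsize = sum(jobs) - sum(requeued)     # walltime of jobs that fit
    print(f"allocsize = {allocsize}, queuedsize = {queuedsize}, qsz = {queued}")
    print(f"Required size: {sum(requeued)} for {len(requeued)} jobs")
    return free, requeued

plan([600, 600, 1200], [1000, 1500])
```

In this toy run, the 1200s job and one 600s job fit into the existing blocks, while the second 600s job does not, so it shows up in the "Required size" line and would trigger a new block request.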

> There are a few more coming from the same class when blocks are created.
> I'll let you discover those.
>   
Yes, I am reading through the logs and trying to learn more from them.

zhao
>   
>>  Like how many 
>> registered, how many are idle,
>> how many are busy, etc. I am also attaching two sites.xml definitions 
>> for uc-teragrid and ranger.
>>
>> best
>> zhao
>>
>> [zzhang at communicado scip]$ cat 
>> /home/zzhang/swift_coaster/cog/modules/swift/tests/sites/coaster_new/tgranger-sge-gram2.xml
>> <config>
>>   <pool handle="tgtacc" >
>>     <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>>     <execution  provider="coaster" 
>> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>>     <profile namespace="globus" key="project">TG-CCR080022N</profile>
>>     <workdirectory >/work/00946/zzhang/work</workdirectory>
>>     <profile namespace="env" 
>> key="SWIFT_JOBDIR_PATH">/tmp/zzhang/jobdir</profile>
>>     <profile namespace="globus" key="coastersPerNode">16</profile>
>>     <profile namespace="globus" key="queue">development</profile>
>>     <profile namespace="karajan" key="initialScore">50</profile>
>>     <profile namespace="karajan" key="jobThrottle">10</profile>
>>     <profile namespace="globus" key="slots">4</profile>
>>     
>
> You should probably up that to the number of safe gt2 jobs (20).
>
>   
>>     <profile namespace="globus" key="nodeGranularity">2</profile>
>>     
>
> You don't really need to specify a node granularity other than 1, since
> you can request any number of nodes on ranger.
>
>   
>>     <profile namespace="globus" key="lowOverAllocation">5</profile>
>>     <profile namespace="globus" key="highOverAllocation">1</profile>
>>     <profile namespace="globus" key="maxNodes">2</profile>
>>     
>
> You probably want a larger maxNodes. You probably don't even want to
> specify it, because there's nothing imposing a limit on that.
>
>   
>>     <profile namespace="globus" key="remoteMonitorEnabled">false</profile>
>>   </pool>
>>     
>
> You also don't need to specify the default.
>
>
>
>   
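Pulling the suggestions above together, a revised pool definition might look like this. This is a sketch: the project, queue, paths, and throttle values are copied from the original, and only the values the comments above touch on are changed (slots raised to the safe gt2 job count, nodeGranularity set to 1, maxNodes and remoteMonitorEnabled left unspecified so their defaults apply).

```xml
<config>
  <pool handle="tgtacc" >
    <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
    <execution provider="coaster"
        url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
    <profile namespace="globus" key="project">TG-CCR080022N</profile>
    <workdirectory>/work/00946/zzhang/work</workdirectory>
    <profile namespace="env" key="SWIFT_JOBDIR_PATH">/tmp/zzhang/jobdir</profile>
    <profile namespace="globus" key="coastersPerNode">16</profile>
    <profile namespace="globus" key="queue">development</profile>
    <profile namespace="karajan" key="initialScore">50</profile>
    <profile namespace="karajan" key="jobThrottle">10</profile>
    <!-- up to the number of safe gt2 jobs -->
    <profile namespace="globus" key="slots">20</profile>
    <!-- ranger accepts any node count, so granularity 1 is enough -->
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="lowOverAllocation">5</profile>
    <profile namespace="globus" key="highOverAllocation">1</profile>
    <!-- maxNodes and remoteMonitorEnabled left at their defaults -->
  </pool>
</config>
```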


