[Swift-devel] block allocation

Mihael Hategan hategan at mcs.anl.gov
Wed May 27 14:23:10 CDT 2009


On Wed, 2009-05-27 at 13:42 -0500, Zhao Zhang wrote:
> Hi, Mihael
> 
> I have been running the language-behavior and application test with 
> coaster up to date from SVN.
> Could you help double check I am using the right version of cog?
> Swift svn swift-r2949 cog-r2406

Looks right.

> 
> Also, all my tests returned successful,

That is a bit intriguing.

>   are there any run-time logs 
> that I could see how many workers were
> running on each site and monitoring their status?

You'll find that information in the usual place:
~/.globus/coasters/coaster(s).log.

In there, every time the schedule is (re)calculated, something like this
is printed:
BlockQueueProcessor Committed 0 new jobs

* that's the number of new jobs that were added since the last planning
step.

BlockQueueProcessor Cleaned 0 done blocks

* blocks that finished since the last planning step.

BlockQueueProcessor Updated allocsize: 10332

* that's how much time is available in all the blocks (running or to be
started). It's the sum of all the remaining walltimes of all the
workers.

BlockQueueProcessor Queued 0 jobs to existing blocks

* the number of jobs for which existing blocks were deemed to have enough
space to run them (this is really just a 2D box-packing algorithm).

BlockQueueProcessor allocsize = 10332, queuedsize = 9600, qsz = 16

* queuedsize is the sum of walltimes of all jobs that are deemed to have
some space in some block; qsz is the number of such jobs.

BlockQueueProcessor Requeued 0 non-fitting jobs

* When things don't go according to plan, some jobs that were thought to
fit in existing blocks may turn out not to fit after all. This shows the
number of such jobs.

BlockQueueProcessor Required size: 0 for 0 jobs

* That tells you, for all jobs that don't have block space, the sum of
their walltimes and their number.

There are a few more coming from the same class when blocks are created.
I'll let you discover those.
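
If you want to follow these numbers over a run, something along these
lines might help. It is only a rough sketch: it matches the message text
quoted above and ignores whatever prefix (timestamp, log level) the real
log lines carry. The names PATTERNS, SAMPLE and summarize are made up for
the example and are not part of the coaster code.

```python
import re
from collections import defaultdict

# One pattern per BlockQueueProcessor message quoted in this mail.
# Only the message text is matched; any line prefix is ignored.
PATTERNS = {
    "committed": re.compile(r"BlockQueueProcessor Committed (\d+) new jobs"),
    "cleaned": re.compile(r"BlockQueueProcessor Cleaned (\d+) done blocks"),
    "allocsize": re.compile(r"BlockQueueProcessor Updated allocsize: (\d+)"),
    "queued": re.compile(r"BlockQueueProcessor Queued (\d+) jobs to existing blocks"),
    "sizes": re.compile(
        r"BlockQueueProcessor allocsize = (\d+), queuedsize = (\d+), qsz = (\d+)"
    ),
    "requeued": re.compile(r"BlockQueueProcessor Requeued (\d+) non-fitting jobs"),
    "required": re.compile(r"BlockQueueProcessor Required size: (\d+) for (\d+) jobs"),
}


def summarize(lines):
    """Collect the integer fields of each recognized message, in log order."""
    out = defaultdict(list)
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                values = tuple(int(g) for g in match.groups())
                # Single-field messages are stored as plain ints.
                out[name].append(values if len(values) > 1 else values[0])
                break
    return dict(out)


# The example messages from this mail, used as stand-in input.
SAMPLE = [
    "BlockQueueProcessor Committed 0 new jobs",
    "BlockQueueProcessor Cleaned 0 done blocks",
    "BlockQueueProcessor Updated allocsize: 10332",
    "BlockQueueProcessor Queued 0 jobs to existing blocks",
    "BlockQueueProcessor allocsize = 10332, queuedsize = 9600, qsz = 16",
    "BlockQueueProcessor Requeued 0 non-fitting jobs",
    "BlockQueueProcessor Required size: 0 for 0 jobs",
]

if __name__ == "__main__":
    for name, values in summarize(SAMPLE).items():
        print(name, values)
```

To run it on a real log, pass an open file instead of SAMPLE, e.g.
summarize(open(path)) with path pointing at your coaster log.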

>  Like how many 
> registered, how many are idle
> how many are busy and etc. I am also attaching two sites.xml definition 
> for uc-teragrid and ranger.
> 
> best
> zhao
> 
> [zzhang at communicado scip]$ cat 
> /home/zzhang/swift_coaster/cog/modules/swift/tests/sites/coaster_new/tgranger-sge-gram2.xml
> <config>
>   <pool handle="tgtacc" >
>     <gridftp  url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
>     <execution  provider="coaster" 
> url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
>     <profile namespace="globus" key="project">TG-CCR080022N</profile>
>     <workdirectory >/work/00946/zzhang/work</workdirectory>
>     <profile namespace="env" 
> key="SWIFT_JOBDIR_PATH">/tmp/zzhang/jobdir</profile>
>     <profile namespace="globus" key="coastersPerNode">16</profile>
>     <profile namespace="globus" key="queue">development</profile>
>     <profile namespace="karajan" key="initialScore">50</profile>
>     <profile namespace="karajan" key="jobThrottle">10</profile>
>     <profile namespace="globus" key="slots">4</profile>

You should probably up that to the number of safe gt2 jobs (20).

>     <profile namespace="globus" key="nodeGranularity">2</profile>

You don't really need to specify a node granularity other than 1, since
you can request any number of nodes on ranger.

>     <profile namespace="globus" key="lowOverAllocation">5</profile>
>     <profile namespace="globus" key="highOverAllocation">1</profile>
>     <profile namespace="globus" key="maxNodes">2</profile>

You probably want to have a larger maxNodes. You probably don't even
want to specify it, because there's nothing imposing a limit on that.

>     <profile namespace="globus" key="remoteMonitorEnabled">false</profile>
>   </pool>

You also don't need to specify remoteMonitorEnabled, since false is the
default.
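
The comments above could be turned into a small check over a sites file,
roughly like the sketch below. It assumes the profile keys shown in your
file, takes 20 as the safe number of gt2 jobs as mentioned above, and
assumes that false is the default for remoteMonitorEnabled. The names
review_pool, SAFE_GT2_JOBS and SAMPLE are invented for the example, and
the advice is specific to this ranger setup rather than a general rule.

```python
import xml.etree.ElementTree as ET

SAFE_GT2_JOBS = 20  # figure taken from the comment on "slots" in this mail


def review_pool(pool):
    """Return advisory notes for one <pool> element, following this mail."""
    # Map key -> text for the profiles in the "globus" namespace.
    profiles = {
        p.get("key"): (p.text or "").strip()
        for p in pool.findall("profile")
        if p.get("namespace") == "globus"
    }
    notes = []
    slots = profiles.get("slots")
    if slots is not None and int(slots) < SAFE_GT2_JOBS:
        notes.append(f"slots={slots}: consider raising to {SAFE_GT2_JOBS}")
    if profiles.get("nodeGranularity", "1") != "1":
        notes.append("nodeGranularity: 1 is enough if any node count can be requested")
    if "maxNodes" in profiles:
        notes.append("maxNodes: consider a larger value or leaving it out")
    if profiles.get("remoteMonitorEnabled") == "false":
        notes.append("remoteMonitorEnabled=false: assumed default, can be omitted")
    return notes


# Trimmed-down version of the pool quoted in this mail.
SAMPLE = """
<config>
  <pool handle="tgtacc">
    <profile namespace="globus" key="slots">4</profile>
    <profile namespace="globus" key="nodeGranularity">2</profile>
    <profile namespace="globus" key="maxNodes">2</profile>
    <profile namespace="globus" key="remoteMonitorEnabled">false</profile>
  </pool>
</config>
"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE.strip())
    for pool in root.findall("pool"):
        for note in review_pool(pool):
            print(pool.get("handle"), "-", note)
```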




