[Swift-user] Tuning parameters of coaster execution

Andriy Fedorov fedorov at bwh.harvard.edu
Tue Jan 19 09:38:45 CST 2010


Hi Mihael,

I've been experimenting with this, following your suggestions, but I
can't get it to work.

Here's my site description:

<pool handle="Abe-GT2-coasters">
  <gridftp  url="local://localhost" />
  <execution provider="coaster" jobmanager="gt2:gt2:pbs" url="grid-abe.ncsa.teragrid.org"/>
  <workdirectory>/u/ac/fedorov/scratch-global/scratch</workdirectory>
  <profile namespace="karajan" key="jobThrottle">2.55</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <profile namespace="globus" key="nodeGranularity">10</profile>
  <profile namespace="globus" key="remoteMonitorEnabled">false</profile>
  <profile namespace="globus" key="parallelism">0.1</profile>
  <profile namespace="globus" key="workersPerNode">2</profile>
  <profile namespace="globus" key="highOverallocation">10</profile>
</pool>

My maxWalltime for each job is 2 minutes, and I have 100 such jobs.
When I run the script, I see one job in the queue, with 10 nodes and
22 minutes of walltime. However, while the script is executing, the
jobs appear to be scheduled one at a time. I am using the current
checkout of the cog/swift trunk: Swift svn swift-r3202 cog-r2682. I
attach the coaster.log file for your reference.
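
As a rough sanity check of the settings above, here is the arithmetic
for how many jobs could run at once. This is only a sketch: it assumes
that Swift allows roughly jobThrottle * 100 + 1 concurrent jobs per
site, and that each coaster worker runs one job at a time. The variable
names are made up for illustration and are not Swift identifiers.

```python
# Values taken from the site description and the run described above.
job_throttle = 2.55       # karajan jobThrottle
nodes_per_block = 10      # nodeGranularity, matches the 10-node queue entry
workers_per_node = 2      # workersPerNode
total_jobs = 100

# Assumption: the site throttle admits about jobThrottle * 100 + 1 jobs.
# round() avoids the floating-point result 254.999... for 2.55 * 100.
swift_limit = round(job_throttle * 100) + 1

# Assumption: one job per worker, so a block offers nodes * workers slots.
worker_slots = nodes_per_block * workers_per_node

# The tightest of the three limits bounds the jobs running at once.
expected_parallel = min(swift_limit, worker_slots, total_jobs)

print("throttle limit:", swift_limit)
print("worker slots in one block:", worker_slots)
print("upper bound on concurrent jobs:", expected_parallel)
```

Under those assumptions the throttle is not the bottleneck, and one
10-node block should be able to hold about 20 jobs at a time rather
than one.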

Can you help me understand what I am doing wrong?

Also, I was looking at the code that does the allocation, and it
seems that the block size for an allocation is determined in
modules/provider-coaster/src/org/globus/cog/abstraction/coaster/service/job/manager/BlockQueueProcessor.java.
Is this correct? And which piece of code decides how jobs are
scheduled within the allocated block?

I would appreciate any help. Thank you.

--
Andriy Fedorov, Ph.D.

Research Fellow
Brigham and Women's Hospital
Harvard Medical School
75 Francis Street
Boston, MA 02115 USA
fedorov at bwh.harvard.edu



On Tue, Oct 20, 2009 at 11:23, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> On Tue, 2009-10-20 at 12:04 -0400, Andriy Fedorov wrote:
>> On Tue, Oct 20, 2009 at 11:55, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>> > You need a more recent version of the code.
>> >
>>
>> Mihael, I actually updated svn for both cog and swift yesterday prior
>> to running the tests. Here's what swift reports I have right now:
>>
>> Swift svn swift-r3170 cog-r2529
>
> Given that even when you have granularity=10 you still see 2 jobs, I
> suspect you are using swift site throttling parameters that force that.
> I would set the jobThrottle higher and possibly the initial score
> higher.
>
> For troubleshooting, what you could do is, on the remote side, say cat
> ~/.globus/coasters/coasters.log|grep "BlockQueueProcessor">bqp.log and
> post that. Also, you could set the remoteMonitorEnabled profile to
> "true" to get visual feedback of what's happening.
>
> The allocation time is 18 minutes because the new stuff doesn't
> overallocate using a fixed multiplier (though you can force it to do
> so). For small jobs (walltime = 1s) the multiplier is set by
> lowOverallocation (10.0 by default) while for large jobs (walltime ->
> +inf) the multiplier is 1, with an exponential decay in-between.
>
> If you want to always have blocks being 10 times the job walltime, you
> can set highOverallocation to 10.
>
>
>
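
The overallocation rule described in the quoted message can be sketched
as follows. This is an illustration of an exponential decay between the
two multipliers, not the code in BlockQueueProcessor.java: the function
names and the decay constant are hypothetical, and the default of 1 for
highOverallocation is inferred from the statement that the multiplier
tends to 1 for large jobs.

```python
import math


def overallocation_multiplier(walltime_s, low=10.0, high=1.0, decay=1e-3):
    """Multiplier that moves from `low` toward `high` as walltime grows.

    `decay` is a hypothetical constant chosen only for this sketch.
    """
    return high + (low - high) * math.exp(-decay * walltime_s)


def block_walltime_s(walltime_s, low=10.0, high=1.0, decay=1e-3):
    """Walltime requested for a block, given the job walltime in seconds."""
    return walltime_s * overallocation_multiplier(walltime_s, low, high, decay)


# Short jobs get close to lowOverallocation, long jobs approach the
# highOverallocation value.
for seconds in (1, 120, 3600, 86400):
    print(seconds, overallocation_multiplier(seconds))

# With highOverallocation set to 10 (equal to lowOverallocation), the
# multiplier is 10 for every walltime, so a 2-minute job gives a
# 20-minute block.
print(block_walltime_s(120, low=10.0, high=10.0) / 60)
```

Setting both multipliers to the same value removes the dependence on
walltime, which is why highOverallocation = 10 yields blocks that are
always 10 times the job walltime.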
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bqp.log
Type: text/x-log
Size: 369066 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20100119/99d0de37/attachment.bin>

