[Swift-user] Re: swift on jazz

Michael Wilde wilde at mcs.anl.gov
Thu Aug 27 11:55:51 CDT 2009


Marcin,

If what you're seeing is that Swift is not sending enough jobs to PBS, 
add the following to your sites.xml entry for Jazz/PBS:

     <profile namespace="karajan" key="jobThrottle">.24</profile>
     <profile namespace="karajan" key="initialScore">10000</profile>

This should cause Swift to queue up to 25 jobs at a time to PBS.
The formula is nJobs = (jobThrottle*100)+1. That is, for 30 jobs at a time,
use .29; for 256 jobs at a time, use 2.55.
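
As a quick check on the arithmetic, here is a small Python sketch that
inverts the formula above to get the jobThrottle value for a desired
number of concurrent jobs. The function names are made up for
illustration and are not part of Swift; only the formula itself comes
from this message.

```python
def job_throttle_for(n_jobs: int) -> float:
    """Hypothetical helper: jobThrottle value for n_jobs concurrent jobs.

    Inverts nJobs = (jobThrottle * 100) + 1.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return round((n_jobs - 1) / 100, 2)


def jobs_for(job_throttle: float) -> int:
    """Hypothetical helper: number of concurrent jobs for a jobThrottle value."""
    return round(job_throttle * 100) + 1


if __name__ == "__main__":
    # The three cases mentioned in this message: 25, 30 and 256 jobs.
    for n in (25, 30, 256):
        throttle = job_throttle_for(n)
        print(f"{n} jobs -> jobThrottle {throttle} -> {jobs_for(throttle)} jobs")
```

The value returned by job_throttle_for() is what goes in the jobThrottle
profile element shown above.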

Swift tries to hide this from users and throttle automatically, but the 
algorithm still causes "surprises" and starts up very (too?) slowly so 
as to not overwhelm a cluster with jobs.

So you should be able to use the XML elements above to force Swift to go 
right to a specific level of parallelism.

- Mike

ps. I'll contact you off-list to set up a meeting.




On 8/27/09 10:24 AM, Marcin Hitczenko wrote:
> Hi,
> 
> I am actually observing the former, which is why I thought this might be
> controllable via swift.
> 
> I also have a few other basic questions regarding following job status and
> organization of all the output files. I think the easiest thing to do
> would be to look at my account together. Is there any way that we could
> meet this week?
> 
> Thanks,
> 
> Marcin
> 
>> Hi Marcin,
>>
>> I took the liberty of moving this thread to swift-user for others to
>> help me answer you, and for other users to benefit.
>>
>> On Jazz, are you observing that Swift only puts at most 2 jobs in the
>> Jazz PBS queue (where you can see them with "qstat") or that Swift puts
>> many jobs in the queue but only 2 run at a time?
>>
>> Assuming it's the latter, you must be bumping into Jazz's scheduler policy,
>> which is favoring multi-CPU jobs. If that's the case, then let's try
>> running the "coaster" provider, which is specified in the sites.xml file.
>> (tc.data doesn't change.)
>>
>> First, change your "jazz" entry in sites.xml from the PBS execution
>> provider:
>>
>>    <execution provider="pbs" url="none" />
>>
>> to the Coaster provider:
>>
>>    <execution provider="coaster" url="none" jobmanager="pbs:local" />
>>
>> This should work, although we may need to add additional XML
>> specifications for time limits, accounts, and maybe queues.
>>
>> Then we expect to be applying a fix to the coaster provider tonight, so
>> we'll need to do a custom Swift build from the source repository after
>> that, and test the latest fix. The fix improves the throughput, but even
>> without it, you should see Swift requesting more CPUs from PBS in a
>> single job.
>>
>> I suggest getting started with this simple change, and we'll enhance it
>> in stages to give you better performance and more parallelism.
>>
>> - Mike
>>
>>
>> On 8/26/09 3:20 PM, Marcin Hitczenko wrote:
>>
>>> ... I am running jobs on jazz and I noticed that jazz will only run
>>> at most two jobs at once for me (I have about 30), even though there are
>>> more nodes free and I am requiring only one node per job. Is there
>>> something I can do to change this? Would I have to change the tc.data or
>>> sites.xml file?
>>>
>>> Thanks,
>>>
>>> Marcin
> 


