[Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules

Ian Foster foster at mcs.anl.gov
Tue Jul 17 22:35:24 CDT 2007


Great! What resource acquisition policy are you using?
>> b) Why did it take so long to get all of the workers working?
> I finally had enough confidence in the dynamic resource provisioning 
> that we won't lose any jobs across resource allocation boundaries 
> (ran lots of tests and they were all positive), so I enabled it for 
> this run.  I set the max to be the entire ANL site (274 processors)... 
> and we got 146 at the beginning, and with time, the # of processors 
> kept increasing up to the peak of 208 or so... the rest up to 274 were 
> queued up in the PBS wait queue.  The difference between the beginning 
> with 146 and the end with 208 was that others who were in the system 
> at the beginning finished their work and released some nodes, and idle 
> processors went from the wait queue into the run queue.  I would 
> actually be curious to try out the latest DRP stuff on a busy site, 
> such as Purdue or NCSA, and to see if we can maintain a nice pool size 
> over a period of time, despite the sites being busy...
>
> BTW, in the previous runs for MolDyn, we normally set the min and max 
> to, say, 100 processors or 200 processors, and we would wait until we 
> had all of them before we started... sometimes, this meant waiting 
> 12-24 hours for enough nodes to become free so the large job could 
> start.  With DRP, you can start off with whatever the site has 
> available, and you get more with time as your jobs make it through the 
> wait queue and other jobs that are running complete...
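
The contrast between the two policies described above (fixed min = max versus growing the pool as processors become available) can be sketched roughly as follows. This is only an illustrative model, not the actual DRP implementation: the function names and the intermediate availability values are hypothetical, and only the figures 146, 208 and 274 come from the message above.

```python
# Illustrative sketch only; names and the availability series are hypothetical.

def static_start_step(available, required):
    """Fixed policy (min == max): return the first step at which
    `required` processors are available at once, or None if never."""
    for step, free in enumerate(available):
        if free >= required:
            return step
    return None


def dynamic_pool_sizes(available, max_procs):
    """Dynamic policy: start with whatever is available and grow the
    pool over time, never exceeding `max_procs`. In this simplified
    model, processors already held are never given back."""
    pool = 0
    sizes = []
    for free in available:
        pool = max(pool, min(free, max_procs))
        sizes.append(pool)
    return sizes


# Processors the site could give us at each step (including ones we
# already hold). 146 and 208 are from the run described above; the
# values in between are made up for illustration.
available = [146, 170, 190, 208, 208]

fixed_start = static_start_step(available, required=200)
pool_sizes = dynamic_pool_sizes(available, max_procs=274)
```

With these numbers, the fixed policy asking for 200 processors cannot start until the fourth step, while the dynamic policy begins work immediately on 146 processors and grows toward 208.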



