[Swift-devel] Re: angle-1000 second run

Mihael Hategan hategan at mcs.anl.gov
Tue Nov 6 11:32:05 CST 2007


On Tue, 2007-11-06 at 11:24 -0600, Michael Wilde wrote:
> It seems that the cluster problem is also due to the slow speed of input 
> data file stage-in.

Sounds likely.

> 
> It took 6 minutes to stage in 60 40MB input files to uc-tg
> (this is to NFS; I will try GPFS as well).
> 
> So at 10 files per minute, if we check the cluster queue every 30 
> seconds, that's about 5 jobs per cluster on average, which explains what 
> we're seeing.
> 
> 10 fpm = 400MB/min = about 6.7MB/sec. Note that I'm submitting from the login 
> node to the same cluster, so this seems very slow.

You should also factor in protocol latencies and various things like
directory creation/checks.
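The figures above can be checked with a small back-of-envelope model. This is a sketch under stated assumptions, not Swift's actual scheduling logic. It assumes that a job only becomes eligible for clustering once its input has been staged in, so the jobs available per clustering window are roughly (stage-in rate) x (cluster queue delay). It also assumes transfers are serialized. The `link_mb_per_s` and `overhead_s` inputs to `rate_with_overhead` are hypothetical, included only to illustrate how per-file costs such as protocol latency and directory checks cut throughput.

```python
# Back-of-envelope model of how stage-in speed limits cluster size.
# Assumption (not from Swift's code): a job is clusterable only after its input
# is staged in, and transfers happen one after another.


def files_per_minute(n_files: int, minutes: float) -> float:
    """Observed stage-in rate."""
    return n_files / minutes


def bandwidth_mb_per_s(rate_fpm: float, file_mb: float) -> float:
    """Aggregate stage-in bandwidth implied by a files-per-minute rate."""
    return rate_fpm * file_mb / 60.0


def expected_cluster_size(rate_fpm: float, queue_delay_s: float) -> float:
    """Jobs that reach the clustering queue during one delay window."""
    return rate_fpm * queue_delay_s / 60.0


def rate_with_overhead(file_mb: float, link_mb_per_s: float, overhead_s: float) -> float:
    """Files per minute when every transfer also pays a fixed per-file cost
    (protocol latency, directory creation/checks). Both link_mb_per_s and
    overhead_s are hypothetical inputs, not measured values."""
    seconds_per_file = file_mb / link_mb_per_s + overhead_s
    return 60.0 / seconds_per_file


if __name__ == "__main__":
    rate = files_per_minute(60, 6)  # 60 files of 40MB in 6 minutes
    print(f"{rate:.1f} files/min, {bandwidth_mb_per_s(rate, 40):.2f} MB/s")
    print(f"~{expected_cluster_size(rate, 30):.1f} jobs per 30 s window")
    # Hypothetical: a 20MB/s link plus 4 s of per-file overhead also yields 10 files/min.
    print(f"{rate_with_overhead(40, 20, 4):.1f} files/min")
```

With the thread's numbers this gives 10 files/min, about 6.67MB/s, and about 5 jobs per 30-second window. The last line shows that a fast link with a few seconds of fixed cost per file can produce the same observed rate, so raw bandwidth measurements on a single big file may not predict small-batch stage-in speed.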

> 
> I will test further and try to calibrate the expected speeds on a big file.
> 
> - Mike
> 
> 
> On 11/6/07 10:19 AM, Michael Wilde wrote:
> > 
> >>> 3. The cluster sizes were extremely small, about 4; they should have 
> >>> been 10-20 by my calcs.
> >>
> >> Increase the cluster queue delay parameter from 4 to about 30 
> >> (seconds). This will make Swift wait much longer before putting 
> >> clusters together, which may allow more jobs to build up in the 
> >> clustering queue.
> > 
> > Previous run had this set to 10 seconds. The logs confirm that this was 
> > the clustering period: the cluster size=4 message came out every 10 
> > seconds.
> > 
> >> Make sure that you have the cluster maximum time and maxwalltimes for 
> >> jobs set to sensible values, because large clusters will highlight 
> >> misconfigurations there. In particular, note that the maximum cluster 
> >> time in the config file needs to be (less than) half of the 
> >> maxwalltime permitted for the site you submit to (so if you are 
> >> allowed to run 15-minute jobs, set the cluster maximum time to 7*60, 
> >> for example).
> > 
> > I set cluster max time to 1200 with a maxwalltime of 60 seconds.
> > 
> > I will fiddle with this part with smaller runs till it works.
> > 
> > Likely I have a config issue somewhere, or there's a bug.
> > 
> >> Are you using the PBS provider or GRAM to submit?
> > 
> > GRAM, gt2.
> > 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 
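The cluster sizing rule quoted in this thread (the cluster maximum time must be less than half of the site's maxwalltime) can be expressed as a quick check. This is a hedged sketch: all times are assumed to be in seconds, rounding down to whole minutes is an illustrative choice matching the 7*60 example, and `jobs_per_cluster_bound` is a simple estimate (cluster max time divided by per-job walltime), not Swift's actual clustering algorithm. The function names are hypothetical.

```python
# Sketch of the sizing rule from the thread: the cluster maximum time must be
# less than half of the maxwalltime the site allows.
# Assumptions: all times are in seconds; the per-cluster job bound is a simple
# estimate, not Swift's actual clustering algorithm.


def cluster_time_ok(cluster_max_time_s: int, site_maxwalltime_s: int) -> bool:
    """True if the cluster max time is strictly under half the site limit."""
    return cluster_max_time_s < site_maxwalltime_s / 2


def max_cluster_time_s(site_maxwalltime_s: int) -> int:
    """Largest whole-minute cluster max time that satisfies the rule."""
    half = site_maxwalltime_s / 2
    minutes = int(half // 60)
    if minutes * 60 >= half:  # exactly half is not allowed
        minutes -= 1
    return max(minutes, 0) * 60


def jobs_per_cluster_bound(cluster_max_time_s: int, job_walltime_s: int) -> int:
    """Rough upper bound on how many jobs fit in one cluster."""
    return cluster_max_time_s // job_walltime_s


if __name__ == "__main__":
    print(max_cluster_time_s(15 * 60))           # 15-minute site limit
    print(jobs_per_cluster_bound(1200, 60))      # values from the thread
    print(cluster_time_ok(1200, 2 * 1200 + 60))  # site limit just over 2x
```

For a 15-minute site limit this returns 420 seconds (7*60), as in the quoted example. Under the same rule, a 1200-second cluster maximum time needs a site maxwalltime above 2400 seconds (40 minutes); and if 60 seconds is the per-job maxwalltime, at most about 20 jobs fit in one cluster.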



