[Swift-devel] Re: angle-1000 second run
Mihael Hategan
hategan at mcs.anl.gov
Tue Nov 6 11:32:05 CST 2007
On Tue, 2007-11-06 at 11:24 -0600, Michael Wilde wrote:
> It seems that the cluster problem is also due to the slow speed of input
> data file stage-in.
Sounds likely.
>
> It took 6 minutes to stage in 60 40MB input files to uc-tg
> (this is to NFS; I will try GPFS as well).
>
> So at 10 files per minute, if we check the cluster queue every 30
> seconds, that's about 5 jobs per cluster on average, which explains what
> we're seeing.
>
> 10 fpm = 400MB/min = about 6.7MB/sec. Note that I'm submitting from the
> login node to the same cluster - seems very slow.
You should also factor in protocol latencies and various things like
directory creation/checks.
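
The stage-in figures quoted above can be rechecked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes each staged input file releases exactly one job and ignores the protocol latencies and directory checks mentioned above.

```python
# Rough stage-in arithmetic; inputs are the figures quoted in this thread.
# All function names here are hypothetical, not part of Swift.

def stage_in_rate(files, minutes):
    """Files staged per minute."""
    return files / minutes

def throughput_mb_per_s(files_per_minute, file_size_mb):
    """Aggregate transfer rate in MB/s."""
    return files_per_minute * file_size_mb / 60

def expected_cluster_size(files_per_minute, queue_delay_s):
    """Jobs that become ready during one clustering-queue interval,
    assuming one staged file releases exactly one job."""
    return files_per_minute * queue_delay_s / 60

rate = stage_in_rate(files=60, minutes=6)   # 60 files in 6 minutes
print("files/min:", rate)
print("MB/s:", round(throughput_mb_per_s(rate, file_size_mb=40), 2))
print("jobs/cluster:", expected_cluster_size(rate, queue_delay_s=30))
```

Under these assumptions the expected cluster size scales linearly with the queue delay, so a longer delay only helps as long as stage-in remains the bottleneck.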
>
> I will test further and try to calibrate the expected speeds on a big file.
>
> - Mike
>
>
> On 11/6/07 10:19 AM, Michael Wilde wrote:
> >
> >>> 3. The cluster sizes were extremely small, about 4 - should have been
> >>> 10-20 by my calcs.
> >>
> >> Increase the cluster queue delay parameter from 4 to about 30
> >> (seconds). This will make Swift wait much longer before putting
> >> clusters together, which may allow more jobs to build up in the
> >> clustering queue.
> >
> > Previous run had this set to 10 seconds. The logs confirm that this was
> > the clustering period: the cluster size=4 message came out every 10
> > seconds.
> >
> >> Make sure that you have the cluster maximum time and maxwalltimes for
> >> jobs set to sensible values, because large clusters will highlight
> >> misconfigurations there. In particular, note that the maximum cluster
> >> time in the config file needs to be (less than) half of the
> >> maxwalltime permitted for the site you submit to (so if you are
> >> allowed to run 15 minute jobs, set the cluster maximum time to 7*60,
> >> for example).
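
The rule of thumb in the paragraph above can be written as a quick sanity check. This is a sketch only: the function and parameter names are hypothetical, both values are taken in seconds, and "(less than) half" is read here as strictly less than half, which is the conservative interpretation.

```python
# Sanity check for the rule quoted above: cluster maximum time should be
# less than half of the walltime the site allows. Names are hypothetical.

def cluster_time_ok(cluster_max_time_s, site_walltime_limit_s):
    """True if the cluster maximum time is strictly below half the site limit."""
    return 2 * cluster_max_time_s < site_walltime_limit_s

# Example from the thread: a site that allows 15 minute jobs.
site_limit_s = 15 * 60
print(cluster_time_ok(7 * 60, site_limit_s))   # the suggested 7*60 setting
print(cluster_time_ok(1200, site_limit_s))     # a setting larger than the limit itself
```

Running a check like this before a large run may catch a misconfigured pair of values earlier than a failed cluster job would.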
> >
> > I set cluster max time to 1200 with a maxwalltime of 60 seconds.
> >
> > I will fiddle with this part with smaller runs till it works.
> >
> > Likely I have a config issue somewhere, or there's a bug.
> >
> >> Are you using the PBS provider or GRAM to submit?
> >
> > GRAM, gt2.
> >