<div class="gmail_quote">On Thu, Feb 3, 2011 at 16:17, Barry Smith <span dir="ltr">&lt;<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
In src/benchmarks/streams you can run make test and have it generate a report of how the streams benchmark is able to utilize the memory bandwidth. Run that and send us the output (run with just 2 threads).</blockquote></div>
<br><div>That test does no software prefetch, is not vectorized (look at the assembly: you want all movapd and addpd/mulpd with memory operands, rather than addsd/mulsd or addpd/mulpd operating only on register operands), and is not NUMA-aware (which, depending on the hardware, can cause performance problems). The output is still relevant and indicates what can be done without tuning, but it does not accurately represent the peak achievable by the hardware.</div>
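<br><div>As a rough illustration of the last two points, here is a minimal sketch of a triad kernel written so that the compiler has a chance to vectorize it, with a first-touch initialization for NUMA. This is not the code in src/benchmarks/streams; the names alloc_aligned, first_touch_init and triad are made up for this example, and the OpenMP pragmas only take effect when built with OpenMP enabled (e.g. gcc -O3 -fopenmp). Software prefetch is not shown.</div>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on a 64-byte boundary so that
 * aligned packed loads/stores are possible. aligned_alloc (C11) wants the
 * size to be a multiple of the alignment, so round up. */
double *alloc_aligned(size_t n)
{
    size_t bytes = ((n * sizeof(double) + 63) / 64) * 64;
    return aligned_alloc(64, bytes);
}

/* First touch: each thread writes the part of the arrays it will later
 * compute on, so (under the usual first-touch page placement policy) those
 * pages end up in memory local to that thread. The loop schedule here must
 * match the one used in the kernel. */
void first_touch_init(double *restrict a, double *restrict b,
                      double *restrict c, size_t n)
{
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (size_t i = 0; i < n; i++) {
        a[i] = 0.0;
        b[i] = 1.0;
        c[i] = 2.0;
    }
}

/* Triad: a = b + s*c. The restrict qualifiers tell the compiler the arrays
 * do not alias, which it needs to know before it can use packed
 * instructions instead of scalar ones. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double s, size_t n)
{
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

<div>To see what the compiler actually produced, generate the assembly (e.g. gcc -O3 -S) and look at the triad loop: packed instructions such as movapd/addpd/mulpd indicate vectorization, scalar addsd/mulsd indicate it did not happen. The exact instructions depend on the compiler, flags and target, so treat this as something to check rather than assume.</div>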