<div class="gmail_quote">On Mon, Feb 27, 2012 at 16:31, Gerard Gorman <span dir="ltr"><<a href="mailto:g.gorman@imperial.ac.uk">g.gorman@imperial.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":1y">I had a quick go at trying to get some sensible benchmarks for this but<br>
I was getting too much system noise. I am particularly interested in<br>
seeing if the overhead goes to zero if num_threads(1) is used.</div></blockquote><div> </div><div>What timing method did you use? With the GCC compilers I did not see the overhead go to zero as num_threads goes to 1, but Intel seems to do fairly well.</div>
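<div>For reference, one possible way to time this while damping system noise is to take the minimum over many repeats of a serial kernel and of the same kernel under num_threads(1). This is only a sketch: the function names are made up for illustration, it assumes a POSIX system with clock_gettime, and it is not the method either of us actually used.</div>

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict ISO C modes */
#endif
#include <time.h>

/* Wall-clock seconds from a monotonic clock (POSIX). */
static double wall_seconds(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Reference kernel with no OpenMP involvement. */
static void scale_serial(long n, double a, double *x)
{
  for (long i = 0; i < n; i++) x[i] *= a;
}

/* Same kernel inside a one-thread parallel region. Without OpenMP
   enabled, the pragma is normally ignored and this equals scale_serial. */
static void scale_omp1(long n, double a, double *x)
{
  #pragma omp parallel for num_threads(1)
  for (long i = 0; i < n; i++) x[i] *= a;
}

/* Minimum time over `repeats` runs of `kernel`. The minimum is less
   sensitive to system noise than the mean. Note that x is rescaled by
   `a` on every repeat. */
static double min_time(void (*kernel)(long, double, double *),
                       long n, double a, double *x, int repeats)
{
  double best = 1e300;
  for (int r = 0; r < repeats; r++) {
    double t0 = wall_seconds();
    kernel(n, a, x);
    double t = wall_seconds() - t0;
    if (t < best) best = t;
  }
  return best;
}
```

<div>The difference min_time(scale_omp1, ...) - min_time(scale_serial, ...) for small n, built with OpenMP enabled, then estimates the fixed cost of entering the region.</div>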
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":1y"><div class="im">
<br>
</div>I'm surprised by this. I am not aware of any compiler that doesn't have<br>
OpenMP support - and when you do not actually enable OpenMP, compilers<br>
generally just ignore the pragma. Do you know of any compiler that does<br>
not have OpenMP support which will complain?<br></div></blockquote><div><br></div><div>Sean points out that omp.h might not be available, but that misses the point. As far as I know, recent mainstream compilers have enough sense to at least ignore these directives, but I'm sure there are still cases where it would be an issue. More importantly, #pragma was a misfeature that should never be used now that _Pragma() exists. The latter is better not just because it can be turned off, but because it can be manipulated using macros and can be explicitly compiled out.</div>
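<div>A minimal sketch of the _Pragma() point, assuming a C99 compiler. The names MY_PRAGMA_OMP, MY_STRINGIFY, MY_DISABLE_OPENMP and my_axpy are hypothetical and are not PETSc's actual definitions.</div>

```c
/* Hypothetical wrapper, not PETSc's actual macro. */
#define MY_STRINGIFY(...) #__VA_ARGS__

#if defined(_OPENMP) && !defined(MY_DISABLE_OPENMP)
/* Expands to _Pragma("omp <args>"), e.g. _Pragma("omp parallel for"). */
#  define MY_PRAGMA_OMP(...) _Pragma(MY_STRINGIFY(omp __VA_ARGS__))
#else
/* Explicitly compiled out: the compiler never sees a directive. */
#  define MY_PRAGMA_OMP(...)
#endif

/* y[i] += a * x[i], threaded only when OpenMP is enabled. */
static void my_axpy(long n, double a, const double *x, double *y)
{
  MY_PRAGMA_OMP(parallel for)
  for (long i = 0; i < n; i++) y[i] += a * x[i];
}
```

<div>Because the directive is an ordinary macro argument, it can be switched off with a single define or composed by other macros, neither of which is possible with a bare #pragma line.</div>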
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":1y"><div class="im">
</div>This may not be flexible enough. You frequently want to have a parallel<br>
region, and then have multiple omp for's within that one region.</div></blockquote><div><br></div><div>PetscPragmaOMPObject(obj, parallel)</div><div>{</div><div>PetscPragmaOMP(whatever you normally write for this loop)</div>
<div>for (....) { }</div><div>...</div><div>and so on</div><div>}</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":1y"><div class="im"></div>
I think what you describe is close to Fig 3 of this paper written by<br>
your neighbours:<br>
<a href="http://greg.bronevetsky.com/papers/2008IWOMP.pdf" target="_blank">http://greg.bronevetsky.com/papers/2008IWOMP.pdf</a><br>
However, before making the implementation more complex, it would be good<br>
to benchmark the current approach and use a tool like likwid to measure<br>
the NUMA traffic so we can get a good handle on the costs.<br></div></blockquote><div><br></div><div>Sure.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":1y"><div class="im">
</div>Well this is where the implementation details get richer and there are<br>
many options - they also become less portable. For example, what does<br>
all this mean for the sparc64 processors which are UMA?</div></blockquote><div><br></div><div>Delay the choice to run time and use an ignorant partition for UMA. (Blue Gene/Q is also essentially uniform.) But note that even with uniform memory, the caches still make it somewhat hierarchical.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":1y"> Not to mention<br>
Intel MIC which also supports OpenMP. I guess I am cautious about<br>
getting too bogged down with very invasive optimisations until we have<br>
benchmarked the basic approach which in a wide range of use cases will<br>
achieve good thread/page locality as illustrated previously.<br></div></blockquote></div><br><div>I guess I'm just interested in exposing enough semantic information to be able to schedule a few different ways using run-time (or, if absolutely necessary, configure-time) options. I don't want to have to revisit individual loops.</div>
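<div>To make that concrete, here is a rough sketch of a partition chosen at run time, so that loops are written once against an "owner" query and never revisited. All names (PartKind, Partition, part_from_option, part_owner, scale_owned) and the option strings are hypothetical, and this is not an existing PETSc interface.</div>

```c
#include <string.h>

/* Hypothetical partition strategies selectable by a run-time option. */
typedef enum { PART_CONTIGUOUS, PART_CYCLIC_BLOCKS } PartKind;

typedef struct {
  PartKind kind;
  long     n;        /* number of loop iterations */
  int      nthreads; /* number of threads sharing them */
  long     block;    /* block size, used only by PART_CYCLIC_BLOCKS */
} Partition;

/* Map an option string to a strategy; anything unrecognized falls back
   to the contiguous partition. */
static PartKind part_from_option(const char *opt)
{
  if (opt && strcmp(opt, "cyclic") == 0) return PART_CYCLIC_BLOCKS;
  return PART_CONTIGUOUS;
}

/* Thread that owns iteration i, for 0 <= i < p->n. */
static int part_owner(const Partition *p, long i)
{
  if (p->kind == PART_CYCLIC_BLOCKS)
    return (int)((i / p->block) % p->nthreads);
  long base = p->n / p->nthreads, rem = p->n % p->nthreads;
  long cut = (base + 1) * rem; /* the first `rem` threads own base+1 items */
  if (i < cut) return (int)(i / (base + 1));
  return (int)(rem + (i - cut) / base);
}

/* Loop body written once: thread `tid` touches only what it owns.
   Scanning all i keeps the sketch short; a real version would iterate
   over the owned ranges directly. */
static void scale_owned(const Partition *p, int tid, double a, double *x)
{
  for (long i = 0; i < p->n; i++)
    if (part_owner(p, i) == tid) x[i] *= a;
}
```

<div>Switching strategy then only means building the Partition from a different option string; the loop in scale_owned does not change.</div>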