<div class="gmail_quote">On Tue, Nov 29, 2011 at 08:52, Dmitry Karpeev <span dir="ltr">&lt;<a href="mailto:karpeev@mcs.anl.gov">karpeev@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":1au">From what I understand, Barry doesn't want the threads to spin.</div></blockquote><div><br></div><div>Lots of MPI calls spin because spinning has MUCH lower latency than sleeping and being woken up. Unless we have more threads than cores, what is the problem?</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":1au"><div>Also, synchronizing through an unguarded memory location seems to create a race condition.</div>

</div></blockquote><div><br></div><div>Not if writes are atomic. There is always a way to do atomic writes (usually for aligned, machine-word-sized values) because otherwise the operating system could not implement synchronization primitives.</div>
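<div><br></div><div>A minimal sketch of synchronizing through a single word-sized flag, assuming GCC or Clang (their <code>__atomic</code> builtins need no header). The type and function names are hypothetical and not from PETSc. Note that the sketch also uses release/acquire ordering, which is an addition beyond plain atomicity: the atomic flag write alone does not guarantee that the payload written before it is visible to the reader.</div>

```c
/* Sketch only: assumes GCC or Clang __atomic builtins; all names are hypothetical. */

typedef struct {
    int payload;   /* ordinary data written by the producer */
    int ready;     /* machine-word flag used for synchronization */
} handoff_t;

/* Producer: write the data, then publish it with one atomic release store. */
void handoff_publish(handoff_t *h, int value)
{
    h->payload = value;
    __atomic_store_n(&h->ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: non-blocking check. Returns 1 and fills *out once data is published. */
int handoff_try_take(handoff_t *h, int *out)
{
    if (!__atomic_load_n(&h->ready, __ATOMIC_ACQUIRE))
        return 0;
    *out = h->payload;
    return 1;
}

/* Consumer: spin until the flag flips; no lock or system call is involved. */
int handoff_wait(handoff_t *h)
{
    int value;
    while (!handoff_try_take(h, &value))
        ;  /* busy-wait */
    return value;
}
```

<div>One thread would call <code>handoff_publish</code> while another sits in <code>handoff_wait</code>; the waiter burns a core while it spins, which is only reasonable when there are no more threads than cores.</div>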

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":1au"><div>Isn't cmpxchg instruction-set specific?</div></div></blockquote></div><br><div>

All instruction sets have some analogue of cmpxchg (either a compare-and-swap instruction or a load-linked/store-conditional pair) because it's the building block for all other primitives.</div>
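<div><br></div><div>To illustrate how a higher-level primitive can be built from compare-and-swap without writing instruction-set-specific code, here is a rough sketch of a spinlock. It assumes GCC or Clang, whose <code>__atomic_compare_exchange_n</code> builtin is typically lowered to <code>lock cmpxchg</code> on x86 and to a load-linked/store-conditional loop on architectures such as ARM or POWER. The names <code>demo_spinlock</code>, <code>spin_trylock</code>, <code>spin_lock</code>, and <code>spin_unlock</code> are hypothetical.</div>

```c
/* Sketch only: assumes GCC or Clang __atomic builtins; all names are hypothetical. */

typedef struct {
    int locked;   /* 0 = free, 1 = held */
} demo_spinlock;

/* One compare-and-swap attempt: succeeds only if the lock was free. */
int spin_trylock(demo_spinlock *l)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&l->locked, &expected, 1,
                                       0 /* strong variant */,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Acquire: retry the compare-and-swap until it succeeds. */
void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l))
        ;  /* spin */
}

/* Release: a single atomic store hands the lock back. */
void spin_unlock(demo_spinlock *l)
{
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```

<div>A production lock would add back-off or a pause hint inside the loop; this version only shows that the lock itself needs nothing beyond compare-and-swap and an atomic store.</div>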