<div class="gmail_quote">On Tue, Nov 29, 2011 at 12:53, Dmitry Karpeev <span dir="ltr">&lt;<a href="mailto:karpeev@mcs.anl.gov">karpeev@mcs.anl.gov</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":1t5">I understand that any solution requires serialization: atomic reads by the spinning loop would serialize on that read,<div>while pthread_cond_wait requires mutex serialization.  sigwait serializes since it clears the signal atomically.</div>


<div>Ultimately, all three rely on atomicity of some sort, but mutexes apparently have higher overhead for it?</div><div>At least the stackoverflow page that suggests the sigwait solution reports a 40x improvement over the pthread_cond_wait </div>


<div>solution (admittedly, I don't know the details of the sigwait stuff).</div><div></div></div></blockquote></div><br><div>I wouldn't consider that Stack Overflow post authoritative, but there is a large body of literature on lock-free synchronization. Unless there is resource contention, spinning is always the lowest-latency choice. Every hardware-multithreaded core has some sort of pause instruction, because a spinning thread would otherwise take execution resources from the other threads on that core. For this purpose, we should just spin.</div>
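<div><br></div><div>As a rough illustration of the spinning approach, here is a minimal sketch using C11 atomics and pthreads. The names <code>go_flag</code>, <code>cpu_relax</code>, <code>spin_worker</code>, and <code>spin_demo</code> are made up for this example, and the pause hint assumes a GCC- or Clang-compatible compiler; on other targets it falls back to a plain busy loop.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* All names here are hypothetical, chosen only for this sketch. */
static atomic_int go_flag = 0;  /* 0 = keep waiting, 1 = go */
static int spin_result = 0;     /* written by the worker, read after join */

/* Spin-wait hint. Assumes GCC/Clang; elsewhere this is a no-op. */
static inline void cpu_relax(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_ia32_pause();
#elif defined(__GNUC__) && defined(__aarch64__)
  __asm__ __volatile__("yield");
#endif
}

static void *spin_worker(void *unused) {
  (void)unused;
  /* Spin on an atomic read instead of blocking on a mutex/condvar. */
  while (!atomic_load_explicit(&go_flag, memory_order_acquire)) {
    cpu_relax();
  }
  spin_result = 42;
  return NULL;
}

/* Starts one worker, releases it by setting the flag, returns its result. */
int spin_demo(void) {
  pthread_t t;
  int rc = pthread_create(&t, NULL, spin_worker, NULL);
  assert(rc == 0);
  (void)rc;
  atomic_store_explicit(&go_flag, 1, memory_order_release);
  pthread_join(t, NULL);  /* join makes spin_result visible here */
  return spin_result;
}
```

<div>This only behaves well when the waiting thread has a core (or hardware thread) to itself; if threads are oversubscribed, the spinner burns cycles that the thread it is waiting for could have used.</div>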

<div><br></div><div>Note that with sequential consistency, the store that activates a thread can also carry information about the next work unit. For example, you can pass a function pointer.</div>
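<div><br></div><div>One way this could look, as a sketch only: the flag the worker spins on is itself an atomic function pointer, so a non-NULL value both wakes the worker and tells it what to run. The names <code>work_fn</code>, <code>next_work</code>, <code>work_arg</code>, <code>square</code>, and <code>dispatch_demo</code> are hypothetical, and the default (sequentially consistent) C11 atomic operations are assumed.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names here are hypothetical, chosen only for this sketch. */
typedef int (*work_fn)(int);

static _Atomic(work_fn) next_work = NULL; /* NULL means "no work yet" */
static int work_arg;    /* plain data, written before the atomic store */
static int work_result; /* written by the worker, read after join */

static int square(int x) { return x * x; }

static void *dispatch_worker(void *unused) {
  work_fn fn;
  (void)unused;
  /* Spin until a work unit is published; the load is seq_cst by default. */
  while ((fn = atomic_load(&next_work)) == NULL) {
    /* busy-wait */
  }
  /* The seq_cst load that saw fn also makes work_arg visible. */
  work_result = fn(work_arg);
  return NULL;
}

/* Publishes one work unit (square, 7) and returns what the worker computed. */
int dispatch_demo(void) {
  pthread_t t;
  int rc = pthread_create(&t, NULL, dispatch_worker, NULL);
  assert(rc == 0);
  (void)rc;
  work_arg = 7;                     /* set up the argument first */
  atomic_store(&next_work, square); /* then activate: wake-up and payload in one store */
  pthread_join(t, NULL);
  return work_result;
}
```

<div>Because the argument is written before the atomic store and read after the atomic load that observes it, the worker sees a consistent work unit without taking a mutex.</div>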