On Thu, Nov 24, 2011 at 4:10 PM, Jed Brown <span dir="ltr"><<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="im"><div class="gmail_quote">On Thu, Nov 24, 2011 at 16:01, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>  Yes, but they can only get access to that shared variable one at a time: the first gets it, then the second gets it, then the third, and so on. That is OK for a couple of cores but not for dozens.<br>

<br>

   Take a look at src/sys/objects/pthread.c for the various ways we have coded for "waking" the threads. Maybe I am missing something but this is the best Kerry and I could figure out.</div></blockquote></div><br>


</div><div>What exactly should I be looking at? Can't you have all the threads spin on a normal shared variable (not a mutex) that is only written by the thread that needs to spark them? Or use a fetch-and-add atomic if you want to keep track of how many are running or limit the number? The latter could use a tree to get logarithmic cost, but if they are stored next to each other, you would still have O(P) cache invalidations.</div>
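<div>A minimal sketch of the two ideas above, assuming C11 atomics are available alongside pthreads. Every name here (go, running, wake_all_demo, NTHREADS) is hypothetical and none of it is taken from src/sys/objects/pthread.c. The workers spin on a flag that only the main thread writes, and each one bumps a fetch-and-add counter once it is awake.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4               /* hypothetical thread count for the sketch */

static atomic_int go = 0;        /* written only by the waking thread */
static atomic_int running = 0;   /* counts workers that have woken up */

static void *worker(void *arg)
{
    (void)arg;
    /* Spin on the shared flag; no mutex is taken, so the readers are
       not serialized the way they would be on a lock. */
    while (atomic_load_explicit(&go, memory_order_acquire) == 0)
        ;
    /* Fetch-and-add records that this worker is now running. */
    atomic_fetch_add_explicit(&running, 1, memory_order_relaxed);
    return NULL;
}

/* Starts NTHREADS spinning workers, wakes them with a single store,
   and returns how many of them reported in. */
int wake_all_demo(void)
{
    pthread_t t[NTHREADS];
    int rc;

    atomic_store(&go, 0);
    atomic_store(&running, 0);

    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }

    /* One write by the waking thread releases every spinner. */
    atomic_store_explicit(&go, 1, memory_order_release);

    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_join(t[i], NULL);
        assert(rc == 0);
    }
    (void)rc;
    return atomic_load(&running);
}
```

<div>Calling wake_all_demo() from main (built with -pthread) should return NTHREADS. The single counter is still one contended cache line, which is the O(P) invalidation cost mentioned above; a tree of counters would reduce the depth but is not shown here.</div>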


</blockquote></div><br>This is one great reason that vectorization works and pthreads is crap. I am not totally sold on the thread block system, but<div>it looks like genius compared to pthreads. I would start there.</div>

<div><br></div><div>  Matt<br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

</div>