<div class="gmail_quote">On Tue, Nov 23, 2010 at 10:32, Clemens Domanig <span dir="ltr">&lt;<a href="mailto:clemens.domanig@uibk.ac.at">clemens.domanig@uibk.ac.at</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":9b">I&#39;m writing a FEM-program and at the moment I try to parallize the assembly of the stiffness-matrix. The idea would be if I for example use 4 threads I create 4 matrices and 4 rhs-vectors. Each thread fills its matrix and rhs-vector. At the end I add all matrices and vectors.<br>

</div></blockquote><div><br></div><div>Adding sparse matrices together is a relatively expensive operation, usually more expensive than assembling the matrix you wanted in the first place.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div id=":9b">

But the PETSc manual says that PETSc is not thread-safe.<br></div></blockquote><div><br></div><div>One minor issue is that some logging functions are not thread-safe.  Wrapping locks around those operations would not be a huge deal, but it hasn&#39;t been done.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div id=":9b">

Is that the reason why it doesn&#39;t work?<br></div></blockquote><div><br></div><div>The matrix data structures cannot cheaply be mutated in a thread-safe way.  We would need either fine-grained locks, which have more overhead and are very tricky when the user does not preallocate correctly (requiring &quot;rollback&quot; logic), or a coarse-grained lock around the whole of MatSetValues.  If you want to do multi-threaded assembly, you should just put your own lock around your call to MatSetValues.  As long as your physics does some work per element (e.g. is a bit more than a linear constant-coefficient problem on affine elements), this can work fine for a few threads, usually up to 8-16 or so, before serialization of MatSetValues becomes a bottleneck.</div>
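<div><br></div><div>A minimal sketch of the &quot;put your own lock around MatSetValues&quot; pattern, assuming POSIX threads.  To keep it compilable without PETSc, ToyMatrix and toy_add_values are hypothetical stand-ins for Mat and MatSetValues with ADD_VALUES, and the 1D two-node element matrix is made up for illustration; only the locking structure is the point.</div>

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define NDOF 64            /* hypothetical number of unknowns */
#define NELEM (NDOF - 1)   /* 1D mesh of two-node elements */
#define MAX_THREADS 16

/* Hypothetical stand-in for a PETSc Mat: dense storage plus one
 * coarse-grained lock that the caller, not the matrix code, uses. */
typedef struct {
    double a[NDOF][NDOF];
    pthread_mutex_t lock;
} ToyMatrix;

/* Hypothetical analogue of MatSetValues(..., ADD_VALUES).
 * Like the real call, it is NOT thread-safe on its own. */
static void toy_add_values(ToyMatrix *A, int m, const int rows[],
                           int n, const int cols[], const double v[])
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            assert(rows[i] >= 0 && rows[i] < NDOF);
            assert(cols[j] >= 0 && cols[j] < NDOF);
            A->a[rows[i]][cols[j]] += v[i * n + j];
        }
    }
}

typedef struct { ToyMatrix *A; int first, last; } Task;

/* Each thread assembles the elements in [first, last). */
static void *assemble_range(void *arg)
{
    Task *t = (Task *)arg;
    for (int e = t->first; e < t->last; e++) {
        /* Element work runs unlocked, so threads overlap here.
         * A real code would do its quadrature at this point. */
        const int idx[2] = {e, e + 1};
        const double ke[4] = {1.0, -1.0, -1.0, 1.0};

        /* Only the insertion into the shared matrix is serialized. */
        pthread_mutex_lock(&t->A->lock);
        toy_add_values(t->A, 2, idx, 2, idx, ke);
        pthread_mutex_unlock(&t->A->lock);
    }
    return NULL;
}

/* Zero the matrix and assemble it with nthreads threads.
 * Returns 0 on success, -1 on bad input or thread creation failure. */
int assemble_threaded(ToyMatrix *A, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    Task task[MAX_THREADS];
    int started = 0, status = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS) return -1;
    memset(A->a, 0, sizeof A->a);
    if (pthread_mutex_init(&A->lock, NULL) != 0) return -1;

    for (int t = 0; t < nthreads; t++) {
        task[t].A = A;
        task[t].first = t * NELEM / nthreads;
        task[t].last = (t + 1) * NELEM / nthreads;
        if (pthread_create(&tid[t], NULL, assemble_range, &task[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++) pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&A->lock);
    return status;
}
```

<div>In this toy the element work is trivial, so the lock dominates and the threads buy nothing.  That is exactly the constant-coefficient case mentioned above; the pattern only pays off when the unlocked part of the loop is expensive compared with the insertion.</div>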

<div><br></div><div>Note that you can also use multiple MPI processes to keep your cores busy.  There have been a few discussions on this list, and on petsc-dev, about the relative merits of threads versus processes.  If you tell us a bit about your problem, we may be able to predict how each will perform.  It also helps us assess the importance of making PETSc thread-safe and making certain kernels perform well with threads, relative to other features.</div>

<div><br></div><div>Jed</div></div>