since developing object-oriented software is so cumbersome in C and we are all resistant to doing it in C++
jed at 59A2.org
Sat Dec 5 18:01:29 CST 2009
On Sat, 5 Dec 2009 16:50:38 -0600, Matthew Knepley <knepley at gmail.com> wrote:
> You assign a few threads per element to calculate the FEM
> integral. You could maintain this unassembled if you only need
You can also store it with much less memory as just values at quadrature
> However, if you want an actual sparse matrix, there are a couple of
> 1) Store the unassembled matrix, and run assembly after integration
> is complete. This needs more memory, but should perform well.
Fine, but how is this assembly done? If it's serial then it would be a
bottleneck, so you still need the concurrent thing below.
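
To make the question concrete, here is a minimal sketch of what a serial
assembly pass over stored element matrices might look like. The Csr struct,
csr_find and assemble_serial are hypothetical names for illustration, not
PETSc API, and the dense nen x nen element layout is an assumption. The loop
over elements is the part that becomes a bottleneck when run on one thread,
and it cannot simply be split across threads because neighbouring elements
add into the same rows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR matrix with a preallocated sparsity pattern. */
typedef struct {
    int n;              /* number of rows */
    const int *rowptr;  /* size n+1: start of each row in colidx/val */
    const int *colidx;  /* column index of each stored entry */
    double *val;        /* value of each stored entry */
} Csr;

/* Index into val[] of entry (i,j), or -1 if (i,j) is not in the pattern. */
int csr_find(const Csr *A, int i, int j)
{
    for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
        if (A->colidx[k] == j) return k;
    return -1;
}

/* Serial scatter-add of nelem dense element matrices (nen x nen, row-major,
 * stored back to back in Ke). elem_dofs[e*nen + a] is the global index of
 * local index a in element e. Returns the number of entries that fell
 * outside the sparsity pattern. */
int assemble_serial(Csr *A, int nelem, int nen,
                    const int *elem_dofs, const double *Ke)
{
    assert(nen > 0);
    int missed = 0;
    for (int e = 0; e < nelem; e++) {
        const int *dofs = elem_dofs + (size_t)e * nen;
        const double *K = Ke + (size_t)e * nen * nen;
        for (int a = 0; a < nen; a++) {
            for (int b = 0; b < nen; b++) {
                int k = csr_find(A, dofs[a], dofs[b]);
                if (k < 0) { missed++; continue; }
                A->val[k] += K[a * nen + b]; /* unsynchronized: serial only */
            }
        }
    }
    return missed;
}
```

For two 1D linear elements sharing node 1, both elements add into entry
(1,1), which is exactly the kind of update that needs protection once the
element loop is run concurrently.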
> 2) Use atomic operations to update. I have not seen this yet, so I am
> unsure how it will perform.
Atomic operations could be used per-entry but this costs on the order of
100 cycles on CPUs. I think newer GPUs have atomics, but I don't know
the cost. Presumably it's at least as much as the latency of a read
When inserting in decent sized chunks, it would probably be worth taking
per-row or larger locks to amortize the cost of the atomics.
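
A rough sketch of the per-row locking idea, assuming C11 atomics on a CPU and
a CSR matrix with one spinlock per row. LockedCsr, row_lock, row_unlock and
add_row_chunk are hypothetical names, not PETSc API. One atomic
test-and-set acquires the row, then the whole chunk is added with ordinary
floating-point adds, so the atomic cost is paid once per chunk rather than
once per entry.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical CSR matrix carrying one spinlock per row. */
typedef struct {
    int n;                 /* number of rows */
    const int *rowptr;     /* size n+1 */
    const int *colidx;     /* column index of each stored entry */
    double *val;           /* value of each stored entry */
    atomic_flag *rowlock;  /* size n, each initialized with ATOMIC_FLAG_INIT */
} LockedCsr;

void row_lock(atomic_flag *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) {
    }
}

void row_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Add ncols values into row i while holding that row's lock.
 * Returns the number of columns not present in the sparsity pattern. */
int add_row_chunk(LockedCsr *A, int i, int ncols,
                  const int *cols, const double *vals)
{
    assert(i >= 0 && i < A->n);
    int missed = 0;
    row_lock(&A->rowlock[i]);
    for (int c = 0; c < ncols; c++) {
        int k;
        for (k = A->rowptr[i]; k < A->rowptr[i + 1]; k++)
            if (A->colidx[k] == cols[c]) break;
        if (k == A->rowptr[i + 1]) { missed++; continue; }
        A->val[k] += vals[c]; /* plain add, protected by the row lock */
    }
    row_unlock(&A->rowlock[i]);
    return missed;
}
```

A coarser variant would keep one lock per block of rows; whether that wins
depends on contention, which this sketch does not measure.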
Additionally, you could statically partition the workload and only use
atomics for rows/entries that were shared.
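
As an illustration of that last point, here is a minimal sketch assuming C11
atomics and a precomputed flag per row saying whether more than one
partition contributes to it. The names atomic_add_double, owned_add_double,
add_entry and row_shared are hypothetical. C11 has no fetch-add for double,
so the shared path uses a compare-and-swap loop; the owned path is a plain
load and store that is only safe when a single thread touches the entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomic add for double via a compare-and-swap loop. */
void atomic_add_double(_Atomic double *p, double x)
{
    double old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, old + x,
               memory_order_relaxed, memory_order_relaxed)) {
        /* On failure, old now holds the current value; retry with it. */
    }
}

/* Non-atomic read-modify-write: valid only if one thread owns *p. */
void owned_add_double(_Atomic double *p, double x)
{
    double old = atomic_load_explicit(p, memory_order_relaxed);
    atomic_store_explicit(p, old + x, memory_order_relaxed);
}

/* Add x to stored entry k of the given row. row_shared[row] is true when
 * the row receives contributions from more than one partition. */
void add_entry(_Atomic double *val, const bool *row_shared,
               int row, int k, double x)
{
    assert(row >= 0 && k >= 0);
    if (row_shared[row])
        atomic_add_double(&val[k], x); /* contended: pay for the atomic */
    else
        owned_add_double(&val[k], x);  /* interior row: cheap path */
}
```

With a good partition most rows are interior, so only the rows along
partition boundaries take the expensive path.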