<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi,<br>
<br>
I just had a look at the threaded version of MatMult_SeqAIJ, and I
think the flop logging might be incorrect because nonzerorows is
not counted in MatMult_SeqAIJ_Kernel. Fixing this in the thread
kernel would require a reduction, though, which could impact
performance. Is this a known problem, or is there a better way to
compute the flop count that doesn't require nonzerorows?<br>
<br>
Alternatively, would it make sense to pre-compute the nonzerorows
and store them in the threadcomm? This might require more of the
AIJ data structure to be exposed to PetscLayoutSetUp /
PetscThreadCommGetOwnershipRanges though. <br>
<br>
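For illustration, a minimal sketch of the pre-computation idea in plain
C (not PETSc code). It assumes CSR storage with a row-pointer array, and
it assumes the flop count for y = A*x is taken as 2*nnz minus the number
of rows that hold at least one nonzero. The names count_nonzero_rows,
spmv_flops, rowptr and ranges are hypothetical:<br>

```c
/* Count the rows in [start, end) that store at least one entry.
   rowptr is a CSR row-pointer array: row i holds
   rowptr[i + 1] - rowptr[i] entries. */
int count_nonzero_rows(const int *rowptr, int start, int end)
{
  int i, count = 0;
  for (i = start; i < end; i++) {
    if (rowptr[i + 1] > rowptr[i]) count++;
  }
  return count;
}

/* Assumed flop model for y = A*x: one multiply and one add per stored
   entry, minus one add per nonzero row.
   ranges has nthreads + 1 entries; thread t owns rows
   [ranges[t], ranges[t + 1]). The per-thread counts could be computed
   once, when the ownership ranges are set up, so the kernel itself
   needs no reduction. */
double spmv_flops(const int *rowptr, const int *ranges, int nthreads)
{
  int t, nonzerorows = 0;
  int nrows = ranges[nthreads];
  for (t = 0; t < nthreads; t++) {
    nonzerorows += count_nonzero_rows(rowptr, ranges[t], ranges[t + 1]);
  }
  return 2.0 * (rowptr[nrows] - rowptr[0]) - nonzerorows;
}
```

With rowptr = {0, 2, 2, 5, 6} and two threads owning two rows each,
three of the four rows are nonzero, so this sketch gives 2*6 - 3 = 9
flops.<br>
<br>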
Regards,<br>
Michael <br>
<br>
On 08/08/13 12:08, Matthew Knepley wrote:<br>
</div>
<blockquote
cite="mid:CAMYG4GmpxmiByzS5xnEc4rFZ7s3nSOra+RCQ8iT6A5QZCP2uxw@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<div dir="ltr">On Thu, Aug 8, 2013 at 5:37 AM, Michael Lange <span
dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:michael.lange@imperial.ac.uk" target="_blank">michael.lange@imperial.ac.uk</a>&gt;</span>
wrote:<br>
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
We have recently been trying to re-align our OpenMP fork (<a
moz-do-not-send="true"
href="https://bitbucket.org/ggorman/petsc-3.3-omp"
target="_blank">https://bitbucket.org/ggorman/petsc-3.3-omp</a>)
with petsc/master. Much of our early work has now been
superseded by the threadcomm implementations.
Nevertheless, there are still a few algorithmic
differences between the two branches:<br>
<br>
1) Enforcing MPI latency hiding by using task-based spMV:<br>
If the MPI implementation used does not actually provide
truly asynchronous communication in hardware, performance
can be increased by dedicating a single thread to
overlapping MPI communication in PETSc. However, this is
arguably a vendor-specific fix which requires significant
code changes (i.e., the parallel section needs to be raised
up by one level). So perhaps the strategy should be to
give guilty vendors a hard time rather than messing up the
current abstraction.<br>
<br>
2) Nonzero-based thread partitioning:<br>
Rather than evenly dividing the number of rows among
threads, we can partition the thread ownership ranges
according to the number of non-zeros in each row. This
balances the work load between threads and thus increases
strong scalability due to optimised bandwidth utilisation.
In general, this optimisation should integrate well with
threadcomms, since it only changes the thread ownership
ranges, but it does require some structural changes since
nnz is currently not passed to PetscLayoutSetUp. Any
thoughts on whether people regard such a scheme as useful
would be greatly appreciated.<br>
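<br>
A minimal sketch of such a partitioning in plain C, assuming CSR storage
with a row-pointer array. The function partition_rows_by_nnz and its
parameters are hypothetical and are not part of PetscLayoutSetUp or the
threadcomm API:<br>

```c
/* Split nrows rows among nthreads threads so that each thread owns
   roughly the same number of stored nonzeros.
   rowptr is a CSR row-pointer array with nrows + 1 entries.
   ranges must have room for nthreads + 1 entries; on return, thread t
   owns rows [ranges[t], ranges[t + 1]). */
void partition_rows_by_nnz(const int *rowptr, int nrows, int nthreads,
                           int *ranges)
{
  int t, row = 0;
  long long base = rowptr[0];
  long long total = rowptr[nrows] - base;

  ranges[0] = 0;
  for (t = 1; t < nthreads; t++) {
    /* Ideal number of nonzeros owned by threads 0 .. t-1. */
    long long target = (total * t) / nthreads;

    /* Advance while the whole next row still fits below the target. */
    while (row < nrows) {
      if (rowptr[row + 1] - base > target) break;
      row++;
    }
    /* Take one more row if that lands closer to the target. */
    if (row < nrows) {
      long long over = rowptr[row + 1] - base - target;
      long long under = target - (rowptr[row] - base);
      if (over < under) row++;
    }
    ranges[t] = row;
  }
  ranges[nthreads] = nrows;
}
```

For rowptr = {0, 1, 2, 3, 9} and two threads, an even row split would
give the threads 2 and 7 nonzeros, while this sketch yields the ranges
{0, 3, 4}, i.e. 3 and 6 nonzeros.<br>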
</blockquote>
<div><br>
</div>
<div>I think this should be handled by changing the AIJ data
structure. Going all the way to "2D" partitions would also
allow</div>
<div>us to handle power-law matrix graphs. This would keep
the thread implementation simple, and at the same time be
more</div>
<div>flexible.</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
3) MatMult_SeqBAIJ not threaded:<br>
Is there a reason why MatMult has not been threaded for
BAIJ matrices, or is somebody already working on this? If
not, I would like to prepare a pull request for this using
the same approach as MatMult_SeqAIJ.<br>
<br>
We would welcome any suggestions/feedback on this, in
particular the second point. Up-to-date benchmarking
results for the first two methods, including BlueGene/Q,
can be found in:<br>
<a moz-do-not-send="true"
href="http://arxiv.org/abs/1307.4567" target="_blank">http://arxiv.org/abs/1307.4567</a><br>
<br>
Kind regards,<br>
<br>
Michael Lange<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
What most experimenters take for granted before they begin
their experiments is infinitely more interesting than any
results to which their experiments lead.<br>
-- Norbert Wiener
</div>
</div>
</blockquote>
<br>
</body>
</html>