[petsc-dev] Fwd: Heaps
Barry Smith
bsmith at mcs.anl.gov
Sat Jun 9 17:19:49 CDT 2012
Yup, there are truckloads of cool possibilities of alternatives here.
Barry
On Jun 9, 2012, at 4:35 PM, Jed Brown wrote:
> Forwarding here in case anyone has comments on this approach to MatMatMultSymbolic. The idea is to use a heap as a priority queue for the next nontrivial entry in the rows of B that need to be merged.
>
> The asymptotics are basically the same as if you have a list with O(log n) insertion. There are two versions, the _Heap uses the heap to make sure that all duplicate entries come to the top at the same time. The _BTHeap version uses a PetscBT to avoid putting duplicates into the heap (so each nonzero in the merged version involves one heap operation).
>
> It's possible that merging lists (accelerated by BT) is still the best option in the long run. It seems to be somewhat more memory-friendly. Alternatively, making the heap fatter might give enough of a boost to pay off.
>
> The attached callgrind/kcachegrind profiles were moved to
>
> http://i.imgur.com/j7yOh.png (btheap)
> http://i.imgur.com/1ORcH.png (standard)
>
>
> ---------- Forwarded message ----------
> From: Jed Brown <jedbrown at mcs.anl.gov>
> Date: Sat, Jun 9, 2012 at 4:18 PM
> Subject: Re: Heaps
> To: Hong Zhang <hzhang at mcs.anl.gov>
>
>
> On Sat, Jun 9, 2012 at 3:51 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
> Jed :
>
> Wow, you are so fast!
> I implemented a heap-based MatMatMultSymbolic_SeqAIJ_SeqAIJ_Heap like we discussed earlier, but it's slower than the PetscLLCondensed stuff.
>
> I left the code in case you want to look at it or in case the heap can be used elsewhere.
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/d2e6d8577ed8
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/437bcb349f01
>
> I'll look at this while supervising the high school kid :-)
> The algorithm gives optimal complexity than current linear search.
> We need investigate and optimize it though. This is a good project for
> the kid.
>
> I just pushed a better one which has competitive performance (now there is one heap operation per nonzero in C rather than one per entry in the rows of B that need to be merged).
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/27307154f71e
> http://petsc.cs.iit.edu/petsc/petsc-dev/rev/4ab803940a7f
>
> This could perhaps be optimized more.
>
> $ valgrind --tool=callgrind ./ex94 -f0 /home/jed/petsc/datafiles/matrices/arco3 -f1 /home/jed/petsc/datafiles/matrices/arco3 -viewer_binary_skip_info -matmatmult_btheap
>
> See attached profiles.
>
>
More information about the petsc-dev
mailing list