[petsc-users] Optimizing MatMatSolve

Mon Aug 1 16:57:08 CDT 2011

Agreed. According to your timings, it took ~130 seconds to invert a 3000 x
3000 matrix. With dense algorithms, the cost of LU with partial pivoting is
2/3 n^3, and performing triangle solves against n right hand sides requires
n^3 flops. Thus the total work is 5/3 n^3. With n=3000, this is roughly 45
billion flops. Nearly any machine can easily hit at least 50% of theoretical
peak on both of these operations, and thus, if your laptop has a theoretical
peak of 5 GFlops, it should take 18 seconds with a dense algorithm. This is
a pretty substantial improvement over what you currently are seeing, and
dense algorithms parallelize nicely since there are O(n^3) operations to
perform and only O(n^2) data.

On the other hand, when you increase your problem size to n ~= 30,000, the
asymptotic difference between sparse and dense algorithms will start to
become obvious. In 2d, sparse factorization is O(n^3/2) and the cost of
doing n triangle solves is O(n^2 log(n)), leading to a total complexity of
O(n^2 log(n)) versus the dense O(n^3) approach. For 3d problems, the cost of
the triangle solves again dominates at O(n^7/3) versus the direct O(n^3)
cost.

As a practical matter, according to the information you attached, you are
both running in debug mode and not using optimized computational kernels. If
you fix both of these issues, your performance should increase
substantially.

Jack

On Mon, Aug 1, 2011 at 4:43 PM, Aron Ahmadia <aron.ahmadia at kaust.edu.sa>wrote:

> What Matt is getting at is that typically we measure the computational
> difficulty of a problem as a function of the 'unknowns'.  If you are looking
> at turning a sparse matrix O(n) bytes into a dense inverse O(n^2) bytes,
> you've taken what was originally a potentially optimal problem and turned it
> into one of quadratic difficulty in terms of memory, and even more in terms
> of flops.  You may have better luck using an explicitly dense
> method/algorithm for computing the inverse: it will scale well in the sense
> that adding more processors will work faster (this is the heart of the
> LINPACK computation after all), but you will need a lot of core-hours and
> memory to get there...
>
> A
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110801/f03c9876/attachment.htm>