[petsc-users] Optimizing MatMatSolve

Mon Aug 1 17:24:47 CDT 2011

Thanks for all the info. The impression I'm getting is I'm probably best off
splitting up the problem in such a way that each node does it's own
sequential solves. I believe this is feasible and relatively easy with my
problem. I'm doing tens of thousands of solves that should be able to be
done independently of one another.

Thanks for the tip about the configuration options, I'm planning on using
them once the code is more polished.

On Mon, Aug 1, 2011 at 5:57 PM, Jack Poulson <jack.poulson at gmail.com> wrote:

> Agreed. According to your timings, it took ~130 seconds to invert a 3000 x
> 3000 matrix. With dense algorithms, the cost of LU with partial pivoting is
> 2/3 n^3, and performing triangle solves against n right hand sides requires
> n^3 flops. Thus the total work is 5/3 n^3. With n=3000, this is roughly 45
> billion flops. Nearly any machine can easily hit at least 50% of theoretical
> peak on both of these operations, and thus, if your laptop has a theoretical
> peak of 5 GFlops, it should take 18 seconds with a dense algorithm. This is
> a pretty substantial improvement over what you currently are seeing, and
> dense algorithms parallelize nicely since there are O(n^3) operations to
> perform and only O(n^2) data.
>
> On the other hand, when you increase your problem size to n ~= 30,000, the
> asymptotic difference between sparse and dense algorithms will start to
> become obvious. In 2d, sparse factorization is O(n^3/2) and the cost of
> doing n triangle solves is O(n^2 log(n)), leading to a total complexity of
> O(n^2 log(n)) versus the dense O(n^3) approach. For 3d problems, the cost of
> the triangle solves again dominates at O(n^7/3) versus the direct O(n^3)
> cost.
>
> As a practical matter, according to the information you attached, you are
> both running in debug mode and not using optimized computational kernels. If
> you fix both of these issues, your performance should increase
> substantially.
>
> Jack
>
>
> On Mon, Aug 1, 2011 at 4:43 PM, Aron Ahmadia <aron.ahmadia at kaust.edu.sa>wrote:
>
>> What Matt is getting at is that typically we measure the computational
>> difficulty of a problem as a function of the 'unknowns'.  If you are looking
>> at turning a sparse matrix O(n) bytes into a dense inverse O(n^2) bytes,
>> you've taken what was originally a potentially optimal problem and turned it
>> into one of quadratic difficulty in terms of memory, and even more in terms
>> of flops.  You may have better luck using an explicitly dense
>> method/algorithm for computing the inverse: it will scale well in the sense
>> that adding more processors will work faster (this is the heart of the
>> LINPACK computation after all), but you will need a lot of core-hours and
>> memory to get there...
>>
>> A
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20110801/92671b4a/attachment.htm>