[petsc-dev] Performance of Petsc Cholesky vs LU

Barry Smith bsmith at mcs.anl.gov
Fri Dec 2 22:59:10 CST 2011


On Dec 2, 2011, at 5:20 PM, Dave Nystrom wrote:

> Hi Jed,
> 
> I tried your suggestion below and it made a huge improvement.  Instead of
> taking about 6x as long per linear solve as using LU, it now takes between
> 35% to 80% longer than LU depending on which of the different linear solves
> I'm measuring.  So that is a big improvement.  Is there a reason why Cholesky
> is not roughly the same as LU, or perhaps a bit faster?

   You can look at the code for the two cases: MatLUFactorNumeric_SeqAIJ and MatCholeskyFactorNumeric_SeqAIJ.  The issue is that because of the "missing" values when doing Cholesky there is a lot more "data motion" during the computation and in modern systems "data motion" is slow relative to floating point. 
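One way to see this in practice is to compare the numeric-factorization events in the profiling output for the two runs. This is a sketch: `./myapp` is a placeholder for your own executable, and the flag is `-log_summary` in PETSc versions of this era (later releases renamed it `-log_view`).

```shell
# LU with nested-dissection ordering
./myapp -ksp_type preonly -pc_type lu \
        -pc_factor_mat_ordering_type nd -log_summary

# Cholesky with the same ordering
./myapp -ksp_type preonly -pc_type cholesky \
        -pc_factor_mat_ordering_type nd -log_summary

# In the event tables, compare the MatLUFactorNum and MatCholFctrNum
# lines: roughly half the flops for Cholesky but a noticeably lower
# flop rate is consistent with the extra data motion described above.
```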

> 
> On a different note, all of my linear systems are sparse banded systems as a
> result of the problem being discretized on a 2d structured grid.  When I
> input the matrix elements to petsc, does petsc do any sort of analysis to
> figure out that my system is a sparse banded system?  Does petsc do the LU or
> Cholesky solve using other packages such as lapack or mkl/acml to do the band
> solve?  If not, would it be more reasonable for me to also interface my code
> to lapack and/or mkl/acml for the occasions where a direct band solve makes
> sense?

   The band is too large for direct band solvers to take advantage of. 

   But for structured-grid problems an iterative solver will generally be much faster than a direct solver on larger problems, in particular multigrid.
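A first iterative attempt along these lines might look like the following. This is a sketch, assuming the operators are symmetric positive definite and that `./myapp` (a placeholder) sets its grid up so geometric multigrid can coarsen it:

```shell
# CG with an incomplete-Cholesky preconditioner as a simple baseline
./myapp -ksp_type cg -pc_type icc -ksp_monitor

# Geometric multigrid as the preconditioner (e.g. 4 levels);
# -ksp_monitor prints the residual at each iteration so the two
# approaches can be compared against the direct solves.
./myapp -ksp_type cg -pc_type mg -pc_mg_levels 4 -ksp_monitor
```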

   Barry

> 
> Thanks,
> 
> Dave Nystrom writes:
>> Jed Brown writes:
>>> On Tue, Nov 29, 2011 at 23:53, Dave Nystrom <dnystrom1 at comcast.net> wrote:
>>> 
>>>> I have a resistive mhd code that I have recently interfaced to petsc which
>>>> has 7 linear solves that are all symmetric.  I recently tried using -pc_type
>>>> cholesky -ksp_type preonly for a run and found that it was taking about 6
>>>> times as long per linear solve as when I was using -pc_type lu -ksp_type
>>>> preonly.
>>> 
>>> Try -pc_factor_mat_ordering_type nd
>>> 
>>>> I was wondering if that was reasonable behavior.  I would not have
>>>> thought that using a cholesky direct solve would take longer than an LU
>>>> direct solve in petsc for the serial case and was hoping it would be
>>>> faster.  Does this behavior seem reasonable?
>>> 
>>> Try this:
>>> 
>>> -pc_type cholesky -pc_factor_mat_ordering_type nd
>> 
>> Thanks.  I'll give this a try and report back on the results.
>> 
>>> Barry, why is natural ordering still the default for Cholesky? It is so
>>> slow that it is worthless.



