[petsc-dev] number of mallocs inside KSP during factorization

Barry Smith bsmith at mcs.anl.gov
Mon Dec 12 11:07:52 CST 2011


>> Matrix Object:   64 MPI processes
>>     type: mpiaij
>>     rows=1048944, cols=1048944
>>     total: nonzeros=7251312, allocated nonzeros=11554449
>>     total number of mallocs used during MatSetValues calls =1071
>>       not using I-node (on process 0) routines

    This indicates that preallocation was NOT done properly for the matrix: it needed to do many mallocs during the MatSetValues() calls. See the FAQ and its links for how to get that number down to zero and have faster code.
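
    For example (a minimal sketch, not taken from the run above; the counts nlocal, d_nnz[] and o_nnz[] are illustrative names and are assumed to be computed by the application before any values are inserted):

       #include <petscmat.h>

       Mat      A;
       PetscInt nlocal;           /* number of locally owned rows                  */
       PetscInt *d_nnz, *o_nnz;   /* per-row nonzero counts for the diagonal and
                                     off-diagonal blocks of this process           */
       /* ... fill nlocal, d_nnz[], o_nnz[] from the mesh/stencil ... */
       MatCreate(PETSC_COMM_WORLD,&A);
       MatSetSizes(A,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE);
       MatSetType(A,MATMPIAIJ);
       MatMPIAIJSetPreallocation(A,0,d_nnz,0,o_nnz);   /* exact preallocation */
       /* subsequent MatSetValues() calls then require no mallocs */
       MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
       MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);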

   Barry

On Dec 12, 2011, at 9:40 AM, Alexander Grayver wrote:

> Thanks,
> 
> I didn't realize this number concerns the system matrix, which was assembled outside.
> 
> Regards,
> Alexander
> 
> On 12.12.2011 15:09, Barry Smith wrote:
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#efficient-assembly
>> 
>> 
>> On Dec 12, 2011, at 5:35 AM, Alexander Grayver wrote:
>> 
>>> Hello,
>>> 
>>> I use PETSc with MUMPS, and looking carefully at the -ksp_view -ksp_monitor output I see:
>>> 
>>> KSP Object:(fwd_) 64 MPI processes
>>>   type: preonly
>>>   maximum iterations=10000, initial guess is zero
>>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>>>   left preconditioning
>>>   using NONE norm type for convergence test
>>> PC Object:(fwd_) 64 MPI processes
>>>   type: cholesky
>>>     Cholesky: out-of-place factorization
>>>     tolerance for zero pivot 2.22045e-14
>>>     matrix ordering: natural
>>>     factor fill ratio given 0, needed 0
>>>       Factored matrix follows:
>>>         Matrix Object:         64 MPI processes
>>>           type: mpiaij
>>>           rows=1048944, cols=1048944
>>>           package used to perform factorization: mumps
>>>           total: nonzeros=1266866685, allocated nonzeros=1266866685
>>>           total number of mallocs used during MatSetValues calls =0
>>>             MUMPS run parameters:
>>>               SYM (matrix type):                   1
>>>               PAR (host participation):            1
>>>               ICNTL(1) (output for error):         6
>>>               ICNTL(2) (output of diagnostic msg): 0
>>>               ICNTL(3) (output for global info):   0
>>>               ICNTL(4) (level of printing):        0
>>>               ICNTL(5) (input mat struct):         0
>>>               ICNTL(6) (matrix prescaling):        0
>>>               ICNTL(7) (sequentia matrix ordering):5
>>>               ICNTL(8) (scalling strategy):        77
>>>               ICNTL(10) (max num of refinements):  0
>>>               ICNTL(11) (error analysis):          0
>>>               ICNTL(12) (efficiency control):                         1
>>>               ICNTL(13) (efficiency control):                         0
>>>               ICNTL(14) (percentage of estimated workspace increase): 30
>>>               ICNTL(18) (input mat struct):                           3
>>>               ICNTL(19) (Shur complement info):                       0
>>>               ICNTL(20) (rhs sparse pattern):                         0
>>>               ICNTL(21) (solution struct):                            1
>>>               ICNTL(22) (in-core/out-of-core facility):               0
>>>               ICNTL(23) (max size of memory can be allocated locally):0
>>>               ICNTL(24) (detection of null pivot rows):               0
>>>               ICNTL(25) (computation of a null space basis):          0
>>>               ICNTL(26) (Schur options for rhs or solution):          0
>>>               ICNTL(27) (experimental parameter):                     -8
>>>               ICNTL(28) (use parallel or sequential ordering):        2
>>>               ICNTL(29) (parallel ordering):                          0
>>>               ICNTL(30) (user-specified set of entries in inv(A)):    0
>>>               ICNTL(31) (factors is discarded in the solve phase):    0
>>>               ICNTL(33) (compute determinant):                        0
>>>               ...
>>>   linear system matrix = precond matrix:
>>>   Matrix Object:   64 MPI processes
>>>     type: mpiaij
>>>     rows=1048944, cols=1048944
>>>     total: nonzeros=7251312, allocated nonzeros=11554449
>>>     total number of mallocs used during MatSetValues calls =1071
>>>       not using I-node (on process 0) routines
>>> 
>>> The particularly interesting part is the last 3 lines.
>>> Where do these mallocs come from? Is it possible to reduce this number?
>>> 
>>> Regards,
>>> Alexander
>>> 
> 



