[petsc-users] reusing LU factorization?

Hong Zhang hzhang at mcs.anl.gov
Wed Jan 29 11:42:45 CST 2014


Sorry, I overlooked your attachment, which gives '-log_summary':
1.txt:
MatSolve               2 1.0 9.7397e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00  0  0 34 10 10   0  0 34 10 11     0
MatLUFactorSym         1 1.0 1.2882e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  6  0  0  0 12   6  0  0  0 12     0
MatLUFactorNum         1 1.0 1.8813e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 90  0  0  0  2  90  0  0  0  2     0

2.txt:
MatSolve               2 1.0 1.0811e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00  0  0 31 10 10   0  0 31 10 11     0
MatLUFactorSym         1 1.0 1.8920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  8  0  0  0 12   8  0  0  0 12     0
MatLUFactorNum         1 1.0 2.1836e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 89  0  0  0  2  89  0  0  0  2     0

Again, only the 1st solve calls the LU factorization, which dominates the run
time as expected. MatLUFactorSym is negligible, but the matrix ordering has a
noticeable effect. I would stay with sequential MatLUFactorSym and experiment
with different matrix orderings using '-mat_mumps_icntl_7 <>'.
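For example, a minimal C sketch (illustrative only; it assumes a KSP ksp with
the assembled operator already attached, and Vec b, x) that selects a
particular MUMPS ordering before the first factorization:

  /* Hypothetical sketch for petsc-3.4: choose the MUMPS ordering ICNTL(7)
     at runtime; 5 = METIS here, 3 = SCOTCH and 4 = PORD are other choices. */
  PetscErrorCode ierr;
  ierr = PetscOptionsSetValue("-pc_type", "lu");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-pc_factor_mat_solver_package", "mumps");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue("-mat_mumps_icntl_7", "5");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* picks up the options above */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);     /* first solve does the factorization */

Or simply rerun with '-mat_mumps_icntl_7 3', '-mat_mumps_icntl_7 4', etc. on
the command line and compare the MatLUFactorNum timings in '-log_summary'.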

Hong



On Wed, Jan 29, 2014 at 11:33 AM, Hong Zhang <hzhang at mcs.anl.gov> wrote:

> Tabrez:
>
>>  I am getting the opposite result, i.e., MUMPS becomes slower when using
>> ParMETIS for parallel ordering. What did I mess up? Is the problem too
>> small?
>>
>
> I saw similar performance when adding parallel symbolic factorization to the
> petsc interface, which is why I did not make it the default for the
> petsc/mumps interface.
> How large is your matrix?
>
> Can you send us output of '-log_summary' for these two runs?
>
> Hong
>
>>
>>
>> Case 1 took 24.731s
>>
>> $ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu
>> -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary >
>> 1.txt
>>
>>
>> Case 2 with "-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2" took 34.720s
>>
>> $ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu
>> -pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary
>> -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 > 2.txt
>>
>>
>> Both 1.txt and 2.txt are attached.
>>
>> Regards,
>>
>> Tabrez
>>
>>
>> On 01/29/2014 09:18 AM, Hong Zhang wrote:
>>
>> MUMPS now supports parallel symbolic factorization. With the petsc-3.4
>> interface, you can use the runtime options
>>
>>    -mat_mumps_icntl_28 <1>: ICNTL(28): use 1 for sequential analysis and
>> icntl(7) ordering, or 2 for parallel analysis and icntl(29) ordering
>>   -mat_mumps_icntl_29 <0>: ICNTL(29): parallel ordering, 1 = ptscotch, 2 =
>> parmetis
>>
>>  e.g., '-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2' activates parallel
>> symbolic factorization with parmetis for matrix ordering.
>> Give it a try and let us know what you get.
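The same ICNTL(28)/ICNTL(29) settings can also be made from code; a minimal
sketch against the petsc-3.4 API, assuming a KSP ksp whose operator is already
set (ksp, b, x are illustrative names):

  PC  pc;
  Mat F;
  PetscErrorCode ierr;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr); /* creates the MUMPS factor matrix */
  ierr = PCFactorGetMatrix(pc, &F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F, 28, 2);CHKERRQ(ierr);        /* parallel analysis */
  ierr = MatMumpsSetIcntl(F, 29, 2);CHKERRQ(ierr);        /* ParMETIS ordering */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);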
>>
>>  Hong
>>
>>
>> On Tue, Jan 28, 2014 at 5:48 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>> On Jan 28, 2014, at 5:39 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>> > On Tue, Jan 28, 2014 at 5:25 PM, Tabrez Ali <stali at geology.wisc.edu>
>>> wrote:
>>> > Hello
>>> >
>>> > This is my observation as well (with MUMPS). The first solve (after
>>> assembly, which is super fast) takes a few minutes (for ~1 million unknowns on
>>> 12/24 cores) but from then on only a few seconds for each subsequent solve
>>> for each time step.
>>> >
>>> > Perhaps symbolic factorization in MUMPS is all serial?
>>> >
>>> > Yes, it is.
>>>
>>>     I missed this. I was just assuming a PETSc LU. Yes, I have no idea
>>> of the relative time of symbolic and numeric factorization for those other packages.
>>>
>>>   Barry
>>>  >
>>> >   Matt
>>> >
>>> > Like the OP, I often do multiple runs on the same problem but I don't
>>> know if MUMPS or any other direct solver can save the symbolic
>>> factorization info to a file that perhaps can be utilized in subsequent
>>> reruns to avoid the costly "first solves".
>>> >
>>> > Tabrez
>>> >
>>> >
>>> > On 01/28/2014 04:04 PM, Barry Smith wrote:
>>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>>> >
>>> > Hi, I'm writing an application that solves a sparse matrix many times
>>> using Pastix. I notice that the first solve takes a very long time,
>>> >    Is it the first "solve" or the first time you put values into that
>>> matrix that "takes a long time"? If you are not properly preallocating the
>>> matrix then the initial setting of values will be slow and waste memory.
>>>  See
>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
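A minimal sketch of the preallocation meant here (the names nlocal, d_nnz and
o_nnz are illustrative and must be computed from the application's own
sparsity pattern):

  Mat A;
  PetscErrorCode ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  /* block size 1; accurate per-row counts make the first assembly fast */
  ierr = MatXAIJSetPreallocation(A, 1, d_nnz, o_nnz, NULL, NULL);CHKERRQ(ierr);
  /* ... MatSetValues() loops, then MatAssemblyBegin/End(A, MAT_FINAL_ASSEMBLY) ... */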
>>> >
>>> >    The symbolic factorization is usually much faster than a numeric
>>> factorization so that is not the cause of the slow "first solve".
>>> >
>>> >     Barry
>>> >
>>> >
>>> >
>>> > while the subsequent solves are very fast. I don't fully understand
>>> what's going on behind the curtains, but I'm guessing it's because the very
>>> first solve has to read in the non-zero structure for the LU factorization,
>>> while the subsequent solves are faster because the nonzero structure
>>> doesn't change.
>>> >
>>> > My question is, is there any way to save the information obtained from
>>> the very first solve, so that the next time I run the application, the very
>>> first solve can be fast too (provided that I still have the same nonzero
>>> structure)?
>>> >
>>> >
>>> > --
>>> > No one trusts a model except the one who wrote it; Everyone trusts an
>>> observation except the one who made it- Harlow Shapley
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > -- Norbert Wiener
>>>
>>>
>>
>>
>> --
>> No one trusts a model except the one who wrote it; Everyone trusts an observation except the one who made it- Harlow Shapley
>>
>>
>