[petsc-users] block ILU(K) is slower than the point-wise version?
Hong
hzhang at mcs.anl.gov
Tue Mar 7 12:15:24 CST 2017
Fande,
A small one would work, e.g., one the size of a sequential diagonal block
used by the ILU preconditioner.
Thanks,
Hong
>
>
> On Tue, Mar 7, 2017 at 10:23 AM, Hong <hzhang at mcs.anl.gov> wrote:
>
>> I checked
>> MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
>> and they are virtually the same. Why is the version for BAIJ so much slower?
>> I'll investigate it.
>>
>
>> Fande,
>> How large is your matrix? Is it possible to send us your matrix so I can
>> test it?
>>
>
> Thanks, Hong,
>
> It is a 3020875x3020875 matrix, so it is large. I can make a small one if
> you like, but I am not sure whether it will reproduce this issue.
>
> Fande,
>
>
>
>>
>> Hong
>>
>>
>> On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>> Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
>>> that; it definitely should not be, since it is (at least should be) doing a
>>> symbolic factorization on a symbolic matrix 1/11th the size!
>>>
>>> Keep us informed.
>>>
>>>
>>>
>>> > On Mar 6, 2017, at 5:44 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> >
>>> > Thanks, Barry,
>>> >
>>> > Log info:
>>> >
>>> > AIJ:
>>> >
>>> > MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
>>> > MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
>>> > MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
>>> >
>>> > BAIJ:
>>> >
>>> > MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
>>> > MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
>>> > MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
>>> >
>>> > It looks like both MatSolve and MatLUFactorNum are slower.
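
The per-call comparison that Barry's earlier message asks for can be worked out from the log lines above. What follows is only a minimal sketch in C: the helper names `time_per_call`, `report` and `report_all` are hypothetical (they are not PETSc functions), and the inputs are the max-time and call-count columns copied from the excerpts in this message.

```c
#include <stdio.h>

/* Hypothetical helper (not a PETSc function): average seconds per call. */
static double time_per_call(double max_seconds, int calls)
{
  return max_seconds / (double)calls;
}

/* Print the per-call time of one log event for both matrix formats. */
static void report(const char *event, double aij_s, int aij_n,
                   double baij_s, int baij_n)
{
  double aij  = time_per_call(aij_s, aij_n);
  double baij = time_per_call(baij_s, baij_n);
  printf("%-16s AIJ %.3e s/call  BAIJ %.3e s/call  ratio %.1f\n",
         event, aij, baij, baij / aij);
}

/* Values copied from the -log_view excerpts quoted in this message. */
static void report_all(void)
{
  report("MatSolve",        8.6543e+00, 850, 1.3016e+01, 826);
  report("MatLUFactorNum",  1.7622e+00,  25, 1.5503e+01,  25);
  report("MatILUFactorSym", 2.8002e-01,  13, 5.7561e-01,  13);
}
```

Calling `report_all()` from a `main` gives BAIJ-to-AIJ ratios of roughly 1.5 for MatSolve, 8.8 for MatLUFactorNum and 2.1 for MatILUFactorSym. These are based on the maximum time over all processes, and the logs show noticeable load imbalance, so the ratios are only a rough guide.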
>>> >
>>> > I will try your suggestions.
>>> >
>>> > Fande
>>> >
>>> > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> >
>>> > Note also that if the 11 by 11 blocks are actually sparse (and you
>>> > don't store all the zeros in the blocks in the AIJ format), then the AIJ
>>> > non-block factorization involves fewer floating point operations and less
>>> > memory access, so it can be faster than the BAIJ format, depending on "how
>>> > sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
>>> > AIJ (with zeros maybe in certain locations), then the above is not true.
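
To make the trade-off in the note above concrete, the sketch below counts the floating point operations for one 11 by 11 block applied to a vector, once with every entry stored (as BAIJ does) and once with only the nonzeros stored (as AIJ can). The tridiagonal fill pattern and all function names are made up for illustration; they say nothing about the sparsity of the actual matrix in this thread.

```c
#include <stddef.h>

#define BS 11 /* block size discussed in the thread */

/* Flops for y += B*x when all BS*BS entries are stored (BAIJ-style):
   one multiply and one add per stored entry. */
static size_t dense_block_flops(void)
{
  return (size_t)2 * BS * BS;
}

/* Flops when only the nonzero entries are stored (AIJ-style). */
static size_t sparse_block_flops(double block[BS][BS])
{
  size_t nnz = 0;
  for (int i = 0; i < BS; i++)
    for (int j = 0; j < BS; j++)
      if (block[i][j] != 0.0) nnz++;
  return 2 * nnz;
}

/* Illustrative pattern only: a block with tridiagonal coupling. */
static void fill_tridiagonal(double block[BS][BS])
{
  for (int i = 0; i < BS; i++)
    for (int j = 0; j < BS; j++)
      block[i][j] = (i - j <= 1 && j - i <= 1) ? 1.0 : 0.0;
}
```

For this assumed pattern the block has 31 nonzeros, so the point-wise count is 62 flops against 242 for the fully stored block. A block that is actually dense gives 242 in both cases, which is the situation where the block format should not lose on operation count.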
>>> >
>>> >
>>> > > On Mar 6, 2017, at 5:10 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>> > >
>>> > >
>>> > > This is because for block size 11 it is using calls to LAPACK/BLAS
>>> > > for the block operations instead of custom routines for that block size.
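
The difference between a generic block routine and a custom one for a fixed block size can be sketched as follows. This is not PETSc source: the function names are hypothetical, the column-major block layout is an assumption of the sketch, and the claim that the fixed-size version is faster is the expectation being tested in this thread, not something measured here.

```c
/* Generic kernel: y += A*x for a bs-by-bs block, with bs known only at
   run time. Column-major storage is assumed for this sketch. */
static void block_mult_add_generic(int bs, const double *a,
                                   const double *x, double *y)
{
  for (int j = 0; j < bs; j++)
    for (int i = 0; i < bs; i++)
      y[i] += a[i + j * bs] * x[j];
}

/* Specialized kernel: the same operation with the block size fixed at
   compile time, which gives the compiler constant trip counts that it
   can unroll and vectorize. */
static void block_mult_add_11(const double *a, const double *x, double *y)
{
  enum { N = 11 };
  for (int j = 0; j < N; j++) {
    const double xj = x[j];
    for (int i = 0; i < N; i++)
      y[i] += a[i + j * N] * xj;
  }
}
```

Both functions perform the same arithmetic in the same order, so they should return identical results; only the information available to the compiler differs.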
>>> > >
>>> > > Here is what you need to do. For a good sized case run both with
>>> > > -log_view and check the time spent in
>>> > > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ and
>>> > > BAIJ. If they have a different number of function calls then divide by
>>> > > the function call count to determine the time per function call.
>>> > >
>>> > > This will tell you which routine needs to be optimized first,
>>> > > either MatLUFactorNumeric or MatSolve. My guess is MatSolve.
>>> > >
>>> > > So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the
>>> > > function MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
>>> > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for
>>> > > the block size of 11.
>>> > >
>>> > > Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is
>>> > > 11 it uses the new routine, something like:
>>> > >
>>> > > if (both_identity) {
>>> > >   if (b->bs == 11) {
>>> > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
>>> > >   } else {
>>> > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
>>> > >   }
>>> > > }
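
The dispatch idea in the snippet above, choosing the solve routine once through a function pointer, can be tried in isolation with a toy example. Everything here is a stand-in invented for the sketch: `ToyMat`, the `toy_solve_*` stubs and `toy_select_solve` are not PETSc types or functions, and the stubs only return a tag so the selection can be checked.

```c
#include <stddef.h>

/* Toy stand-ins for this sketch; these are NOT the PETSc types. */
typedef struct ToyMat ToyMat;
typedef int (*ToySolveFn)(const ToyMat *);
struct ToyMat {
  int        bs;    /* block size */
  ToySolveFn solve; /* routine chosen at factorization time */
};

/* Stub routines that return a tag identifying which one ran. */
static int toy_solve_11(const ToyMat *m)   { (void)m; return 11; } /* specialized */
static int toy_solve_n(const ToyMat *m)    { (void)m; return 0;  } /* generic, natural ordering */
static int toy_solve_perm(const ToyMat *m) { (void)m; return -1; } /* generic, permuted ordering */

/* Pick the solve routine once, so every later solve goes straight to
   the chosen kernel without re-checking the block size. */
static void toy_select_solve(ToyMat *m, int both_identity)
{
  if (both_identity) {
    m->solve = (m->bs == 11) ? toy_solve_11 : toy_solve_n;
  } else {
    m->solve = toy_solve_perm;
  }
}
```

The point of the design is that the branch on block size is paid once per factorization rather than once per solve, and block sizes without a custom routine still fall back to the generic one.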
>>> > >
>>> > > Rerun and look at the new -log_view. Send all three -log_view outputs
>>> > > to us at this point. If this optimization helps and now
>>> > > MatLUFactorNumeric is the time sink, you can apply the same process to
>>> > > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
>>> > > version for block size 11.
>>> > >
>>> > > Barry
>>> > >
>>> > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <patrick.sanan at gmail.com> wrote:
>>> > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>>> > >>> Hi All,
>>> > >>>
>>> > >>> I am solving a nonlinear system whose Jacobian matrix has a block
>>> > >>> structure. More precisely, there is a mesh, and for each vertex there
>>> > >>> are 11 variables associated with it. I am using BAIJ.
>>> > >>>
>>> > >>> I thought block ILU(k) should be more efficient than the point-wise
>>> > >>> ILU(k). After some numerical experiments, I found that the block ILU(k)
>>> > >>> is much slower than the point-wise version.
>>> > >> Do you mean that it takes more iterations to converge, or that the
>>> > >> time per iteration is greater, or both?
>>> > >>
>>> > >> The number of iterations is very similar, but the time per
>>> > >> iteration is greater.
>>> > >>
>>> > >>
>>> > >>>
>>> > >>> Any thoughts?
>>> > >>>
>>> > >>> Fande,
>>> > >>
>>> > >
>>> >
>>> >
>>>
>>>
>>
>