[petsc-users] block ILU(K) is slower than the point-wise version?
Fande Kong
fdkong.jd at gmail.com
Tue Mar 7 21:44:40 CST 2017
On Tue, Mar 7, 2017 at 7:55 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> I have run your larger matrix on my laptop with "default" optimization
> (so --with-debugging=0); this is what I get:
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> AIJ
>
> MatMult                5 1.0 7.7636e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1830
> MatSolve               5 1.0 7.8164e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1818
> MatLUFactorNum         1 1.0 2.3056e-01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 35 67  0  0  0  46 67  0  0  0  2580
> MatILUFactorSym        1 1.0 8.3201e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13  0  0  0  0  17  0  0  0  0     0
>
> BAIJ
>
> MatMult                5 1.0 5.3482e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6  6  0  0  0   9  6  0  0  0  2657
> MatSolve               5 1.0 6.2669e-02 1.0 1.39e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7  6  0  0  0  11  6  0  0  0  2224
> MatLUFactorNum         1 1.0 3.7688e-01 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 88  0  0  0  66 88  0  0  0  5635
> MatILUFactorSym        1 1.0 4.4828e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0
>
> So BAIJ symbolic is faster (as it definitely should be). BAIJ MatMult and
> MatSolve are also faster, but the numerical BAIJ factorization is slower.
>
> Providing custom code for block size 11 should definitely improve the
> performance of all three of these.
>
> I note that the number of iterations, 5, is much lower than in the case you
> emailed originally. Is this really the matrix of interest?
>
The matrix given to you is the matrix for the first nonlinear iteration of
the first time step. The number of iterations in the original email is for
all nonlinear iterations and all time steps.
Fande,
>
> Barry
>
> > On Mar 7, 2017, at 3:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> >
> >
> > On Tue, Mar 7, 2017 at 2:07 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > The matrix is too small. Please post ONE big matrix
> >
> > I am using "-ksp_view_pmat binary" to save the matrix. How can I save
> > only the latest one for a time-dependent problem?
> >
> >
> > Fande,
> >
> >
> >
> > > On Mar 7, 2017, at 2:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > >
> > > > Uploaded to Google Drive, and sent you links in another email. Not
> > > > sure if it works or not.
> > >
> > > Fande,
> > >
> > > > On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > > > It is too big for email; you can post it somewhere so we can
> > > > download it.
> > >
> > >
> > > > On Mar 7, 2017, at 12:01 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, Mar 7, 2017 at 10:23 AM, Hong <hzhang at mcs.anl.gov> wrote:
> > > > > I checked
> > > > > MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> > > > > they are virtually the same. Why is the version for BAIJ so much slower?
> > > > > I'll investigate it.
> > > >
> > > > Fande,
> > > > > How large is your matrix? Is it possible to send us your matrix so I
> > > > > can test it?
> > > >
> > > > Thanks, Hong,
> > > >
> > > > > It is a 3020875x3020875 matrix, and it is large. I can make a small
> > > > > one if you like, but I am not sure whether it will reproduce this issue.
> > > >
> > > > Fande,
> > > >
> > > >
> > > >
> > > > Hong
> > > >
> > > >
> > > > > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > >
> > > > > Thanks. Even the symbolic is slower for BAIJ. I don't like that;
> > > > > it definitely should not be, since it is (or at least should be) doing a
> > > > > symbolic factorization on a symbolic matrix 1/11th the size!
> > > >
> > > > Keep us informed.
> > > >
> > > >
> > > >
> > > > > > On Mar 6, 2017, at 5:44 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > > > >
> > > > > Thanks, Barry,
> > > > >
> > > > > Log info:
> > > > >
> > > > > AIJ:
> > > > >
> > > > > > MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
> > > > > > MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> > > > > > MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> > > > > >
> > > > > > BAIJ:
> > > > > >
> > > > > > MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> > > > > > MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> > > > > > MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> > > > >
> > > > > It looks like both MatSolve and MatLUFactorNum are slower.
> > > > >
> > > > > I will try your suggestions.
> > > > >
> > > > > Fande
> > > > >
> > > > > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > >
> > > > > > Note also that if the 11 by 11 blocks are actually sparse (and
> > > > > > you don't store all the zeros in the blocks in the AIJ format), then the
> > > > > > AIJ non-block factorization involves fewer floating point operations and
> > > > > > less memory access, so it can be faster than the BAIJ format, depending on "how
> > > > > > sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
> > > > > > AIJ (with zeros maybe in certain locations) then the above is not true.
> > > > >
> > > > >
> > > > > > > On Mar 6, 2017, at 5:10 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > >
> > > > > >
> > > > > > > This is because for block size 11 it is using calls to
> > > > > > > LAPACK/BLAS for the block operations instead of custom routines for that
> > > > > > > block size.
> > > > > >
> > > > > > > Here is what you need to do. For a good-sized case, run both
> > > > > > > with -log_view and check the time spent in
> > > > > > > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ
> > > > > > > and BAIJ. If they have a different number of function calls, then divide by
> > > > > > > the function call count to determine the time per function call.
> > > > > > >
> > > > > > > This will tell you which routine needs to be optimized first,
> > > > > > > either MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> > > > > >
> > > > > > > So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the
> > > > > > > function MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> > > > > > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for the
> > > > > > > block size of 11.
> > > > > >
> > > > > > > Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size
> > > > > > > is 11 it uses the new routine, something like:
> > > > > > >
> > > > > > > if (both_identity) {
> > > > > > >   if (b->bs == 11) {
> > > > > > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> > > > > > >   } else {
> > > > > > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> > > > > > >   }
> > > > > > > }
> > > > > >
> > > > > > > Rerun and look at the new -log_view. Send all three -log_view
> > > > > > > outputs to us at this point. If this optimization helps and now
> > > > > > > MatLUFactorNumeric is the time sink, you can apply the same process to
> > > > > > > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
> > > > > > > version for block size 11.
> > > > > >
> > > > > > Barry
> > > > > >
> > > > > > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <patrick.sanan at gmail.com> wrote:
> > > > > > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> > > > > >>> Hi All,
> > > > > >>>
> > > > > > >>> I am solving a nonlinear system whose Jacobian matrix has a block structure.
> > > > > > >>> More precisely, there is a mesh, and for each vertex there are 11 variables
> > > > > > >>> associated with it. I am using BAIJ.
> > > > > > >>>
> > > > > > >>> I thought block ILU(k) should be more efficient than the point-wise ILU(k).
> > > > > > >>> After some numerical experiments, I found that the block ILU(K) is much
> > > > > > >>> slower than the point-wise version.
> > > > > > >> Do you mean that it takes more iterations to converge, or that the
> > > > > > >> time per iteration is greater, or both?
> > > > > > >>
> > > > > > >> The number of iterations is very similar, but the time per iteration is greater.
> > > > > >>
> > > > > >>
> > > > > >>>
> > > > > >>> Any thoughts?
> > > > > >>>
> > > > > >>> Fande,
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
>