[petsc-users] block ILU(K) is slower than the point-wise version?
Kong, Fande
fande.kong at inl.gov
Tue Mar 7 16:21:00 CST 2017
On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown <jed at jedbrown.org> wrote:
> Hong <hzhang at mcs.anl.gov> writes:
>
> > Fande,
> > Got it. Below are what I get:
>
> Is Fande using ILU(0) or ILU(k)? (And I think it should be possible to
> get a somewhat larger benefit.)
>
I am using ILU(0). Would it be much better to use ILU(k>0)?
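For reference, the fill level is selectable from the run-time options. A hypothetical invocation of the ex10 run shown below (the level value 2 is illustrative, not from the thread):

```shell
# ILU(k): -pc_factor_levels k sets the level of fill (default 0, i.e. ILU(0)).
# Higher k usually gives a stronger preconditioner (fewer Krylov iterations)
# at the cost of a denser factor and more expensive factorization and solves.
./ex10 -f0 binaryoutput -rhs 0 -pc_type ilu -pc_factor_levels 2 -mat_type baij
```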
Fande,
>
> > petsc/src/ksp/ksp/examples/tutorials (master)
> > $ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info
> > Mat Object: 1 MPI processes
> > type: seqaij
> > rows=8019, cols=8019, bs=11
> > total: nonzeros=1890625, allocated nonzeros=1890625
> > total number of mallocs used during MatSetValues calls =0
> > using I-node routines: found 2187 nodes, limit used is 5
> > Number of iterations = 3
> > Residual norm 0.00200589
> >
> > -mat_type aij
> > MatMult              4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1805
> > MatSolve             4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1797
> > MatLUFactorNum       1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 57 85  0  0  0  70 85  0  0  0  2086
> > MatILUFactorSym      1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  12  0  0  0  0     0
> >
> > -mat_type baij
> > MatMult              4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   7  5  0  0  0  2718
> > MatSolve             4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   8  5  0  0  0  2086
> > MatLUFactorNum       1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 42 89  0  0  0  72 89  0  0  0  4241
> > MatILUFactorSym      1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0
> >
> > I ran it on my Mac Pro. BAIJ is faster than AIJ in all routines.
> >
> > Hong
> >
> > On Tue, Mar 7, 2017 at 2:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >
> >> Uploaded to Google Drive, and sent you links in another email. Not sure whether it works.
> >>
> >> Fande,
> >>
> >> On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>
> >>>
> >>>    It is too big for email; you can post it somewhere so we can download it.
> >>>
> >>>
> >>>
> >>> > On Mar 7, 2017, at 12:01 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >>> >
> >>> >
> >>> >
> >>> > On Tue, Mar 7, 2017 at 10:23 AM, Hong <hzhang at mcs.anl.gov> wrote:
> >>> > I checked MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ();
> >>> > they are virtually the same. Why is the version for BAIJ so much slower?
> >>> > I'll investigate it.
> >>> >
> >>> > Fande,
> >>> > How large is your matrix? Is it possible to send us your matrix so I can test it?
> >>> >
> >>> > Thanks, Hong,
> >>> >
> >>> > It is a 3020875x3020875 matrix, and it is large. I can make a small one if you like, but I am not sure whether it will reproduce this issue.
> >>> >
> >>> > Fande,
> >>> >
> >>> >
> >>> >
> >>> > Hong
> >>> >
> >>> >
> >>> > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>> >
> >>> > Thanks. Even the symbolic factorization is slower for BAIJ. I don't like that; it definitely should not be, since it is (or at least should be) doing a symbolic factorization on a matrix 1/11th the size!
> >>> >
> >>> > Keep us informed.
> >>> >
> >>> >
> >>> >
> >>> > > On Mar 6, 2017, at 5:44 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >>> > >
> >>> > > Thanks, Barry,
> >>> > >
> >>> > > Log info:
> >>> > >
> >>> > > AIJ:
> >>> > >
> >>> > > MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
> >>> > > MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> >>> > > MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> >>> > >
> >>> > > BAIJ:
> >>> > >
> >>> > > MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> >>> > > MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> >>> > > MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> >>> > >
> >>> > > It looks like both MatSolve and MatLUFactorNum are slower.
> >>> > >
> >>> > > I will try your suggestions.
> >>> > >
> >>> > > Fande
> >>> > >
> >>> > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>> > >
> >>> > >    Note also that if the 11 by 11 blocks are actually sparse (and you don't store all the zeros in the blocks in the AIJ format), then the AIJ non-block factorization involves fewer floating point operations and less memory access, so it can be faster than the BAIJ format, depending on "how sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with AIJ (with zeros in certain locations), then the above is not true.
> >>> > >
> >>> > >
> >>> > > > On Mar 6, 2017, at 5:10 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >>> > > >
> >>> > > >
> >>> > > >   This is because for block size 11 it is using calls to LAPACK/BLAS for the block operations instead of custom routines for that block size.
> >>> > > >
> >>> > > >   Here is what you need to do. For a good-sized case, run both with -log_view and check the time spent in MatLUFactorNumeric, MatLUFactorSymbolic, and MatSolve for AIJ and BAIJ. If they have a different number of function calls, divide by the call count to determine the time per function call.
> >>> > > >
> >>> > > >   This will tell you which routine needs to be optimized first: either MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> >>> > > >
> >>> > > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a block size of 11.
> >>> > > >
> >>> > > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11 it uses the new routine, something like:
> >>> > > >
> >>> > > >   if (both_identity) {
> >>> > > >     if (b->bs == 11) {
> >>> > > >       C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> >>> > > >     } else {
> >>> > > >       C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> >>> > > >     }
> >>> > > >   }
> >>> > > >
> >>> > > >   Rerun and look at the new -log_view. Send all three -log_view outputs to us at this point. If this optimization helps and MatLUFactorNumeric is now the time sink, you can apply the same process to MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version for block size 11.
> >>> > > >
> >>> > > > Barry
> >>> > > >
> >>> > > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >>> > > >>
> >>> > > >>
> >>> > > >>
> >>> > > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <patrick.sanan at gmail.com> wrote:
> >>> > > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> >>> > > >>> Hi All,
> >>> > > >>>
> >>> > > >>> I am solving a nonlinear system whose Jacobian matrix has a block structure. More precisely, there is a mesh, and each vertex has 11 variables associated with it. I am using BAIJ.
> >>> > > >>>
> >>> > > >>> I thought block ILU(k) should be more efficient than the point-wise ILU(k). After some numerical experiments, I found that the block ILU(k) is much slower than the point-wise version.
> >>> > > >> Do you mean that it takes more iterations to converge, or that the time per iteration is greater, or both?
> >>> > > >>
> >>> > > >> The number of iterations is very similar, but the time per iteration is greater.
> >>> > > >>
> >>> > > >>
> >>> > > >>>
> >>> > > >>> Any thoughts?
> >>> > > >>>
> >>> > > >>> Fande,
> >>> > > >>
> >>> > > >
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>
>