[petsc-users] block ILU(K) is slower than the point-wise version?
Hong
hzhang at mcs.anl.gov
Tue Mar 7 15:17:21 CST 2017
Fande,
Got it. Below is what I get:
petsc/src/ksp/ksp/examples/tutorials (master)
$ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info
Mat Object: 1 MPI processes
type: seqaij
rows=8019, cols=8019, bs=11
total: nonzeros=1890625, allocated nonzeros=1890625
total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2187 nodes, limit used is 5
Number of iterations = 3
Residual norm 0.00200589
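The timings below are from runs along the lines of
$ ./ex10 -f0 binaryoutput -rhs 0 -mat_type aij -log_view
$ ./ex10 -f0 binaryoutput -rhs 0 -mat_type baij -log_view
quoting only the relevant events: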
-mat_type aij
MatMult 4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00
0.0e+00 6 7 0 0 0 7 7 0 0 0 1805
MatSolve 4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00
0.0e+00 6 7 0 0 0 7 7 0 0 0 1797
MatLUFactorNum 1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00
0.0e+00 57 85 0 0 0 70 85 0 0 0 2086
MatILUFactorSym 1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 10 0 0 0 0 12 0 0 0 0 0
-mat_type baij
MatMult 4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00
0.0e+00 4 5 0 0 0 7 5 0 0 0 2718
MatSolve 4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00
0.0e+00 5 5 0 0 0 8 5 0 0 0 2086
MatLUFactorNum 1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00
0.0e+00 42 89 0 0 0 72 89 0 0 0 4241
MatILUFactorSym 1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 8 0 0 0 0 0
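Side by side: MatMult 8.36e-3 s (aij) vs 5.55e-3 s (baij), MatSolve 8.40e-3 s
vs 7.08e-3 s, MatLUFactorNum 8.62e-2 s vs 6.01e-2 s, and MatILUFactorSym
1.50e-2 s vs 6.73e-3 s.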
I ran it on my Mac Pro. BAIJ is faster than AIJ in all routines.
Hong
On Tue, Mar 7, 2017 at 2:26 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> Uploaded it to Google Drive and sent you the links in another email. Not
> sure whether it works.
>
> Fande,
>
> On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>    It is too big for email; you can post it somewhere so we can download
>> it.
>>
>>
>>
>> > On Mar 7, 2017, at 12:01 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>> >
>> >
>> >
>> > On Tue, Mar 7, 2017 at 10:23 AM, Hong <hzhang at mcs.anl.gov> wrote:
>> > I checked
>> > MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ();
>> > they are virtually the same. Why is the BAIJ version so much slower?
>> > I'll investigate it.
>> >
>> > Fande,
>> > How large is your matrix? Is it possible to send us your matrix so I
>> can test it?
>> >
>> > Thanks, Hong,
>> >
>> > It is a 3020875x3020875 matrix, and it is large. I can make a small one
>> if you like, but I am not sure whether it will reproduce this issue.
>> >
>> > Fande,
>> >
>> >
>> >
>> > Hong
>> >
>> >
>> > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >
>> > Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
>> that; it definitely should not be, since it is (or at least should be) doing
>> a symbolic factorization on a symbolic matrix 1/11th the size!
>> >
>> > Keep us informed.
>> >
>> >
>> >
>> > > On Mar 6, 2017, at 5:44 PM, Kong, Fande <fande.kong at inl.gov> wrote:
>> > >
>> > > Thanks, Barry,
>> > >
>> > > Log info:
>> > >
>> > > AIJ:
>> > >
>> > > MatSolve 850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00
>> 0.0e+00 0.0e+00 0 41 0 0 0 0 41 0 0 0 49594
>> > > MatLUFactorNum 25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00
>> 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 153394
>> > > MatILUFactorSym 13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> > >
>> > > BAIJ:
>> > >
>> > > MatSolve 826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00
>> 0.0e+00 0.0e+00 1 29 0 0 0 1 29 0 0 0 154617
>> > > MatLUFactorNum 25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00
>> 0.0e+00 0.0e+00 1 67 0 0 0 1 67 0 0 0 303190
>> > > MatILUFactorSym 13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00
>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>> > >
>> > > It looks like both MatSolve and MatLUFactorNum are slower.
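>> > >
>> > > Dividing the (max) times by the call counts, that is roughly
>> > >   MatSolve:        8.65/850 ~ 1.0e-2 s/call (AIJ) vs 13.0/826  ~ 1.6e-2 s/call (BAIJ)
>> > >   MatLUFactorNum:  1.76/25  ~ 7.0e-2 s/call (AIJ) vs 15.5/25   ~ 6.2e-1 s/call (BAIJ)
>> > >   MatILUFactorSym: 0.28/13  ~ 2.2e-2 s/call (AIJ) vs 0.576/13  ~ 4.4e-2 s/call (BAIJ)
>> > > so the numeric factorization shows the largest per-call gap.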
>> > >
>> > > I will try your suggestions.
>> > >
>> > > Fande
>> > >
>> > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> > >
>> > > Note also that if the 11 by 11 blocks are actually sparse (and you
>> don't store all the zeros in the blocks in the AIJ format), then the AIJ
>> non-block factorization involves fewer floating point operations and less
>> memory access, so it can be faster than the BAIJ format, depending on "how
>> sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks in
>> AIJ (perhaps with zeros in certain locations) then the above is not true.
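>> > >
>> > > One quick way to gauge "how sparse" the blocks really are is to count how
>> > > many of the stored entries are numerically zero. A rough sketch (just an
>> > > illustration, this helper is not in PETSc; it assumes the matrix is
>> > > assembled and treats a numerical zero as block fill):
>> > >
>> > > #include <petscmat.h>
>> > >
>> > > PetscErrorCode CountStoredZeros(Mat A)
>> > > {
>> > >   PetscErrorCode    ierr;
>> > >   PetscInt          rstart,rend,row,j,ncols,nztotal = 0,nzzero = 0;
>> > >   const PetscInt    *cols;
>> > >   const PetscScalar *vals;
>> > >
>> > >   ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
>> > >   for (row = rstart; row < rend; row++) {
>> > >     ierr = MatGetRow(A,row,&ncols,&cols,&vals);CHKERRQ(ierr);
>> > >     nztotal += ncols;
>> > >     for (j = 0; j < ncols; j++) {
>> > >       if (vals[j] == 0.0) nzzero++;   /* stored entry that is actually zero */
>> > >     }
>> > >     ierr = MatRestoreRow(A,row,&ncols,&cols,&vals);CHKERRQ(ierr);
>> > >   }
>> > >   ierr = PetscPrintf(PETSC_COMM_SELF,"stored entries %D, explicit zeros %D\n",nztotal,nzzero);CHKERRQ(ierr);
>> > >   return 0;
>> > > }
>> > >
>> > > If a large fraction of the stored entries inside the blocks are zeros, an
>> > > AIJ matrix assembled without those zeros does correspondingly less work in
>> > > the factorization.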
>> > >
>> > >
>> > > > On Mar 6, 2017, at 5:10 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> > > >
>> > > >
>> > > > This is because for block size 11 it is using calls to
>> LAPACK/BLAS for the block operations instead of custom routines for that
>> block size.
>> > > >
>> > > >    Here is what you need to do. For a good-sized case, run both with
>> -log_view and check the time spent in
>> > > > MatLUFactorNumeric, MatLUFactorSymbolic, and MatSolve for AIJ and
>> BAIJ. If they have a different number of function calls, divide by the
>> call count to determine the time per function call.
>> > > >
>> > > >    This will tell you which routine needs to be optimized first:
>> either MatLUFactorNumeric or MatSolve. My guess is MatSolve.
>> > > >
>> > > > So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the
>> function MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
>> MatSolve_SeqBAIJ_11_NaturalOrdering_ver1. Edit the new function for the
>> block size of 11.
>> > > >
>> > > >    Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size
>> is 11 it uses the new routine, something like:
>> > > >
>> > > >   if (both_identity) {
>> > > >     if (b->bs == 11) {
>> > > >       C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
>> > > >     } else {
>> > > >       C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
>> > > >     }
>> > > >   }
>> > > >
>> > > >    Rerun and look at the new -log_view. Send all three -log_view
>> outputs to us at this point. If this optimization helps and
>> > > > MatLUFactorNumeric is now the time sink, you can do the same to
>> MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version
>> for block size 11.
>> > > >
>> > > > Barry
>> > > >
>> > > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.kong at inl.gov>
>> wrote:
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <
>> patrick.sanan at gmail.com> wrote:
>> > > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.kong at inl.gov>
>> wrote:
>> > > >>> Hi All,
>> > > >>>
>> > > >>> I am solving a nonlinear system whose Jacobian matrix has a block
>> structure.
>> > > >>> More precisely, there is a mesh, and for each vertex there are 11
>> variables
>> > > >>> associated with it. I am using BAIJ.
>> > > >>>
>> > > >>> I thought block ILU(k) should be more efficient than the
>> point-wise ILU(k).
>> > > >>> After some numerical experiments, I found that the block ILU(K)
>> is much
>> > > >>> slower than the point-wise version.
>> > > >> Do you mean that it takes more iterations to converge, or that the
>> > > >> time per iteration is greater, or both?
>> > > >>
>> > > >> The number of iterations is very similar, but the time per
>> iteration is greater.
>> > > >>
>> > > >>
>> > > >>>
>> > > >>> Any thoughts?
>> > > >>>
>> > > >>> Fande,
>> > > >>
>> > > >
>> > >
>> > >
>> >
>> >
>> >
>>
>>
>