<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Hong <<a href="mailto:hzhang@mcs.anl.gov">hzhang@mcs.anl.gov</a>> writes:<br>
<br>
> Fande,<br>
> Got it. Below are what I get:<br>
<br>
</span>Is Fande using ILU(0) or ILU(k)? (And I think it should be possible to<br>
get a somewhat larger benefit.)<br></blockquote><div><br><br></div><div>I am using ILU(0). Will it be much better to use ILU(k>0)?<br><br></div><div>Fande,<br></div><div><br> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
> petsc/src/ksp/ksp/examples/<wbr>tutorials (master)<br>
> $ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info<br>
> Mat Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=8019, cols=8019, bs=11<br>
> total: nonzeros=1890625, allocated nonzeros=1890625<br>
> total number of mallocs used during MatSetValues calls =0<br>
> using I-node routines: found 2187 nodes, limit used is 5<br>
> Number of iterations = 3<br>
> Residual norm 0.00200589<br>
><br>
> -mat_type aij<br>
> MatMult 4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 6 7 0 0 0 7 7 0 0 0 1805<br>
> MatSolve 4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 6 7 0 0 0 7 7 0 0 0 1797<br>
> MatLUFactorNum 1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 57 85 0 0 0 70 85 0 0 0 2086<br>
> MatILUFactorSym 1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00<br>
> 0.0e+00 10 0 0 0 0 12 0 0 0 0 0<br>
><br>
> -mat_type baij<br>
> MatMult 4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 4 5 0 0 0 7 5 0 0 0 2718<br>
> MatSolve 4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 5 5 0 0 0 8 5 0 0 0 2086<br>
> MatLUFactorNum 1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00<br>
> 0.0e+00 42 89 0 0 0 72 89 0 0 0 4241<br>
> MatILUFactorSym 1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00<br>
> 0.0e+00 5 0 0 0 0 8 0 0 0 0 0<br>
><br>
> I ran it on my macpro. baij is faster than aij in all routines.<br>
><br>
> Hong<br>
><br>
> On Tue, Mar 7, 2017 at 2:26 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>> wrote:<br>
><br>
>> Uploaded to google drive, and sent you links in another email. Not sure if<br>
>> it works or not.<br>
>><br>
>> Fande,<br>
>><br>
>> On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
>><br>
>>><br>
>>> It is too big for email you can post it somewhere so we can download<br>
>>> it.<br>
>>><br>
>>><br>
>>><br>
>>> > On Mar 7, 2017, at 12:01 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>> wrote:<br>
>>> ><br>
>>> ><br>
>>> ><br>
>>> > On Tue, Mar 7, 2017 at 10:23 AM, Hong <<a href="mailto:hzhang@mcs.anl.gov">hzhang@mcs.anl.gov</a>> wrote:<br>
>>> > I checked<br>
>>> > MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),<br>
>>> > they are virtually same. Why the version for BAIJ is so much slower?<br>
>>> > I'll investigate it.<br>
>>> ><br>
>>> > Fande,<br>
>>> > How large is your matrix? Is it possible to send us your matrix so I<br>
>>> can test it?<br>
>>> ><br>
>>> > Thanks, Hong,<br>
>>> ><br>
>>> > It is a 3020875x3020875 matrix, and it is large. I can make a small one<br>
>>> if you like, but not sure it will reproduce this issue or not.<br>
>>> ><br>
>>> > Fande,<br>
>>> ><br>
>>> ><br>
>>> ><br>
>>> > Hong<br>
>>> ><br>
>>> ><br>
>>> > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
>>> ><br>
>>> > Thanks. Even the symbolic is slower for BAIJ. I don't like that, it<br>
>>> definitely should not be since it is (at least should be) doing a symbolic<br>
>>> factorization on a symbolic matrix 1/11th the size!<br>
>>> ><br>
>>> > Keep us informed.<br>
>>> ><br>
>>> ><br>
>>> ><br>
>>> > > On Mar 6, 2017, at 5:44 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>> wrote:<br>
>>> > ><br>
>>> > > Thanks, Barry,<br>
>>> > ><br>
>>> > > Log info:<br>
>>> > ><br>
>>> > > AIJ:<br>
>>> > ><br>
>>> > > MatSolve 850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00<br>
>>> 0.0e+00 0.0e+00 0 41 0 0 0 0 41 0 0 0 49594<br>
>>> > > MatLUFactorNum 25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00<br>
>>> 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 153394<br>
>>> > > MatILUFactorSym 13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00<br>
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
>>> > ><br>
>>> > > BAIJ:<br>
>>> > ><br>
>>> > > MatSolve 826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00<br>
>>> 0.0e+00 0.0e+00 1 29 0 0 0 1 29 0 0 0 154617<br>
>>> > > MatLUFactorNum 25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00<br>
>>> 0.0e+00 0.0e+00 1 67 0 0 0 1 67 0 0 0 303190<br>
>>> > > MatILUFactorSym 13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00<br>
>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
>>> > ><br>
>>> > > It looks like both MatSolve and MatLUFactorNum are slower.<br>
>>> > ><br>
>>> > > I will try your suggestions.<br>
>>> > ><br>
>>> > > Fande<br>
>>> > ><br>
>>> > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>><br>
>>> wrote:<br>
>>> > ><br>
>>> > > Note also that if the 11 by 11 blocks are actually sparse (and you<br>
>>> don't store all the zeros in the blocks in the AIJ format) then then AIJ<br>
>>> non-block factorization involves less floating point operations and less<br>
>>> memory access so can be faster than the BAIJ format, depending on "how<br>
>>> sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with<br>
>>> AIJ (with zeros maybe in certain locations) then the above is not true.<br>
>>> > ><br>
>>> > ><br>
>>> > > > On Mar 6, 2017, at 5:10 PM, Barry Smith <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
>>> > > ><br>
>>> > > ><br>
>>> > > > This is because for block size 11 it is using calls to<br>
>>> LAPACK/BLAS for the block operations instead of custom routines for that<br>
>>> block size.<br>
>>> > > ><br>
>>> > > > Here is what you need to do. For a good sized case run both with<br>
>>> -log_view and check the time spent in<br>
>>> > > > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ and<br>
>>> BAIJ. If they have a different number of function calls then divide by the<br>
>>> function call count to determine the time per function call.<br>
>>> > > ><br>
>>> > > > This will tell you which routine needs to be optimized first<br>
>>> either MatLUFactorNumeric or MatSolve. My guess is MatSolve.<br>
>>> > > ><br>
>>> > > > So edit src/mat/impls/baij/seq/<wbr>baijsolvnat.c and copy the<br>
>>> function MatSolve_SeqBAIJ_15_<wbr>NaturalOrdering_ver1() to a new function<br>
>>> MatSolve_SeqBAIJ_11_<wbr>NaturalOrdering_ver1. Edit the new function for the<br>
>>> block size of 11.<br>
>>> > > ><br>
>>> > > > Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if block size is<br>
>>> 11 it uses the new routine something like.<br>
>>> > > ><br>
>>> > > > if (both_identity) {<br>
>>> > > > if (b->bs == 11)<br>
>>> > > > C->ops->solve = MatSolve_SeqBAIJ_11_<wbr>NaturalOrdering_ver1;<br>
>>> > > > } else {<br>
>>> > > > C->ops->solve = MatSolve_SeqBAIJ_N_<wbr>NaturalOrdering;<br>
>>> > > > }<br>
>>> > > ><br>
>>> > > > Rerun and look at the new -log_view. Send all three -log_view to<br>
>>> use at this point. If this optimization helps and now<br>
>>> > > > MatLUFactorNumeric is the time sink you can do the process to<br>
>>> MatLUFactorNumeric_SeqBAIJ_15_<wbr>NaturalOrdering() to make an 11 size block<br>
>>> custom version.<br>
>>> > > ><br>
>>> > > > Barry<br>
>>> > > ><br>
>>> > > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>><br>
>>> wrote:<br>
>>> > > >><br>
>>> > > >><br>
>>> > > >><br>
>>> > > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <<br>
>>> <a href="mailto:patrick.sanan@gmail.com">patrick.sanan@gmail.com</a>> wrote:<br>
>>> > > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <<a href="mailto:fande.kong@inl.gov">fande.kong@inl.gov</a>><br>
>>> wrote:<br>
>>> > > >>> Hi All,<br>
>>> > > >>><br>
>>> > > >>> I am solving a nonlinear system whose Jacobian matrix has a block<br>
>>> structure.<br>
>>> > > >>> More precisely, there is a mesh, and for each vertex there are 11<br>
>>> variables<br>
>>> > > >>> associated with it. I am using BAIJ.<br>
>>> > > >>><br>
>>> > > >>> I thought block ILU(k) should be more efficient than the<br>
>>> point-wise ILU(k).<br>
>>> > > >>> After some numerical experiments, I found that the block ILU(K)<br>
>>> is much<br>
>>> > > >>> slower than the point-wise version.<br>
>>> > > >> Do you mean that it takes more iterations to converge, or that the<br>
>>> > > >> time per iteration is greater, or both?<br>
>>> > > >><br>
>>> > > >> The number of iterations is very similar, but the timer per<br>
>>> iteration is greater.<br>
>>> > > >><br>
>>> > > >><br>
>>> > > >>><br>
>>> > > >>> Any thoughts?<br>
>>> > > >>><br>
>>> > > >>> Fande,<br>
>>> > > >><br>
>>> > > ><br>
>>> > ><br>
>>> > ><br>
>>> ><br>
>>> ><br>
>>> ><br>
>>><br>
>>><br>
>><br>
</div></div></blockquote></div><br></div></div>