[petsc-users] Matrix Free Method questions

Matthew Knepley knepley at gmail.com
Thu Sep 24 15:08:13 CDT 2020


On Thu, Sep 24, 2020 at 4:03 PM Blondel, Sophie via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> Hi Barry,
>
> I probably should have sent this output before (with -log_view_memory to
> get an idea of where the vectors are created). I looked at it but it
> doesn't help me much...
>

Just a quick note: there is 82M in 85 vectors, but your system has 1.5M
unknowns, so a single vector is about 12M. Thus, there are probably 5 or 6
system vectors, and a bunch of small ones.
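
For reference, the rough arithmetic behind that estimate (assuming 8-byte
double-precision scalars):

  1.5e6 unknowns * 8 bytes   ~ 12 MB per full-size vector
  82.3 MB total / 12 MB      ~ 7 full-size vectors' worth of memory

so most of the 85 Vecs reported must be much smaller work vectors.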

  Thanks,

    Matt


> Cheers,
>
> Sophie
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Wednesday, September 16, 2020 16:38
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>   Yikes, GAMG is using a lot of vectors. But many of these are much
> smaller vectors so not of major concern.
>
>   I think this will just have to be an ongoing effort: see where the
> vectors are created internally and reuse or eliminate as many extras as
> possible.
>
>   The option -log_view_memory causes the PETSc logging summary to print
> additional columns showing the memory allocated during the different events
> in PETSc. This can be useful to see "when" the memory is mostly created; it
> does not tell us "why" it is created but at least tells us where to look.
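>
>   For example (a hedged illustration; -log_view_memory adds the extra
> columns to the usual -log_view summary, so the two options go on the
> command line together with your normal set):
>
>   -log_view -log_view_memory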
>
>   Barry
>
>
> On Sep 16, 2020, at 1:54 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> I don't think we're explicitly creating many PETSc vectors in Xolotl.
> There is a global one created for the solution when the TS is set up, and
> local ones in RHSFunction and RHSJacobian; everywhere else we just get the
> array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a
> few things to see if it changed the number of Vec from 85 (removing
> monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except
> when I changed the PC option from "-fieldsplit_1_pc_type redundant" to
> "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres
> -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors.
>
> Cheers,
>
> Sophie
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Tuesday, September 15, 2020 18:37
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>   Sophie,
>
>     Great, everything looks good.
>
>     So the new version takes about 7 times longer: part of that comes from
> the relatively modest increase (about 25 percent) in the number of iterations
> due to the poorer preconditioner convergence, and the rest from the much
> slower matrix-vector product that comes from using matrix free instead of the
> assembled matrix. Both of these are expected.
>
>     The matrix is taking about 10% of the memory it used to require, also
> expected.
>
>      I noticed in the logging the memory for the vectors
>
>              Vector    85             85     82303208     0.
>               Matrix    15             15      8744032     0.
>
>    is substantial/huge; with the much smaller matrix memory, the vector
> memory dominates.
>
>    It indicates 85 vectors are used. This is a large number: there are
> some needed for the TS (maybe 5?) and some needed for the KSP solve (maybe
> about 37), but I am not sure why there are so many. Perhaps this number
> could be reduced. Are there a lot of vectors created in the Xolotl code?
> I would think it could run with about 45 vectors.
>
>   Barry
>
>
>
>
> On Sep 15, 2020, at 5:12 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> I fixed everything and re-ran the 4 cases in 1D. They took more time than
> before because I used the Kokkos serial backend on the Xolotl side instead
> of the CUDA one I used previously (long story short, I tried to update CUDA
> and messed up the whole installation). Step 4 looks much better than
> previously; I was even able to remove
> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and
> it ran without throwing errors. The log files are attached.
>
> Cheers,
>
> Sophie
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Friday, September 11, 2020 18:03
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>
> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Thank you Barry,
>
> Step 3 worked after I moved MatSetOption at the beginning of
> computeJacobian(). Attached is the updated log which is pretty similar to
> what I had before. Step 4 still uses many more iterations.
>
> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case); I
> can see that the difference between steps 3 and 4 is that the contribution
> from the reactions is not included in the step 4 Jacobian (as expected,
> since I removed their setting from the code).
>
> Looking back at one of your previous emails, you wrote "This routine should
> only compute the elements of the Jacobian needed for this reduced matrix
> Jacobian, so the diagonals and the diffusion/convection terms." Does that
> mean that I should still include the contributions from the reactions that
> affect the pure diagonal terms?
>
>
>   Yes, you need to leave in everything that affects the diagonal; otherwise
> the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner
> and will likely perform poorly.
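>
>   A hypothetical sketch of what that means inside the reduced RHSJacobian
> (reactionPartial() and the row/col MatStencil arrays are illustrative only,
> not the actual Xolotl code): keep the reaction partials that land on the
> diagonal and drop the rest,
>
>   for (PetscInt i = 0; i < dof; ++i) {
>     for (PetscInt j = 0; j < dof; ++j) {
>       PetscScalar v = reactionPartial(i, j);  /* assumed helper for d(reaction_i)/d(c_j) */
>       if (i != j) continue;                   /* drop off-diagonal reaction coupling     */
>       /* diagonal contribution: needed so Jacobi sees the true diagonal */
>       ierr = MatSetValuesStencil(J, 1, &row[i], 1, &col[j], &v, ADD_VALUES);CHKERRQ(ierr);
>     }
>   }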
>
>   Barry
>
>
> Cheers,
>
> Sophie
> ------------------------------
>
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Thursday, September 10, 2020 17:04
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>
> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> Going through the different changes again to understand what was going
> wrong with the last step, I discovered that my changes from 2 to 3
> (keeping only the pure diagonal for the reaction Jacobian setup and adding
> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the
> sparsity of the matrix was correct but then the RHSJacobian method was
> wrong. I updated it
>
>
>    I'm not sure what you mean here. My hope was that in step 3 you wouldn't
> need to change RHSJacobian at all (that is just for step 4).
>
> but now when I run step 3 again I get the following error:
>
> [2]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [2]PETSC ERROR: Argument out of range
> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400,
> 316825) into matrix
> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032
>  GIT Date: 2020-09-01 13:07:58 -0500
> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10
> 15:38:58 2020
> [2]PETSC ERROR: Configure options
> PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902
> --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no
> --with-shared-libraries --download-fblaslapack=1
> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in
> /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c
> [2]PETSC ERROR: #2 MatSetValues() line 1392 in
> /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in
> /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in
> /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions)
> failed.
>
> Because the RHSJacobian method is trying to update the elements
> corresponding to the reactions. I'm not sure I understood correctly what
> step 3 was supposed to be.
>
>
>   In step three the RHSJacobian was supposed to be unchanged; only the
> option to ignore the "unneeded" Jacobian entries inside MatSetValues (set
> with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed
> (plus changing the DMDASetBlockFillsXXX argument).
>
>   The error message "Inserting a new nonzero at global row/column
> (310400, 316825) into matrix" indicates that somehow the MatOption
> MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option
> MAT_NEW_NONZERO_LOCATIONS when the Jacobian values are being set.
>
>  The MatSetOption(mat,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE) call is
> normally made inside DMCreateMatrix(), so I am not sure how the calls could
> be happening in the wrong order, but it seems somehow they are.
>
>   When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE)
> in the code?  You can call it at the beginning of computeJacobian().
>
>   If this still doesn't work and you get the same error, you can run in the
> debugger on one process and put a breakpoint on MatSetOption() to find
> out how the MAT_NEW_NONZERO_LOCATION_ERR comes in late to upset the
> apple cart. You should see MatSetOption() called at least twice, and the
> last one should have the MAT_NEW_NONZERO_LOCATIONS flag.
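>
>   A hedged sketch of how to do that (assuming gdb and a single MPI rank): add
> -start_in_debugger noxterm to the options so PETSc launches gdb for you, then
> at the (gdb) prompt type "break MatSetOption" followed by "run"; a "bt" at
> each stop shows who is making the call.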
>
>   Barry
>
>
>
>
>
> Cheers,
>
> Sophie
>
>
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Friday, September 4, 2020 01:06
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>   Sophie,
>
>   Thanks.  I have started looking through the logs
>
>    The change to the matrix-free multiply (from 1 to 2), which reduces the
> accuracy of the multiply to about half the digits, is not surprising.
>
>     * It roughly doubles the time, since doing the matrix-free product
> requires a function evaluation
>
>     * It increases the iteration count, but not significantly, since the
> reduced precision of the multiply induces some additional linear iterations
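>
> (For reference, the matrix-free product is the standard finite-difference
> approximation
>
>     J(u) a  ~  ( F(u + h*a) - F(u) ) / h
>
> so each multiply costs one extra residual evaluation and only carries about
> half the significant digits of F, which is what the two points above reflect.)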
>
>   The change from 2 to 3 (not storing the entire matrix)
>
>     * the number of nonzeros goes from 49459966 to 1558766 = 3.15 percent,
> so it succeeds in not storing the unneeded part of the matrix
>
>     * the number of MatMult_MF goes from 2331 to 2418. I don't understand
> this; I expected it to be identical because it should be using the same
> preconditioner in 3 as in 2 and thus get the same convergence. It could
> possibly be due to variability in convergence between different runs with
> the matrix-free operator and not related to not storing the entire matrix.
>
>     * the KSPSolve() time goes from 3.8774e+02 to 3.7855e+02, a trivial
> difference, which is what I would expect
>
>     * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02, about a 14
> percent drop, which is reasonable because 3 doesn't spend as much time
> inserting matrix values (it still computes them but doesn't insert the ones
> we don't want for the preconditioner).
>
>   The change from 3 to 4
>
>     * something goes seriously wrong here. The total number of linear
> solve iterations goes from 2282 to 97403, so something has gone seriously
> wrong with the preconditioner; since the preconditioner operations are
> the same, it seems something has gone wrong with the new reduced
> preconditioner matrix.
>
>  I think there is an error in computing the reduced matrix entries, that
> is, the new compute-Jacobian code is not correctly computing the entries it
> needs.
>
>   To debug this you can run case 3 and case 4 for a single time step with
> -ksp_view_pmat binary. This should create a binary file with the initial
> Jacobian matrix in each case. You can use MATLAB or Python to take the
> difference of the matrices and see where the new Jacobian computation code
> is not producing the correct values.
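>
>   If you prefer a small standalone PETSc program instead of MATLAB/Python, a
> minimal sketch (the file names jac3.bin and jac4.bin are hypothetical; they
> would come from, e.g., -ksp_view_pmat binary:jac3.bin in the two runs):
>
>   #include <petscmat.h>
>   int main(int argc, char **argv)
>   {
>     Mat            A, B;
>     PetscViewer    v;
>     PetscReal      nrm;
>     PetscErrorCode ierr;
>
>     ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>     /* load the case 3 Jacobian */
>     ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>     ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
>     ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac3.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
>     ierr = MatLoad(A, v);CHKERRQ(ierr);
>     ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
>     /* load the case 4 Jacobian */
>     ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
>     ierr = MatSetType(B, MATAIJ);CHKERRQ(ierr);
>     ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac4.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
>     ierr = MatLoad(B, v);CHKERRQ(ierr);
>     ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
>     /* B <- B - A, then report the size of the difference */
>     ierr = MatAXPY(B, -1.0, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr);
>     ierr = MatNorm(B, NORM_FROBENIUS, &nrm);CHKERRQ(ierr);
>     ierr = PetscPrintf(PETSC_COMM_WORLD, "||J4 - J3||_F = %g\n", (double)nrm);CHKERRQ(ierr);
>     ierr = MatDestroy(&A);CHKERRQ(ierr);
>     ierr = MatDestroy(&B);CHKERRQ(ierr);
>     ierr = PetscFinalize();
>     return ierr;
>   }
>
>   MatView() on the difference (or loading both matrices into Python/MATLAB)
> then shows which locations disagree.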
>
>    Good luck,
>
>    Barry
>
>
>
>
> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> Attached are the log files for the 1D case, for each of the 4 steps. I
> don't know how I did it yesterday but the differences between steps look
> better today, except for step 4 that takes many more iterations and smaller
> time steps.
>
> Cheers,
>
> Sophie
> ------------------------------
>
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Wednesday, September 2, 2020 15:53
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>
> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Thank you Barry,
>
> The code ran with your branch but it's much slower than running with the
> full Jacobian and Jacobi PC subtype (around 10 times slower). It is using
> less memory as expected. I tried step 2 as well and it's even slower.
>
>
>   Sophie,
>
>   That is puzzling. It should be using the same matrix in the solver, so it
> should be the same speed, and the setup time should be a bit better since it
> does not form the full Jacobian. (We'll get to this later.)
>
> The TS iteration for step 1 are the same as with full Jacobian. Let me
> know what I can look at to check if I've done something wrong.
>
>
>   We need to see if the KSP iterations are pretty similar for the four
> approaches: (1) the original code with the Jacobi PC subtype, (2) matrix free
> with the Jacobi PC (just add -snes_mf_operator to case 1), (3) the new code
> with the MatSetOption() to not store the entire Jacobian, also with
> -snes_mf_operator, and (4) the new code that doesn't compute the
> unneeded part of the Jacobian, also with -snes_mf_operator.
>
>   You could run each case with the same 20 timesteps and -ts_monitor,
> -ksp_monitor, and -ts_view, and send the four output files around.
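>
>   For instance, case (2) would be your usual option set plus (an illustrative
> sketch, adjust as needed):
>
>   -snes_mf_operator -ts_max_steps 20 -ts_monitor -ksp_monitor -ts_view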
>
>  Once we are sure the four cases are behaving as expected then you can get
> timings for them, but let's not do that until we confirm the similar
> results. There could easily be a flaw in my reasoning or the PETSc code
> somewhere that affects the correctness, so it's best to check that first.
>
>
>   Barry
>
>
> Cheers,
>
> Sophie
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Tuesday, September 1, 2020 14:12
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>   Sophie,
>
>    Sorry, looks like an old bug in PETSc that was undetected due to lack
> of use. The code is trying to use the first of the two matrices to
> determine the preconditioner which won't work in your case since it is
> matrix-free. It should be using the second matrix.
>
>   I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this
> issue for you.
>
>   Barry
>
>
> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> I'm working through step 1) but I think I am doing something wrong. I'm
> using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing
> clusters (small He clusters here, from size 1 to 7) and all the diagonal
> entries. Then I added a few lines in the code:
> Mat mat;
> DMCreateMatrix(da, &mat);
> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
>
> When I try to run with the following options: -snes_mf_operator -ts_dt
> 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration
> -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type
> jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max
> 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover
> -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20
>
> I get an error:
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal
> entries defined
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da
>  GIT Date: 2020-08-28 16:47:50 +0000
> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by
> sophie Tue Sep  1 10:58:44 2020
> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc
> PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77
> --with-debugging=no --with-shared-libraries
> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in
> /home/sophie/Code/petsc/src/mat/interface/matrix.c
> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in
> /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in
> /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
> [0]PETSC ERROR: #4 PCSetUp() line 1009 in
> /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #5 KSPSetUp() line 406 in
> /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in
> /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #7 KSPSolve() line 889 in
> /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in
> /home/sophie/Code/petsc/src/snes/impls/ls/ls.c
> [0]PETSC ERROR: #9 SNESSolve() line 4524 in
> /home/sophie/Code/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in
> /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c
> [0]PETSC ERROR: #11 TSStep() line 3731 in
> /home/sophie/Code/petsc/src/ts/interface/ts.c
> [0]PETSC ERROR: #12 TSSolve() line 4128 in
> /home/sophie/Code/petsc/src/ts/interface/ts.c
> PetscSolver::solve: TSSolve failed.
>
> Cheers,
>
> Sophie
> ------------------------------
> *From:* Barry Smith <bsmith at petsc.dev>
> *Sent:* Monday, August 31, 2020 14:50
> *To:* Blondel, Sophie <sblondel at utk.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Subject:* Re: [petsc-users] Matrix Free Method questions
>
>
>  Sophie,
>
>    Thanks.
>
>    The factor of 4 is a lot; the 1.5 is not so bad.
>
>    You will definitely want to retain the full matrix assembly codes for
> speed and to verify a reduced matrix version.
>
>    It is worth trying a "reduced matrix version" with matrix-free multiply
> based on these numbers. This reduced matrix Jacobian will only have the
> diagonals and all the terms connected to the cluster sizes that move. In
> other words you will be building just the part of the Jacobian needed for
> the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector
> product matrix free. (SOR requires all the Jacobian entries).
>
>    Fortunately this is hopefully pretty straightforward for this code. You
> will not have to change the structure of the main code at all.
>
>   Step 1) create a new "sparse matrix" that will be passed to
> DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the
> diagonal entries and also all the entries that are associated with the
> variables that diffuse. If I remember correctly these are just the smallest
> cluster size, plain Helium?
>
>   Call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
>
> Then you would run the code with -snes_mf_operator and the new PC subtype
> for Jacobi.
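>
>   A minimal sketch of what Step 1 could look like, using a toy dof = 3 where
> only component 0 diffuses and using the dense-fill variant DMDASetBlockFills()
> for brevity (the real code would build the analogous pattern for
> DMDASetBlockFillsSparse() with Xolotl's actual dof and diffusing clusters):
>
>   PetscInt dfill[9] = {1, 0, 0,  /* diagonal (within-cell) block: keep only  */
>                        0, 1, 0,  /* the diagonal entries                     */
>                        0, 0, 1};
>   PetscInt ofill[9] = {1, 0, 0,  /* off-diagonal (neighbor-cell) block: only */
>                        0, 0, 0,  /* the diffusing component couples          */
>                        0, 0, 0};
>   Mat mat;
>
>   ierr = DMDASetBlockFills(da, dfill, ofill);CHKERRQ(ierr);
>   ierr = DMCreateMatrix(da, &mat);CHKERRQ(ierr);
>   ierr = MatSetOption(mat, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);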
>
>   A test that the new reduced Jacobian is correct will be that you get
> almost the same iterations as the runs you just made using the PC subtype
> of Jacobi. Hopefully not slower, and using a great deal less memory. The
> iterations will not be identical because of the matrix-free multiply.
>
>  Step 2) create a new version of the Jacobian computation routine. This
> routine should only compute the elements of the Jacobian needed for this
> reduced matrix Jacobian, so the diagonals and the diffusion/convection
> terms.
>
>    Again run with -snes_mf_operator and the new PC subtype for Jacobi,
> and you should again get the same convergence history.
>
>    I made this two steps because it is easier to validate and debug when you
> can check that you get the same results as before. The first step cheats in
> that it still computes the full Jacobian but ignores the entries that we
> don't need to store for the preconditioner. The second step is more efficient
> because it only computes the Jacobian entries needed for the preconditioner,
> but it requires going through the Jacobian code and making sure only the
> needed parts are computed.
>
>
>   If you have any questions please let me know.
>
>   Barry
>
>
>
>
> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Hi Barry,
>
> I ran the 2 cases to look at the effect of the Jacobi preconditioner:
>
>    - 1D with 200 grid points and 7759 DOF per grid point (for the PSI
>    application), for 20 TS: the factor between SOR and Jacobi is ~4 (976
>    MatMult for SOR and 4162 MatMult for Jacobi)
>    - 2D with 63x63 grid points and 4124 DOF per grid point (for the NE
>    application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi)
>
> Cheers,
>
> Sophie
> ------------------------------
> *De :* Barry Smith <bsmith at petsc.dev>
> *Envoyé :* vendredi 28 août 2020 18:31
> *À :* Blondel, Sophie <sblondel at utk.edu>
> *Cc :* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Objet :* Re: [petsc-users] Matrix Free Method questions
>
>
>
> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>
> Thank you Jed and Barry,
>
> First, attached are the logs from the benchmark runs I did without
> (log_std.txt) and with the MF method (log_mf.txt). It took me some trouble
> to get -log_view to work because I'm using push and pop for the options,
> which means that PETSc is initialized with no arguments, so the command-line
> argument was not taken into account; but I guess this is for a separate
> discussion.
>
> To answer the questions about the current preconditioners:
>
>    - I used the same pre-conditioner options as listed in my previous
>    email when I added the -snes_mf option; I did try to remove all the PC
>    related options at one point with the MF method but didn't see a change in
>    runtime so I put them back in
>    - this benchmark is for a 1D DMDA using 20 grid points; when running
>    in 2D or 3D I switch the PC options to: -pc_type fieldsplit
>    -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type
>    gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1
>
> I haven't tried a Jacobi PC instead of SOR; I will run a set of more
> realistic runs (1D and 2D) without MF but with Jacobi and report on it next
> week. When you say "iterations", do you mean what is given by -ksp_monitor?
>
>
>   Yes, the number of MatMult is a good enough surrogate.
>
>   So using matrix-free (which means no preconditioning) takes
>
>   35846/160
>
> ans =
>
>   224.0375
>
>   or about 224 times as many iterations. So even for this modest 1d problem
> preconditioning is doing a great deal.
>
>   Barry
>
>
>
>
> Cheers,
>
> Sophie
> ------------------------------
> *De :* Barry Smith <bsmith at petsc.dev>
> *Envoyé :* vendredi 28 août 2020 12:12
> *À :* Blondel, Sophie <sblondel at utk.edu>
> *Cc :* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>
> *Objet :* Re: [petsc-users] Matrix Free Method questions
>
>
>   Sophie,
>
>    This is exactly what I would expect. If you run with -ksp_monitor you
> will see the -snes_mf run takes many more iterations.
>
>    I am puzzled that the argument -pc_type fieldsplit did not stop the
> run since this is under normal circumstances not a viable preconditioner
> with -snes_mf. Did you also remove the -pc_type fieldsplit argument?
>
>    In order to see how one can avoid forming the entire matrix and use
> matrix-free to do the matrix-vector product but still have an effective
> preconditioner, let's look at what the current preconditioner options do.
>
>  -pc_fieldsplit_detect_coupling
>
>
> creates two sub-preconditioners: the first for all the variables and the
> second for those that are coupled by the matrix to variables in neighboring
> cells. Since only the smallest cluster sizes have diffusion/advection, this
> second set contains only the cluster-size-one variables.
>
> -fieldsplit_0_pc_type sor
>
>
> Runs SOR on all the variables; you can think of this as running SOR on the
> reactions. It is a pretty good preconditioner for the reactions since the
> reactions are local, per cell.
>
>  -fieldsplit_1_pc_type redundant
>
>
> This runs the default preconditioner (ILU) on just the variables that
> diffuse, i.e. the elliptic part. For smallish problems this is fine; for
> larger problems and in 2d and 3d presumably you would also use
> -redundant_pc_type gamg to get algebraic multigrid for the diffusion. This
> part of the matrix will always need to be formed and used in the
> preconditioner. It is very important since the diffusion is what brings
> most of the ill-conditioning for larger problems into the linear system.
> Note that it only needs the matrix entries for the cluster size of 1, so it
> is very small compared to the entire sparse matrix.
>
> ----
>  The first preconditioner, SOR, requires ALL the matrix entries, which are
> almost all (except for the diffusion terms) the coupling between different
> size clusters within a cell. Essentially each cell has its own sparse matrix
> of the size of the total number of clusters; it is sparse but not super
> sparse.
>
>  So to significantly lower memory usage we need to remove the SOR and
> the storing of all the matrix entries but still have an efficient
> preconditioner for the "reaction" terms.
>
>  The simplest thing would be to use Jacobi instead of SOR for the first
> subpreconditioner since it only requires the diagonal entries in the
> matrix. But Jacobi is a worse preconditioner than SOR (since it totally
> ignores the matrix coupling) and sometimes can be much worse.
>
>   Before anyone writes additional code we need to know whether doing
> something along these lines ruins the convergence.
>
>  Have you used the same options as before but with -fieldsplit_0_pc_type
> jacobi? (Not using any matrix free.) We need to get an idea of how many
> more linear iterations it requires (not time; comparing time won't be
> helpful for this exercise). We also need this information for realistic-size
> problems in 2 or 3 dimensions that you really want to run; small problems
> may run OK with this approach but give misleading information about what
> happens for large problems.
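>
>  Concretely, that is a run with the option list from your message below with
> just the one substitution (a sketch):
>
>   -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration
>   -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY
>   -ts_exact_final_time stepover -fieldsplit_0_pc_type jacobi
>   -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor
>   -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100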
>
>   I suspect the iteration counts will shoot up. Can you run some cases and
> see how the iteration counts change?
>
>   Based on that we can decide if we still retain "good convergence" when
> changing the SOR to Jacobi, and then change the code to make this change
> efficient (basically by skipping the explicit computation of the reaction
> Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT).
>
>   Barry
>
>
>
>
>
>
>
>
>
> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi everyone,
>
> I have been using PETSc for a few years with a fully implicit TS ARKIMEX
> method and am now exploring the matrix free method option. Here is the list
> of PETSc options I typically use: -ts_dt 1.0e-12
> -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time
> 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY
> -ts_exact_final_time stepover -fieldsplit_0_pc_type sor
> -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor
> -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100
>
> I started to compare the performance of the code without changing anything
> in the executable, simply adding "-snes_mf". I see a reduction in memory
> usage as expected, but a benchmark that would usually take ~5 min to run now
> takes ~50 min. Reading the documentation I saw that there are a few options
> to play with for the matrix free method, like -snes_mf_err, -snes_mf_umin,
> or switching to -snes_mf_type wp. I used and modified the values of each of
> these options separately but never saw a sizable change in runtime; is that
> expected?
>
> And are there other ways to make the matrix free method faster? I saw in
> the documentation that you can define your own preconditioner, for
> instance. Let me know if you need additional information about the PETSc
> setup in the application I use.
>
> Best,
>
> Sophie
>
>
> <log_mf.txt><log_std.txt>
>
>
> <log_1D_1.txt><log_1D_2.txt><log_1D_3.txt><log_1D_4.txt>
>
>
> <log_1D_3_bis.txt>
>
>
> <log_1D_1.txt><log_1D_2.txt><log_1D_3.txt><log_1D_4.txt>
>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>