[petsc-users] Matrix Free Method questions
Barry Smith
bsmith at petsc.dev
Thu Sep 24 15:44:35 CDT 2020
> On Sep 24, 2020, at 3:08 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
> On Thu, Sep 24, 2020 at 4:03 PM Blondel, Sophie via petsc-users <petsc-users at mcs.anl.gov> wrote:
> Hi Barry,
>
> I probably should have sent this output before (with -log_view_memory to get an idea of where the vectors are created). I looked at it but it doesn't help me much...
>
> Just a quick note: there is 82M in 85 vectors, but your system has 1.5M unknowns, so a single vector is about 12M. Thus, there are probably 5 or 6 system-sized vectors, and a bunch of small ones.
Matt is right, maybe the vectors are as expected.
But there is something totally off about the memory used for the matrix.
type: mpiaij
rows=1552000, cols=1552000, bs=7760
total: nonzeros=1558766, allocated nonzeros=1558766
12*1558766 = 18,705,192 but
Matrix 15 15 8,744,032 0.
somehow it is logging much less memory for the matrix than it must contain. A quick look at the code seems ok.
PetscErrorCode MatSeqAIJSetPreallocation_SeqAIJ(Mat B,PetscInt nz,const PetscInt *nnz)
ierr = PetscLogObjectMemory((PetscObject)B,(B->rmap->n+1)*sizeof(PetscInt)+nz*(sizeof(PetscScalar)+sizeof(PetscInt)));CHKERRQ(ierr);
Well, probably not important.
So the memory usage is probably fine.
Barry
>
> Thanks,
>
> Matt
>
> Cheers,
>
> Sophie
> From: Barry Smith <bsmith at petsc.dev>
> Sent: Wednesday, September 16, 2020 16:38
> To: Blondel, Sophie <sblondel at utk.edu>
> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
> Subject: Re: [petsc-users] Matrix Free Method questions
>
>
> Yikes, GAMG is using a lot of vectors. But many of these are much smaller vectors so not of major concern.
>
> I think this will just have to be an ongoing issue: seeing where the vectors are created internally and reusing or eliminating as many extra ones as possible.
>
> The option -log_view_memory causes the PETSc logging summary to print additional columns showing the memory allocated during the different events in PETSc. This can be useful to see "when" the memory is mostly created; it does not tell us "why" it is created, but at least it tells us where to look.
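>
> For example, something along these lines (the executable name, arguments, and rank count here are just placeholders for your usual run):
>
>    mpiexec -n 16 ./xolotl <params> -log_view -log_view_memory
>
> which adds the extra memory columns to the usual -log_view table.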
>
> Barry
>
>
>> On Sep 16, 2020, at 1:54 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>
>> Hi Barry,
>>
>> I don't think we're explicitly creating many PETSc vectors in Xolotl. There is a global one created for the solution when the TS is set up, and local ones in RHSFunction and RHSJacobian; everywhere else we just get the array from it with DMDAVecGetArrayDOF and DMDAVecRestoreArrayDOF. I tried a few things to see if it changed the number of Vec from 85 (removing monitors, fewer time steps, fewer MPI tasks) but it stayed the same, except when I changed the PC option from "-fieldsplit_1_pc_type redundant" to "-fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1" where I got 10567 vectors.
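>>
>> For reference, the access pattern looks roughly like this (a simplified sketch, not the actual Xolotl routine; the loop body is elided):
>>
>> PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec C, Vec F, void *ctx)
>> {
>>   DM             da;
>>   Vec            localC;
>>   PetscScalar  **c, **f; /* 1D DMDA: array[i][k] = grid point i, component k */
>>   PetscInt       xs, xm, i;
>>   PetscErrorCode ierr;
>>
>>   PetscFunctionBeginUser;
>>   ierr = TSGetDM(ts, &da);CHKERRQ(ierr);
>>   ierr = DMGetLocalVector(da, &localC);CHKERRQ(ierr); /* borrowed from the DM, not a newly created Vec */
>>   ierr = DMGlobalToLocalBegin(da, C, INSERT_VALUES, localC);CHKERRQ(ierr);
>>   ierr = DMGlobalToLocalEnd(da, C, INSERT_VALUES, localC);CHKERRQ(ierr);
>>   ierr = DMDAVecGetArrayDOF(da, localC, &c);CHKERRQ(ierr);
>>   ierr = DMDAVecGetArrayDOF(da, F, &f);CHKERRQ(ierr);
>>   ierr = DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);CHKERRQ(ierr);
>>   for (i = xs; i < xs + xm; i++) {
>>     /* fill f[i][k] from c[i][k] and its neighbors: reactions + diffusion */
>>   }
>>   ierr = DMDAVecRestoreArrayDOF(da, localC, &c);CHKERRQ(ierr);
>>   ierr = DMDAVecRestoreArrayDOF(da, F, &f);CHKERRQ(ierr);
>>   ierr = DMRestoreLocalVector(da, &localC);CHKERRQ(ierr);
>>   PetscFunctionReturn(0);
>> }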
>>
>> Cheers,
>>
>> Sophie
>> From: Barry Smith <bsmith at petsc.dev>
>> Sent: Tuesday, September 15, 2020 18:37
>> To: Blondel, Sophie <sblondel at utk.edu>
>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>> Subject: Re: [petsc-users] Matrix Free Method questions
>>
>>
>> Sophie,
>>
>> Great, everything looks good.
>>
>> So the new version takes about 7 times longer: a relatively modest part (about 25 percent) comes from the increase in the number of iterations due to the poorer preconditioner convergence, and the rest from the much slower matrix-vector product due to using the matrix-free multiply instead of the stored matrix. Both of these are expected.
>>
>> The matrix is taking about 10% of the memory it used to require, also expected.
>>
>> I noticed in the logging the memory for the vectors
>>
>> Vector 85 85 82303208 0.
>> Matrix 15 15 8744032 0.
>>
>> is substantial; with the much smaller matrix memory, the vector memory now dominates.
>>
>> It indicates 85 vectors are used. This is a large number; some are needed for the TS (maybe 5?) and some for the KSP solve (maybe about 37), but I am not sure why there are so many. Perhaps this number could be reduced. Are there a lot of vectors created in the Xolotl code? I would think it could run with about 45 vectors.
>>
>> Barry
>>
>>
>>
>>
>>> On Sep 15, 2020, at 5:12 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>
>>> Hi Barry,
>>>
>>> I fixed everything and re-ran the 4 cases in 1D. They took more time than before because I used the Kokkos serial backend on the Xolotl side instead of the CUDA one I used previously (long story short, I tried to update CUDA and messed up the whole installation). Step 4 looks much better than previously; I was even able to remove MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) from the code and it ran without throwing errors. The log files are attached.
>>>
>>> Cheers,
>>>
>>> Sophie
>>> From: Barry Smith <bsmith at petsc.dev>
>>> Sent: Friday, September 11, 2020 18:03
>>> To: Blondel, Sophie <sblondel at utk.edu>
>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>
>>>
>>>
>>>> On Sep 11, 2020, at 7:45 AM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>
>>>> Thank you Barry,
>>>>
>>>> Step 3 worked after I moved MatSetOption to the beginning of computeJacobian(). Attached is the updated log, which is pretty similar to what I had before. Step 4 still uses many more iterations.
>>>>
>>>> I checked the Jacobian using -ksp_view_pmat ascii (on a simpler case); I can see that the difference between steps 3 and 4 is that the contribution from the reactions is not included in the step 4 Jacobian (as expected, since I removed their setting from the code).
>>>>
>>>> Looking back at one of your previous emails, you wrote "This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms." Does it mean that I should still include the contributions from the reactions that affect the pure diagonal terms?
>>>
>>> Yes, you need to leave in everything that affects the diagonal, otherwise the "Jacobi" preconditioner will not reflect the true Jacobi preconditioner and will likely perform poorly.
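>>>
>>> In other words, something like this sketch (placeholder names, not Xolotl code) for each component k at grid point i:
>>>
>>>   MatStencil  row, col;
>>>   PetscScalar v = dRk_dCk; /* derivative of the reaction term for component k with respect to its own concentration */
>>>   row.i = i; row.c = k;
>>>   col.i = i; col.c = k;    /* same point, same component: a pure diagonal entry */
>>>   ierr = MatSetValuesStencil(J, 1, &row, 1, &col, &v, ADD_VALUES);CHKERRQ(ierr);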
>>>
>>> Barry
>>>
>>>>
>>>> Cheers,
>>>>
>>>> Sophie
>>>>
>>>> From: Barry Smith <bsmith at petsc.dev>
>>>> Sent: Thursday, September 10, 2020 17:04
>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>
>>>>
>>>>
>>>>> On Sep 10, 2020, at 2:46 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> Going through the different changes again to understand what was going wrong with the last step, I discovered that my changes from 2 to 3 (keeping only the pure diagonal for the reaction Jacobian setup and adding MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) were wrong: the sparsity of the matrix was correct but then the RHSJacobian method was wrong. I updated it
>>>>
>>>> I'm not sure what you mean here. My hope was that in step 3 you wouldn't need to change RHSJacobian at all (that is just for step 4).
>>>>
>>>>> but now when I run step 3 again I get the following error:
>>>>>
>>>>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>> [2]PETSC ERROR: Argument out of range
>>>>> [2]PETSC ERROR: Inserting a new nonzero at global row/column (310400, 316825) into matrix
>>>>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.13.4-885-gf58a62b032 GIT Date: 2020-09-01 13:07:58 -0500
>>>>> [2]PETSC ERROR: Unknown Name on a 20200902 named iguazu by bqo Thu Sep 10 15:38:58 2020
>>>>> [2]PETSC ERROR: Configure options PETSC_DIR=/home2/bqo/libraries/petsc-barry PETSC_ARCH=20200902 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries --download-fblaslapack=1
>>>>> [2]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 606 in /home2/bqo/libraries/petsc-barry/src/mat/impls/aij/mpi/mpiaij.c
>>>>> [2]PETSC ERROR: #2 MatSetValues() line 1392 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
>>>>> [2]PETSC ERROR: #3 MatSetValuesLocal() line 2207 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
>>>>> [2]PETSC ERROR: #4 MatSetValuesStencil() line 1595 in /home2/bqo/libraries/petsc-barry/src/mat/interface/matrix.c
>>>>> PetscSolverExpHandler::computeJacobian: MatSetValuesStencil (reactions) failed.
>>>>>
>>>>> Because the RHSJacobian method is trying to update the elements corresponding to the reactions. I'm not sure I understood correctly what step 3 was supposed to be.
>>>>
>>>> In step three the RHSJacobian was supposed to be unchanged; only the option to ignore the "unneeded" Jacobian entries inside MatSetValues (set with MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);) was needed (plus changing the DMDASetBlockFillsXXX argument).
>>>>
>>>> The error message "Inserting a new nonzero at global row/column (310400, 316825) into matrix" indicates that somehow the MatOption MAT_NEW_NONZERO_LOCATION_ERR is in control instead of the option MAT_NEW_NONZERO_LOCATIONS when it is setting the Jacobian values.
>>>>
>>>> The MatSetOption(mat,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE); is normally called inside DMCreateMatrix(), so I am not sure how the two options could be getting set in the wrong order, but it seems somehow they are.
>>>>
>>>> When do you call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE); in the code? You can call it at the beginning of computeJacobian().
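>>>>
>>>> That is, something like this sketch at the top of the routine (the signature is just whatever your computeJacobian() already uses):
>>>>
>>>>   PetscErrorCode computeJacobian(TS ts, PetscReal t, Vec C, Mat A, Mat J, void *ctx)
>>>>   {
>>>>     PetscErrorCode ierr;
>>>>     PetscFunctionBeginUser;
>>>>     ierr = MatSetOption(J, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr); /* before any MatSetValuesStencil() */
>>>>     /* ... the existing MatSetValuesStencil() calls ... */
>>>>     ierr = MatAssemblyBegin(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>     ierr = MatAssemblyEnd(J, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>     PetscFunctionReturn(0);
>>>>   }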
>>>>
>>>> If this still doesn't work and you get the same error, you can run in the debugger on one process and put a breakpoint on MatSetOption() to find out how the MAT_NEW_NONZERO_LOCATION_ERR setting comes in late to upset the apple cart. You should see MatSetOption() called at least twice, and the last call should have the MAT_NEW_NONZERO_LOCATIONS flag.
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Sophie
>>>>>
>>>>>
>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>> Sent: Friday, September 4, 2020 01:06
>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>
>>>>>
>>>>> Sophie,
>>>>>
>>>>> Thanks. I have started looking through the logs
>>>>>
>>>>> The change to the matrix-free multiply (from 1 to 2), which reduces the accuracy of the multiply to about half the digits, is not surprising.
>>>>>
>>>>> * It roughly doubles the time since doing the matrix-free product requires a function evaluation
>>>>>
>>>>> * It increases the iteration count, but not significantly, since the reduced precision of the multiply induces some additional linear iterations
>>>>>
>>>>> The change from 2 to 3 (not storing the entire matrix)
>>>>>
>>>>> * the number of nonzeros goes from 49459966 to 1558766 (3.15 percent), so it succeeds in not storing the unneeded part of the matrix
>>>>>
>>>>> * the number of MatMult_MF goes from 2331 to 2418. I don't understand this; I expected it to be identical because it should be using the same preconditioner in 3 as in 2 and thus get the same convergence. It could possibly be due to run-to-run variability in convergence with the matrix-free operator and not related to not storing the entire matrix.
>>>>>
>>>>> * the KSPSolve() time goes from 3.8774e+0 to 3.7855e+02, a trivial difference, which is what I would expect
>>>>>
>>>>> * the SNESSolve time goes from 5.0047e+02 to 4.3275e+02, about a 14 percent drop, which is reasonable because 3 doesn't spend as much time inserting matrix values (it still computes them but doesn't insert the ones we don't want for the preconditioner).
>>>>>
>>>>> The change from 3 to 4
>>>>>
>>>>> * something goes seriously wrong here. The total number of linear solve iterations goes from 2282 to 97403, so the preconditioner has broken down; since the preconditioner operations themselves are the same, the problem must be with the new reduced preconditioner matrix.
>>>>>
>>>>> I think there is an error in computing the reduced matrix entries; that is, the new compute-Jacobian code is not correctly computing the entries it needs.
>>>>>
>>>>> To debug this you can run case 3 and case 4 for a single time step with -ksp_view_pmat binary. This should create a binary file with the initial Jacobian matrix in each case. You can use Matlab or Python to take the difference of the matrices and see where the new Jacobian computation code is not producing the correct values.
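>>>>>
>>>>> If it is easier, the comparison can also be done directly in PETSc; a small standalone sketch (the file names are placeholders for wherever -ksp_view_pmat binary:<file> wrote the matrices in each run):
>>>>>
>>>>> #include <petscmat.h>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>   Mat            A, B;
>>>>>   PetscViewer    v;
>>>>>   PetscReal      nrm;
>>>>>   PetscErrorCode ierr;
>>>>>
>>>>>   ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
>>>>>   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac_case3.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
>>>>>   ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
>>>>>   ierr = MatLoad(A, v);CHKERRQ(ierr);
>>>>>   ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
>>>>>   ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "jac_case4.bin", FILE_MODE_READ, &v);CHKERRQ(ierr);
>>>>>   ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
>>>>>   ierr = MatLoad(B, v);CHKERRQ(ierr);
>>>>>   ierr = PetscViewerDestroy(&v);CHKERRQ(ierr);
>>>>>   ierr = MatAXPY(B, -1.0, A, DIFFERENT_NONZERO_PATTERN);CHKERRQ(ierr); /* B = B - A */
>>>>>   ierr = MatNorm(B, NORM_FROBENIUS, &nrm);CHKERRQ(ierr);
>>>>>   ierr = PetscPrintf(PETSC_COMM_WORLD, "||J4 - J3||_F = %g\n", (double)nrm);CHKERRQ(ierr);
>>>>>   ierr = MatDestroy(&A);CHKERRQ(ierr);
>>>>>   ierr = MatDestroy(&B);CHKERRQ(ierr);
>>>>>   ierr = PetscFinalize();
>>>>>   return ierr;
>>>>> }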
>>>>>
>>>>> Good luck,
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Sep 3, 2020, at 12:26 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>
>>>>>> Attached are the log files for the 1D case, for each of the 4 steps. I don't know how I did it yesterday but the differences between steps look better today, except for step 4 that takes many more iterations and smaller time steps.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Sophie
>>>>>>
>>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>>> Sent: Wednesday, September 2, 2020 15:53
>>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Sep 2, 2020, at 1:44 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>>>
>>>>>>> Thank you Barry,
>>>>>>>
>>>>>>> The code ran with your branch but it's much slower than running with the full Jacobian and Jacobi PC subtype (around 10 times slower). It is using less memory as expected. I tried step 2 as well and it's even slower.
>>>>>>
>>>>>> Sophie,
>>>>>>
>>>>>> That is puzzling. It should be using the same matrix in the solver, so it should be the same speed, and the setup time should be a bit better since it does not form the full Jacobian. (We'll get to this later.)
>>>>>>
>>>>>>> The TS iteration for step 1 are the same as with full Jacobian. Let me know what I can look at to check if I've done something wrong.
>>>>>>
>>>>>> We need to see if the KSP iterations are pretty similar for the four approaches: (1) the original code with the Jacobi PC subtype, (2) matrix free with the Jacobi PC (just add -snes_mf_operator to case 1), (3) the new code with the MatSetOption() to not store the entire Jacobian, also with -snes_mf_operator, and (4) the new code that doesn't compute the unneeded part of the Jacobian, also with -snes_mf_operator.
>>>>>>
>>>>>> You could run each case with the same 20 timesteps and -ts_monitor, -ksp_monitor, and -ts_view, and send the four output files around.
>>>>>>
>>>>>> Once we are sure the four cases are behaving as expected, you can get timings for them, but let's not do that until we confirm similar results. There could easily be a flaw in my reasoning or in the PETSc code somewhere that affects the correctness, so it's best to check that first.
>>>>>>
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Sophie
>>>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>>>> Sent: Tuesday, September 1, 2020 14:12
>>>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>>>
>>>>>>>
>>>>>>> Sophie,
>>>>>>>
>>>>>>> Sorry, looks like an old bug in PETSc that was undetected due to lack of use. The code is trying to use the first of the two matrices to determine the preconditioner which won't work in your case since it is matrix-free. It should be using the second matrix.
>>>>>>>
>>>>>>> I hope the branch barry/2020-09-01/fix-fieldsplit-mf resolves this issue for you.
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 1, 2020, at 12:45 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Barry,
>>>>>>>>
>>>>>>>> I'm working through step 1) but I think I am doing something wrong. I'm using DMDASetBlockFillsSparse to set the non-zeros only for the diffusing clusters (small He clusters here, from size 1 to 7) and all the diagonal entries. Then I added a few lines in the code:
>>>>>>>> Mat mat;
>>>>>>>> DMCreateMatrix(da, &mat);
>>>>>>>> MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
>>>>>>>>
>>>>>>>> When I try to run with the following options: -snes_mf_operator -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 2 -snes_force_iteration -pc_fieldsplit_detect_coupling -pc_type fieldsplit -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -ts_max_snes_failures -1 -ts_monitor -ts_max_steps 20
>>>>>>>>
>>>>>>>> I get an error:
>>>>>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>> [0]PETSC ERROR: No support for this operation for this object type
>>>>>>>> [0]PETSC ERROR: Matrix type mffd does not have a find off block diagonal entries defined
>>>>>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>>>>>>> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.4-851-gde18fec8da GIT Date: 2020-08-28 16:47:50 +0000
>>>>>>>> [0]PETSC ERROR: Unknown Name on a 20200828 named sophie-Precision-5530 by sophie Tue Sep 1 10:58:44 2020
>>>>>>>> [0]PETSC ERROR: Configure options PETSC_DIR=/home/sophie/Code/petsc PETSC_ARCH=20200828 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif77 --with-debugging=no --with-shared-libraries
>>>>>>>> [0]PETSC ERROR: #1 MatFindOffBlockDiagonalEntries() line 9847 in /home/sophie/Code/petsc/src/mat/interface/matrix.c
>>>>>>>> [0]PETSC ERROR: #2 PCFieldSplitSetDefaults() line 504 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
>>>>>>>> [0]PETSC ERROR: #3 PCSetUp_FieldSplit() line 606 in /home/sophie/Code/petsc/src/ksp/pc/impls/fieldsplit/fieldsplit.c
>>>>>>>> [0]PETSC ERROR: #4 PCSetUp() line 1009 in /home/sophie/Code/petsc/src/ksp/pc/interface/precon.c
>>>>>>>> [0]PETSC ERROR: #5 KSPSetUp() line 406 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
>>>>>>>> [0]PETSC ERROR: #6 KSPSolve_Private() line 658 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
>>>>>>>> [0]PETSC ERROR: #7 KSPSolve() line 889 in /home/sophie/Code/petsc/src/ksp/ksp/interface/itfunc.c
>>>>>>>> [0]PETSC ERROR: #8 SNESSolve_NEWTONLS() line 225 in /home/sophie/Code/petsc/src/snes/impls/ls/ls.c
>>>>>>>> [0]PETSC ERROR: #9 SNESSolve() line 4524 in /home/sophie/Code/petsc/src/snes/interface/snes.c
>>>>>>>> [0]PETSC ERROR: #10 TSStep_ARKIMEX() line 811 in /home/sophie/Code/petsc/src/ts/impls/arkimex/arkimex.c
>>>>>>>> [0]PETSC ERROR: #11 TSStep() line 3731 in /home/sophie/Code/petsc/src/ts/interface/ts.c
>>>>>>>> [0]PETSC ERROR: #12 TSSolve() line 4128 in /home/sophie/Code/petsc/src/ts/interface/ts.c
>>>>>>>> PetscSolver::solve: TSSolve failed.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Sophie
>>>>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>>>>> Sent: Monday, August 31, 2020 14:50
>>>>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>>>>
>>>>>>>>
>>>>>>>> Sophie,
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> The factor of 4 is a lot; the 1.5 is not so bad.
>>>>>>>>
>>>>>>>> You will definitely want to retain the full matrix assembly codes for speed and to verify a reduced matrix version.
>>>>>>>>
>>>>>>>> It is worth trying a "reduced matrix version" with matrix-free multiply based on these numbers. This reduced matrix Jacobian will only have the diagonals and all the terms connected to the cluster sizes that move. In other words you will be building just the part of the Jacobian needed for the new preconditioner (PC subtype for Jacobi) and doing the matrix-vector product matrix free. (SOR requires all the Jacobian entries).
>>>>>>>>
>>>>>>>> Fortunately this is hopefully pretty straightforward for this code. You will not have to change the structure of the main code at all.
>>>>>>>>
>>>>>>>> Step 1) create a new "sparse matrix" that will be passed to DMDASetBlockFillsSparse(). This new "sparse matrix" needs to retain all the diagonal entries and also all the entries that are associated with the variables that diffuse. If I remember correctly these are just the smallest cluster size, plain Helium?
>>>>>>>>
>>>>>>>> Call MatSetOption(mat,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);
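>>>>>>>>
>>>>>>>> As an illustration only, with a made-up dof of 3 and the dense variant DMDASetBlockFills() (whose 0/1 fill arrays are easy to show; for your 7760 DOF you would express the same pattern with DMDASetBlockFillsSparse()), where component 0 is the diffusing one and every component keeps its diagonal:
>>>>>>>>
>>>>>>>>   PetscInt dfill[9] = {1, 0, 0,  /* within a cell: keep the diagonal of every component */
>>>>>>>>                        0, 1, 0,
>>>>>>>>                        0, 0, 1};
>>>>>>>>   PetscInt ofill[9] = {1, 0, 0,  /* coupling to neighboring cells: only the diffusing component */
>>>>>>>>                        0, 0, 0,
>>>>>>>>                        0, 0, 0};
>>>>>>>>   ierr = DMDASetBlockFills(da, dfill, ofill);CHKERRQ(ierr); /* before DMCreateMatrix() */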
>>>>>>>>
>>>>>>>> Then you would run the code with -snes_mf_operator and the new PC subtype for Jacobi.
>>>>>>>>
>>>>>>>> A test that the new reduced Jacobian is correct will be that you get almost the same iterations as the runs you just made using the PC subtype of Jacobi, hopefully no slower and using a great deal less memory. The iterations will not be identical because of the matrix-free multiply.
>>>>>>>>
>>>>>>>> Step 2) create a new version of the Jacobian computation routine. This routine should only compute the elements of the Jacobian needed for this reduced matrix Jacobian, so the diagonals and the diffusion/convection terms.
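>>>>>>>>
>>>>>>>> For a 1D grid a reduced row then looks something like this sketch (placeholder names, not your code): for a diffusing component k at grid point i you insert the two neighbor couplings plus the diagonal (which also carries the reaction contribution), and for a non-diffusing component only the diagonal.
>>>>>>>>
>>>>>>>>   MatStencil  row, col[3];
>>>>>>>>   PetscScalar v[3];
>>>>>>>>   row.i = i;        row.c = k;
>>>>>>>>   col[0].i = i - 1; col[0].c = k; v[0] = dcoef;                /* left neighbor  */
>>>>>>>>   col[1].i = i;     col[1].c = k; v[1] = -2.0*dcoef + dRk_dCk; /* diagonal       */
>>>>>>>>   col[2].i = i + 1; col[2].c = k; v[2] = dcoef;                /* right neighbor */
>>>>>>>>   ierr = MatSetValuesStencil(J, 1, &row, 3, col, v, ADD_VALUES);CHKERRQ(ierr);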
>>>>>>>>
>>>>>>>> Again run with -snes_mf_operator and the new PC subtype for Jacobi, and you should again get the same convergence history.
>>>>>>>>
>>>>>>>> I made it two steps because that makes it easier to validate and debug by getting the same results as before. The first step cheats in that it still computes the full Jacobian but ignores the entries that we don't need to store for the preconditioner. The second step is more efficient because it only computes the Jacobian entries needed for the preconditioner, but it requires going through the Jacobian code and making sure only the needed parts are computed.
>>>>>>>>
>>>>>>>>
>>>>>>>> If you have any questions please let me know.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 31, 2020, at 1:13 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>>>>>
>>>>>>>>> Hi Barry,
>>>>>>>>>
>>>>>>>>> I ran the 2 cases to look at the effect of the Jacobi preconditioner:
>>>>>>>>> * 1D with 200 grid points and 7759 DOF per grid point (for the PSI application), for 20 TS: the factor between SOR and Jacobi is ~4 (976 MatMult for SOR and 4162 MatMult for Jacobi)
>>>>>>>>> * 2D with 63x63 grid points and 4124 DOF per grid point (for the NE application), for 20 TS: the factor is 1.5 (6657 for SOR, 10379 for Jacobi)
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Sophie
>>>>>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>>>>>> Sent: Friday, August 28, 2020 18:31
>>>>>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Aug 28, 2020, at 4:11 PM, Blondel, Sophie <sblondel at utk.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> Thank you Jed and Barry,
>>>>>>>>>>
>>>>>>>>>> First, attached are the logs from the benchmark runs I did without (log_std.txt) and with the MF method (log_mf.txt). It took me some trouble to get -log_view to work because I'm using push and pop for the options, which means that PETSc is initialized with no arguments, so the command-line argument was not taken into account, but I guess this is for a separate discussion.
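>>>>>>>>>>
>>>>>>>>>> (One way to do this programmatically, for reference, is something like PetscOptionsInsertString(NULL, "-log_view"); called before the TS/SNES objects are created; just a sketch, not what is currently in the code.)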
>>>>>>>>>>
>>>>>>>>>> To answer the questions about the current preconditioners:
>>>>>>>>>> * I used the same preconditioner options as listed in my previous email when I added the -snes_mf option; I did try to remove all the PC-related options at one point with the MF method but didn't see a change in runtime, so I put them back in.
>>>>>>>>>> * This benchmark is for a 1D DMDA using 20 grid points; when running in 2D or 3D I switch the PC options to: -pc_type fieldsplit -fieldsplit_0_pc_type sor -fieldsplit_1_pc_type gamg -fieldsplit_1_ksp_type gmres -ksp_type fgmres -fieldsplit_1_pc_gamg_threshold -1
>>>>>>>>>> * I haven't tried a Jacobi PC instead of SOR; I will run a set of more realistic runs (1D and 2D) without MF but with Jacobi and report on it next week. When you say "iterations" do you mean what is given by -ksp_monitor?
>>>>>>>>>
>>>>>>>>> Yes, the number of MatMult is a good enough surrogate.
>>>>>>>>>
>>>>>>>>> So using matrix free (which means no preconditioning) takes about 35846/160 = 224 times as many iterations. So even for this modest 1D problem the preconditioning is doing a great deal.
>>>>>>>>>
>>>>>>>>> Barry
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Sophie
>>>>>>>>>> From: Barry Smith <bsmith at petsc.dev>
>>>>>>>>>> Sent: Friday, August 28, 2020 12:12
>>>>>>>>>> To: Blondel, Sophie <sblondel at utk.edu>
>>>>>>>>>> Cc: petsc-users at mcs.anl.gov; xolotl-psi-development at lists.sourceforge.net
>>>>>>>>>> Subject: Re: [petsc-users] Matrix Free Method questions
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sophie,
>>>>>>>>>>
>>>>>>>>>> This is exactly what I would expect. If you run with -ksp_monitor you will see the -snes_mf run takes many more iterations.
>>>>>>>>>>
>>>>>>>>>> I am puzzled that the argument -pc_type fieldsplit did not stop the run since this is under normal circumstances not a viable preconditioner with -snes_mf. Did you also remove the -pc_type fieldsplit argument?
>>>>>>>>>>
>>>>>>>>>> In order to see how one can avoid forming the entire matrix and use matrix-free to do the matrix-vector but still have an effective preconditioner let's look at what the current preconditioner options do.
>>>>>>>>>>
>>>>>>>>>>> -pc_fieldsplit_detect_coupling
>>>>>>>>>>
>>>>>>>>>> creates two sub-preconditioners, the first for all the variables and the second for those that are coupled by the matrix to variables in neighboring cells. Since only the smallest cluster sizes have diffusion/advection, this second set contains only the cluster-size-one variables.
>>>>>>>>>>
>>>>>>>>>>> -fieldsplit_0_pc_type sor
>>>>>>>>>>
>>>>>>>>>> Runs SOR on all the variables; you can think of this as running SOR on the reactions. It is a pretty good preconditioner for the reactions since the reactions are local, per cell.
>>>>>>>>>>
>>>>>>>>>>> -fieldsplit_1_pc_type redundant
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This runs the default preconditioner (ILU) on just the variables that diffuse, i.e. the elliptic part. For smallish problems this is fine; for larger problems and in 2D and 3D you presumably also use -redundant_pc_type gamg to apply algebraic multigrid to the diffusion. This part of the matrix will always need to be formed and used in the preconditioner. It is very important since the diffusion is what brings most of the ill-conditioning for larger problems into the linear system. Note that it only needs the matrix entries for the clusters of size 1, so it is very small compared to the entire sparse matrix.
>>>>>>>>>>
>>>>>>>>>> ----
>>>>>>>>>> The first preconditioner, SOR, requires ALL the matrix entries, which are almost entirely (except for the diffusion terms) the coupling between different-size clusters within a cell. Essentially each cell has its own sparse matrix of the size of the total number of clusters; it is sparse but not super sparse.
>>>>>>>>>>
>>>>>>>>>> So to significantly lower the memory usage we need to remove the SOR and the storing of all the matrix entries, but still have an efficient preconditioner for the "reaction" terms.
>>>>>>>>>>
>>>>>>>>>> The simplest thing would be to use Jacobi instead of SOR for the first subpreconditioner since it only requires the diagonal entries in the matrix. But Jacobi is a worse preconditioner than SOR (since it totally ignores the matrix coupling) and sometimes can be much worse.
>>>>>>>>>>
>>>>>>>>>> Before anyone writes additional code we need to know whether doing something along these lines ruins the convergence.
>>>>>>>>>>
>>>>>>>>>> Have you used the same options as before but with -fieldsplit_0_pc_type jacobi? (Not using any matrix free.) We need to get an idea of how many more linear iterations it requires (not time; comparing time won't be helpful for this exercise). We also need this information for realistic-size problems in 2 or 3 dimensions that you really want to run; for small problems this approach will work OK and thus give misleading information about what happens for large problems.
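>>>>>>>>>>
>>>>>>>>>> For example, keeping everything else in your current option set and only swapping the first split's PC (a sketch of just the relevant options): -pc_type fieldsplit -pc_fieldsplit_detect_coupling -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant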
>>>>>>>>>>
>>>>>>>>>> I suspect the iteration counts will shoot up. Can you run some cases and see how the iteration counts change?
>>>>>>>>>>
>>>>>>>>>> Based on that we can decide if we still retain "good convergence" by changing the SOR to Jacobi and then change the code to make this change efficient (basically by skipping the explicit computation of the reaction Jacobian terms and using matrix-free on the outside of the PCFIELDSPLIT.)
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Aug 28, 2020, at 9:49 AM, Blondel, Sophie via petsc-users <petsc-users at mcs.anl.gov> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I have been using PETSc for a few years with a fully implicit TS ARKIMEX method and am now exploring the matrix free method option. Here is the list of PETSc options I typically use: -ts_dt 1.0e-12 -ts_adapt_time_step_increase_delay 5 -snes_force_iteration -ts_max_time 1000.0 -ts_adapt_dt_max 2.0e-3 -ts_adapt_wnormtype INFINITY -ts_exact_final_time stepover -fieldsplit_0_pc_type sor -ts_max_snes_failures -1 -pc_fieldsplit_detect_coupling -ts_monitor -pc_type fieldsplit -fieldsplit_1_pc_type redundant -ts_max_steps 100
>>>>>>>>>>>
>>>>>>>>>>> I started to compare the performance of the code without changing anything in the executable, simply adding "-snes_mf". I see a reduction in memory usage as expected, and a benchmark that would usually take ~5 min to run now takes ~50 min. Reading the documentation I saw that there are a few options to play with for the matrix-free method, like -snes_mf_err, -snes_mf_umin, or switching to -snes_mf_type wp. I modified the values of each of these options separately but never saw a sizable change in runtime; is that expected?
>>>>>>>>>>>
>>>>>>>>>>> And are there other ways to make the matrix-free method faster? I saw in the documentation that you can define your own preconditioner, for instance. Let me know if you need additional information about the PETSc setup in the application I use.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Sophie
>>>>>>>>>>
>>>>>>>>>> <log_mf.txt><log_std.txt>
>>>>>>
>>>>>> <log_1D_1.txt><log_1D_2.txt><log_1D_3.txt><log_1D_4.txt>
>>>>
>>>> <log_1D_3_bis.txt>
>>>
>>> <log_1D_1.txt><log_1D_2.txt><log_1D_3.txt><log_1D_4.txt>
>
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/