[petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
Barry Smith
bsmith at petsc.dev
Tue Jan 17 18:07:12 CST 2023
It appears that Tao is not written to allow multiple TaoSolve() on the same Tao object; this is different from KSP, SNES, and TS.
If you look at the converged reason at the beginning of the second TaoSolve() you will see it is the reason that occurred in the first solve and all the Tao data structures are in the state they were in when the previous TaoSolve ended. Thus it is using incorrect flags and previous matrices incorrectly.
Fixing this would be a largish process I think.
I added an error check for TaoSolve that checks if converged reason is not iterating (meaning the Tao object was previously used and left in a bad state) so that this same problem won't come up for other users. https://gitlab.com/petsc/petsc/-/merge_requests/5986
Barry
> On Jan 16, 2023, at 5:07 PM, Blaise Bourdin <bourdin at mcmaster.ca> wrote:
>
> Hi,
>
> I am attaching a small modification of the eptorsion1.c example that replicates the issue. It looks like this bug is triggered when the upper and lower bounds are equal on enough (10) degrees of freedom.
>
>
> ---- Elastic-Plastic Torsion Problem -----
> mx: 10 my: 10
>
> i: 0
> i: 1
> i: 2
> i: 3
> i: 4
> i: 5
> i: 6
> i: 7
> i: 8
> i: 9
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Nonconforming object sizes
> [0]PETSC ERROR: Preconditioner number of local rows 44 does not equal input vector size 54
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.2-242-g4615508c7fc GIT Date: 2022-11-28 10:21:46 -0600
> [0]PETSC ERROR: ./eptorsion1 on a ventura-gcc12.2-arm64-g64 named bblaptop.math.mcmaster.ca by blaise Mon Jan 16 17:06:57 2023
> [0]PETSC ERROR: Configure options --CFLAGS="-Wimplicit-function-declaration -Wunused" --FFLAGS="-ffree-line-length-none -fallow-argument-mismatch -Wunused" --download-ctetgen=1 --download-exodusii=1 --download-hdf5=1 --download-hypre=1 --download-metis=1 --download-netcdf=1 --download-mumps=1 --download-parmetis=1 --download-pnetcdf=1 --download-scalapack --download-triangle=1 --download-zlib=1 --with-64-bit-indices=1 --with-debugging=1 --with-exodusii-fortran-bindings --with-shared-libraries=1 --with-x11=0
> [0]PETSC ERROR: #1 PCApply() at /opt/HPC/petsc-main/src/ksp/pc/interface/precon.c:434
> [0]PETSC ERROR: #2 KSP_PCApply() at /opt/HPC/petsc-main/include/petsc/private/kspimpl.h:380
> [0]PETSC ERROR: #3 KSPCGSolve_STCG() at /opt/HPC/petsc-main/src/ksp/ksp/impls/cg/stcg/stcg.c:76
> [0]PETSC ERROR: #4 KSPSolve_Private() at /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:898
> [0]PETSC ERROR: #5 KSPSolve() at /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:1070
> [0]PETSC ERROR: #6 TaoBNKComputeStep() at /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bnk.c:459
> [0]PETSC ERROR: #7 TaoSolve_BNTR() at /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bntr.c:138
> [0]PETSC ERROR: #8 TaoSolve() at /opt/HPC/petsc-main/src/tao/interface/taosolver.c:177
> [0]PETSC ERROR: #9 main() at eptorsion1.c:166
> [0]PETSC ERROR: No PETSc Option Table entries
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
> Abort(60) on node 0 (rank 0 in comm 16): application called MPI_Abort(MPI_COMM_SELF, 60) - process 0
>
> I hope that this helps.
>
> Blaise
>
>
>> On Jan 16, 2023, at 3:14 PM, Alexis Marboeuf <alexis.marboeuf at hotmail.fr> wrote:
>>
>> Hi Matt,
>> After investigation, it fails because, at some point, the boolean needH is set to PETSC_FALSE when initializing the BNK method with TAOBNKInitialize (line 103 of $PETSC_DIR/src/tao/bound/impls/bnk/bntr.c). The Hessian and the precondtitioner are thus not updated throughout the TAO iterations. It has something to do with the option BNK_INIT_INTERPOLATION set by default. It works when I choose BNK_INIT_CONSTANT. In my case, in all the successful calls of TAOSolve, the computed trial objective value is better than the current value which implies needH = PETSC_TRUE within TAOBNKInitialize. At some point, the trial value becomes equal to the current objective value up to machine precision and then, needH = PETSC_FALSE. I have to admit I am struggling understanding how that boolean needH is computed when BNK is initialized with BNK_INIT_INTERPOLATION. Can you help me with that?
>> Thanks a lot.
>> Alexis
>> De : Alexis Marboeuf <alexis.marboeuf at hotmail.fr>
>> Envoyé : samedi 14 janvier 2023 05:24
>> À : Matthew Knepley <knepley at gmail.com>
>> Cc : petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> Objet : RE: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
>>
>> Hi Matt,
>> Indeed, it fails on 1 process with the same error. The source code is available here: https://github.com/bourdin/mef90 (branch marboeuf/vdef-tao-test)
>> <https://github.com/bourdin/mef90>
>> GitHub - bourdin/mef90: Official repository for mef90/vDef <https://github.com/bourdin/mef90>
>> mef90 / vDef: A reference implementation of the variational approach to fracture, as described in: Francfort, G. and Marigo, J.-J. (1998). Revisiting brittle fracture as an energy minimization problem.
>> github.com
>> I can share the details (installation + command line) for running it. But the ideal would be to reproduce this error with a Petsc example so it's easier for you to investigate. I looked for a bound constraint minimization problem with TAO and TS but I didn't find it. What example could I use?
>> Thanks!
>> Alexis
>> De : Matthew Knepley <knepley at gmail.com>
>> Envoyé : samedi 14 janvier 2023 02:38
>> À : Alexis Marboeuf <alexis.marboeuf at hotmail.fr>
>> Cc : petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
>> Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
>>
>> On Fri, Jan 13, 2023 at 3:21 PM Alexis Marboeuf <alexis.marboeuf at hotmail.fr <mailto:alexis.marboeuf at hotmail.fr>> wrote:
>> Hi Matt,
>>
>> Here is the output from Petsc when I view the TAO solver:
>>
>> Okay, there is no sophisticated caching going on. So, first I would get it to fail on1 process. It should if it just depends on
>> the convergence (I hope). Then send the source so we can run it. It should be simple for us to find where system size
>> changes and the KSP is not reset (if that indeed is what happens).
>>
>> Thanks,
>>
>> Matt
>>
>> Tao Object: (Damage_) 4 MPI processes
>> type: bntr
>> Tao Object: (Damage_tao_bnk_cg_) 4 MPI processes
>> type: bncg
>> CG Type: ssml_bfgs
>> Skipped Stepdirection Updates: 0
>> Scaled gradient steps: 0
>> Pure gradient steps: 0
>> Not a descent direction: 0
>> Line search fails: 0
>> Matrix has not been preallocated yet
>> TaoLineSearch Object: (Damage_tao_bnk_cg_) 4 MPI processes
>> type: more-thuente
>> maximum function evaluations=30
>> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9
>> total number of function evaluations=0
>> total number of gradient evaluations=0
>> total number of function/gradient evaluations=0
>> Termination reason: 0
>> Active Set subset type: subvec
>> convergence tolerances: gatol=1e-08, steptol=0., gttol=0.
>> Residual in Function/Gradient:=0.
>> Objective value=0.
>> total number of iterations=0, (max: 2000)
>> Solver terminated: 0
>> Rejected BFGS updates: 0
>> CG steps: 0
>> Newton steps: 11
>> BFGS steps: 0
>> Scaled gradient steps: 0
>> Gradient steps: 0
>> KSP termination reasons:
>> atol: 4
>> rtol: 0
>> ctol: 7
>> negc: 0
>> dtol: 0
>> iter: 0
>> othr: 0
>> TaoLineSearch Object: (Damage_) 4 MPI processes
>> type: more-thuente
>> maximum function evaluations=30
>> tolerances: ftol=0.0001, rtol=1e-10, gtol=0.9
>> total number of function evaluations=0
>> total number of gradient evaluations=0
>> total number of function/gradient evaluations=0
>> using variable bounds
>> Termination reason: 0
>> KSP Object: (Damage_tao_bnk_) 4 MPI processes
>> type: stcg
>> maximum iterations=10000, nonzero initial guess
>> tolerances: relative=1e-08, absolute=1e-08, divergence=1e+10
>> left preconditioning
>> using UNPRECONDITIONED norm type for convergence test
>> PC Object: (Damage_tao_bnk_) 4 MPI processes
>> type: lmvm
>> Mat Object: (Damage_tao_bnk_pc_lmvm_) 4 MPI processes
>> type: lmvmbfgs
>> rows=30634, cols=30634
>> Scale type: DIAGONAL
>> Scale history: 1
>> Scale params: alpha=1., beta=0.5, rho=1.
>> Convex factors: phi=0., theta=0.125
>> Max. storage: 5
>> Used storage: 5
>> Number of updates: 11
>> Number of rejects: 0
>> Number of resets: 0
>> Mat Object: (Damage_tao_bnk_pc_lmvm_J0_) 4 MPI processes
>> type: lmvmdiagbroyden
>> rows=30634, cols=30634
>> Scale history: 1
>> Scale params: alpha=1., beta=0.5, rho=1.
>> Convex factor: theta=0.125
>> Max. storage: 1
>> Used storage: 1
>> Number of updates: 11
>> Number of rejects: 0
>> Number of resets: 0
>> linear system matrix = precond matrix:
>> Mat Object: 4 MPI processes
>> type: mpiaij
>> rows=468, cols=468
>> total: nonzeros=2932, allocated nonzeros=2932
>> total number of mallocs used during MatSetValues calls=0
>> not using I-node (on process 0) routines
>> total KSP iterations: 103
>> Active Set subset type: subvec
>> convergence tolerances: gatol=0.0001, steptol=0., gttol=1e-05
>> Residual in Function/Gradient:=9.11153e-05
>> Objective value=0.00665458
>> total number of iterations=11, (max: 50)
>> total number of function evaluations=17, max: -1
>> total number of gradient evaluations=13, max: -1
>> total number of Hessian evaluations=12
>> Solution converged: ||g(X)|| <= gatol
>>
>> Thanks again for your help!
>> Alexis
>>
>> De : Matthew Knepley <knepley at gmail.com <mailto:knepley at gmail.com>>
>> Envoyé : samedi 14 janvier 2023 01:38
>> À : Alexis Marboeuf <alexis.marboeuf at hotmail.fr <mailto:alexis.marboeuf at hotmail.fr>>
>> Cc : petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>>
>> Objet : Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
>>
>> On Fri, Jan 13, 2023 at 11:22 AM Alexis Marboeuf <alexis.marboeuf at hotmail.fr <mailto:alexis.marboeuf at hotmail.fr>> wrote:
>> Hi all,
>>
>> In a variational approach of brittle fracture setting, I try to solve a bound constraint minimization problem using TAO. I checkout on the main branch of Petsc. Minimization with respect to the bounded variable (damage) is achieved through the Bounded Newton Trust Region (TAOBNTR). All other TAO parameters are set by default. On a Linux machine, I get the following error with a 4 processors run:
>>
>> Can you view the solver?
>>
>> Thanks,
>>
>> Matt
>>
>>
>> [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [3]PETSC ERROR: Nonconforming object sizes
>> [3]PETSC ERROR: Preconditioner number of local rows 1122 does not equal input vector size 1161
>> [3]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [3]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000
>> [3]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023
>> [2]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>> [3]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c
>> [3]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434
>> [3]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380
>> [3]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76
>> [3]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898
>> [3]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070
>> [3]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459
>> [3]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138
>> [3]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177
>> [2]PETSC ERROR: Nonconforming object sizes
>> [2]PETSC ERROR: Preconditioner number of local rows 1229 does not equal input vector size 1254
>> [2]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [2]PETSC ERROR: Petsc Development GIT revision: v3.18.3-342-gdab44c92d91 GIT Date: 2023-01-04 13:37:04 +0000
>> [2]PETSC ERROR: /home/marboeua/Developpement/mef90/arch-darwin-c/bin/vDefTAO on a arch-darwin-c named bb01 by marboeua Thu Jan 12 16:55:18 2023
>> [2]PETSC ERROR: Configure options --FFLAGS=-ffree-line-length-none --COPTFLAGS="-O3 -march=znver3 -g" --CXXOPTFLAGS="-O3 -march=znver3 -g" --FOPTFLAGS="-O3 -march=znver3 -g" --download-fblaslapack=1 --download-mumps=1 --download-chaco=1 --download-exodusii=1 --download-hypre=1 --download-ml=1 --download-triangle --download-scalapack=1 --download-superlu=1 --download-sowing=1 --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp --download-yaml=1 --download-bison=1 --download-hdf5=1 --download-metis=1 --download-parmetis=1 --download-netcdf=1 --download-pnetcdf=1 --download-zlib=1 --with-cmake=1 --with-debugging=0 --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-ranlib=ranlib --with-shared-libraries=1 --with-sieve=1 --download-p4est=1 --with-pic --with-mpiexec=srun --with-x11=0 PETSC_ARCH=arch-darwin-c
>> [2]PETSC ERROR: #1 PCApply() at /1/home/marboeua/Developpement/petsc/src/ksp/pc/interface/precon.c:434
>> [2]PETSC ERROR: #2 KSP_PCApply() at /home/marboeua/Developpement/petsc/include/petsc/private/kspimpl.h:380
>> [2]PETSC ERROR: #3 KSPCGSolve_STCG() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/impls/cg/stcg/stcg.c:76
>> [2]PETSC ERROR: #4 KSPSolve_Private() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:898
>> [2]PETSC ERROR: #5 KSPSolve() at /1/home/marboeua/Developpement/petsc/src/ksp/ksp/interface/itfunc.c:1070
>> [2]PETSC ERROR: #6 TaoBNKComputeStep() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bnk.c:459
>> [2]PETSC ERROR: #7 TaoSolve_BNTR() at /1/home/marboeua/Developpement/petsc/src/tao/bound/impls/bnk/bntr.c:138
>> [2]PETSC ERROR: #8 TaoSolve() at /1/home/marboeua/Developpement/petsc/src/tao/interface/taosolver.c:177
>> [3]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370
>> application called MPI_Abort(MPI_COMM_SELF, 60) - process 0
>> [2]PETSC ERROR: #9 /home/marboeua/Developpement/mef90/vDef/vDefTAO.F90:370
>> application called MPI_Abort(MPI_COMM_SELF, 60) - process 0
>> slurmstepd: error: *** STEP 5034.0 ON bb01 CANCELLED AT 2023-01-12T17:21:07 ***
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> srun: error: bb01: tasks 0-1: Killed
>> srun: error: bb01: tasks 2-3: Exited with exit code 1
>>
>> The error is raised in the middle of the computation after many successful calls of TAOSolve and TAO iterations. My guess is that TAO computes the preconditioner during its first iteration with all variables in the active set. But the preconditioner is never updated when some variables are moved to the inactive set during the next TAO iterations. Am I right? Can you help me with that?
>>
>> Thanks a lot for your help and your time.
>> Regards,
>> Alexis
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
> —
> Canada Research Chair in Mathematical and Computational Aspects of Solid Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
>
> <eptorsion1.c>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20230117/cad13a7e/attachment-0001.html>
More information about the petsc-users
mailing list