[petsc-users] Question regarding SNES error about locked vectors

Barry Smith bsmith at petsc.dev
Wed Dec 24 22:02:17 CST 2025


   I have started a merge request to properly propagate failure reasons up from the line search to SNESSolve: https://gitlab.com/petsc/petsc/-/merge_requests/8914  Could you give it a try when you get the chance?


> On Dec 22, 2025, at 3:03 PM, David Knezevic <david.knezevic at akselos.com> wrote:
> 
> P.S. As a test I removed the "postcheck" callback, and I still get the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the "postcheck" is not related.
> 
> 
> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <david.knezevic at akselos.com <mailto:david.knezevic at akselos.com>> wrote:
>> The print out I get from -snes_view is shown below. I wonder if the issue is related to "using user-defined postcheck step"?
>> 
>> 
>> SNES Object: 1 MPI process
>>   type: newtonls
>>   maximum iterations=5, maximum function evaluations=10000
>>   tolerances: relative=0., absolute=0., solution=0.
>>   total number of linear solver iterations=3
>>   total number of function evaluations=4
>>   norm schedule ALWAYS
>>   SNESLineSearch Object: 1 MPI process
>>     type: basic
>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>     maximum iterations=40
>>     using user-defined postcheck step
>>   KSP Object: 1 MPI process
>>     type: preonly
>>     maximum iterations=10000, initial guess is zero
>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>     left preconditioning
>>     using NONE norm type for convergence test
>>   PC Object: 1 MPI process
>>     type: cholesky
>>       out-of-place factorization
>>       tolerance for zero pivot 2.22045e-14
>>       matrix ordering: external
>>       factor fill ratio given 0., needed 0.
>>         Factored matrix follows:
>>           Mat Object: 1 MPI process
>>             type: mumps
>>             rows=1152, cols=1152
>>             package used to perform factorization: mumps
>>             total: nonzeros=126936, allocated nonzeros=126936
>>               MUMPS run parameters:
>>                 Use -ksp_view ::ascii_info_detail to display information for all processes
>>                 RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>                 RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>                 RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>                 (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>                 INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>                 INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>                 INFOG(5) (estimated maximum front size in the complete tree): 216
>>                 INFOG(6) (number of nodes in the complete tree): 24
>>                 INFOG(7) (ordering option effectively used after analysis): 2
>>                 INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>                 INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>                 INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>                 INFOG(11) (order of largest frontal matrix after factorization): 216
>>                 INFOG(12) (number of off-diagonal pivots): 1044
>>                 INFOG(13) (number of delayed pivots after factorization): 0
>>                 INFOG(14) (number of memory compress after factorization): 0
>>                 INFOG(15) (number of steps of iterative refinement after solution): 0
>>                 INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>                 INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>                 INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>                 INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>                 INFOG(20) (estimated number of entries in the factors): 126936
>>                 INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>                 INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>                 INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>                 INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>                 INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>                 INFOG(28) (after factorization: number of null pivots encountered): 0
>>                 INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>                 INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>                 INFOG(32) (after analysis: type of analysis done): 1
>>                 INFOG(33) (value used for ICNTL(8)): 7
>>                 INFOG(34) (exponent of the determinant if determinant is requested): 0
>>                 INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>                 INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>                 INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>                 INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>                 INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>     linear system matrix = precond matrix:
>>     Mat Object: 1 MPI process
>>       type: seqaij
>>       rows=1152, cols=1152
>>       total: nonzeros=60480, allocated nonzeros=60480
>>       total number of mallocs used during MatSetValues calls=0
>>         using I-node routines: found 384 nodes, limit used is 5
>> 
>> 
>> 
>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>   David,
>>> 
>>>     This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was added long after the origins of SNES and, in places, the code was never fully updated to handle function domain problems. In particular, parts of the line search don't handle it correctly. Can you run with -snes_view and that will help us find the spot that needs to be updated. 
>>> 
>>>    Barry
>>> 
>>> 
>>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <david.knezevic at akselos.com <mailto:david.knezevic at akselos.com>> wrote:
>>>> 
>>>> Hi, actually, I have a follow up on this topic.
>>>> 
>>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the solve as expected, but it leads to a converged reason of "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged reason, so that's what I'm doing now. I was surprised by this behavior, though, since I expected that calling SNESSetFunctionDomainError would lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to check on what could be causing this.
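
A sketch of that workaround, wrapped in a hypothetical helper meant to be called from inside the residual/Jacobian callback when an out-of-domain state is detected:

    #include <petscsnes.h>

    /* Hypothetical helper: mark the domain error and also set the converged
       reason explicitly, so that SNESSolve reports DIVERGED_FUNCTION_DOMAIN
       rather than DIVERGED_LINE_SEARCH (the workaround described above). */
    static PetscErrorCode ReportDomainError(SNES snes)
    {
      PetscFunctionBeginUser;
      PetscCall(SNESSetFunctionDomainError(snes));
      PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN));
      PetscFunctionReturn(PETSC_SUCCESS);
    }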
>>>> 
>>>> FYI, I'm using PETSc 3.23.4
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>> 
>>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <david.knezevic at akselos.com <mailto:david.knezevic at akselos.com>> wrote:
>>>>> Thank you very much for this guidance. I switched to use SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>> 
>>>>> Thanks!
>>>>> David
>>>>> 
>>>>> 
>>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>>>> 
>>>>>> 
>>>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <david.knezevic at akselos.com <mailto:david.knezevic at akselos.com>> wrote:
>>>>>>> 
>>>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>>> 
>>>>>>> I'll give some more info here which may help to clarify further. Normally we do just get a negative "converged reason", as you described. But in this specific case where I'm having issues, the solve is a numerically sensitive creep solve, which has exponential terms in the residual and Jacobian callbacks that can "blow up" and give NaN values. In this case, the root cause is that we hit a NaN value during a callback, and then we throw an exception (in libMesh C++ code), which I gather leads to the SNES solve exiting with this error code.
>>>>>>> 
>>>>>>> Is there a way to tell the SNES to terminate with a negative "converged reason" because we've encountered some issue during the callback?
>>>>>> 
>>>>>>    In your callback you should call SNESSetFunctionDomainError() and make sure the function value has an infinity or NaN in it (you can call VecFlag() for this purpose).
>>>>>> 
>>>>>>    Now the SNESConvergedReason will be a completely reasonable SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>> 
>>>>>>   Barry
>>>>>> 
>>>>>> If you are using an ancient version of PETSc (I hope you are using the latest, since that always has more bug fixes and features) that does not have SNESSetFunctionDomainError, then just make sure the function vector result has an infinity or NaN in it, and SNESConvergedReason will be SNES_DIVERGED_FNORM_NAN.
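
A minimal sketch of this advice, assuming a hypothetical residual with an exponential term; the use of VecFlag() here assumes that passing PETSC_TRUE fills the vector with an invalid value (check the VecFlag() man page for your PETSc version):

    #include <petscsnes.h>

    /* Hypothetical residual callback: if any entry blows up, mark the domain
       error and flag the residual vector so the step is rejected cleanly. */
    static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
    {
      const PetscScalar *xx;
      PetscScalar       *ff;
      PetscInt           i, n;
      PetscBool          bad = PETSC_FALSE;

      PetscFunctionBeginUser;
      PetscCall(VecGetLocalSize(x, &n));
      PetscCall(VecGetArrayRead(x, &xx));              /* solution is read-only */
      PetscCall(VecGetArray(f, &ff));
      for (i = 0; i < n; i++) {
        ff[i] = PetscExpScalar(xx[i]) - 2.0;           /* hypothetical creep-like term */
        if (PetscIsInfOrNanScalar(ff[i])) bad = PETSC_TRUE;
      }
      PetscCall(VecRestoreArray(f, &ff));
      PetscCall(VecRestoreArrayRead(x, &xx));
      if (bad) {
        PetscCall(SNESSetFunctionDomainError(snes));   /* reason: DIVERGED_FUNCTION_DOMAIN */
        PetscCall(VecFlag(f, PETSC_TRUE));             /* assumed to mark f as Inf/NaN */
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }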
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I'm using PETSc via the libMesh framework, so creating a MWE is complicated by that, unfortunately.
>>>>>>>>> 
>>>>>>>>> The situation is that I am not modifying the solution vector in a callback. The SNES solve has terminated with PetscErrorCode 82, and I then want to update the solution vector (reset it to the "previously converged value") and try to solve again with a smaller load increment. This is a typical "auto load stepping" strategy in FE.
>>>>>>>> 
>>>>>>>>    Once a PetscError is generated you CANNOT continue the PETSc program; it is not designed to allow this, and trying to continue will lead to further problems.
>>>>>>>> 
>>>>>>>>    So what you need to do is prevent PETSc from getting to the point where an actual PetscErrorCode of 82 is generated. Normally SNESSolve() returns without generating an error even if the nonlinear solver fails (for example, does not converge). One then uses SNESGetConvergedReason() to check whether it converged or not. Normally when SNESSolve() returns, regardless of whether the converged reason is negative or positive, there will be no locked vectors, and one can modify the SNES object and call SNESSolve() again.
>>>>>>>> 
>>>>>>>>   So my guess is that an actual PETSc error is being generated because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by either your code or libMesh, or the option -snes_error_if_not_converged is being used. In your case, where you wish the code to keep working after a non-converged SNESSolve(), these options should never be set; instead you should check the result of SNESGetConvergedReason() to determine whether SNESSolve() has failed. If SNESSetErrorIfNotConverged() is never being set, that may indicate you are using an old version of PETSc or have hit a bug inside PETSc's SNES that does not handle errors correctly; we can help fix the problem if you can provide full debug output from when the error occurs.
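
A sketch of that pattern, under the assumption that the load-stepping state (xLastConverged, dload) lives in application code:

    #include <petscsnes.h>

    /* Sketch: do not set -snes_error_if_not_converged when a failed solve is
       meant to be retried; inspect the converged reason instead. */
    static PetscErrorCode SolveOneLoadStep(SNES snes, Vec x, Vec xLastConverged, PetscReal *dload)
    {
      SNESConvergedReason reason;

      PetscFunctionBeginUser;
      PetscCall(SNESSolve(snes, NULL, x));
      PetscCall(SNESGetConvergedReason(snes, &reason));
      if (reason < 0) {
        /* failed (e.g. SNES_DIVERGED_FUNCTION_DOMAIN): restore the last
           converged solution and shrink the load increment before retrying */
        PetscCall(VecCopy(xLastConverged, x));
        *dload *= 0.5;
      } else {
        /* converged: save this state for a possible later restart */
        PetscCall(VecCopy(x, xLastConverged));
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }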
>>>>>>>> 
>>>>>>>>   Barry
>>>>>>>> 
>>>>>>>> 
>>>>>>>>   
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I think the key piece of info I'd like to know is: at what point is the solution vector "unlocked" by the SNES object? Should it be unlocked as soon as the SNES solve has terminated with PetscErrorCode 82? It seems to me that it hasn't been unlocked yet (maybe just on a subset of the processes). Should I manually "unlock" the solution vector by calling VecLockWriteSet?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <stefano.zampini at gmail.com <mailto:stefano.zampini at gmail.com>> wrote:
>>>>>>>>>> You are not allowed to call VecGetArray on the solution vector of an SNES object within a user callback, nor to modify its values in any other way.
>>>>>>>>>> Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>>>>> It would be great if you could provide an MWE to help us understand your problem.
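
A minimal sketch of that constraint in a hypothetical residual callback (the assembly itself is omitted):

    #include <petscsnes.h>

    /* Inside SNES callbacks the solution vector x is locked read-only. */
    static PetscErrorCode FormFunctionReadOnly(SNES snes, Vec x, Vec f, void *ctx)
    {
      const PetscScalar *xx;

      PetscFunctionBeginUser;
      PetscCall(VecGetArrayRead(x, &xx));   /* allowed: read-only access */
      /* ... assemble f from xx (omitted) ... */
      PetscCall(VecRestoreArrayRead(x, &xx));
      /* Calling VecGetArray(x, ...) or VecSetValues(x, ...) here would trigger
         the "locked for read-only access" error quoted below. */
      PetscCall(VecSet(f, 0.0));            /* placeholder residual */
      PetscFunctionReturn(PETSC_SUCCESS);
    }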
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Il giorno mer 17 dic 2025 alle ore 20:51 David Knezevic via petsc-users <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> ha scritto:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> I have a question about this error:
>>>>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only access in unknown_function() at unknown file:0 (line numbers only accurate to function begin)
>>>>>>>>>>> 
>>>>>>>>>>> I'm encountering this error in an FE solve where an error occurs during the residual/Jacobian assembly; what we normally do in that situation is shrink the load step and continue, starting from the "last converged solution". However, in this case I'm running on 32 processes, and 5 of the processes report the error above about a "locked vector".
>>>>>>>>>>> 
>>>>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the solution to the "last converged solution", and then we create a new SNES object. But it seems to me that somehow the solution vector is still marked as "locked" on 5 of the processes when we modify the solution vector, which leads to the error above.
>>>>>>>>>>> 
>>>>>>>>>>> I was wondering if someone could advise on the best way to handle this? I thought one option could be to add an MPI barrier call prior to updating the solution vector to the "last converged solution", to make sure that the SNES object is destroyed on all procs (and hence the locks cleared) before editing the solution vector, but I'm unsure if that would make a difference. Any help would be most appreciated!
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Stefano
>>>>>>>> 
>>>>>> 
>>> 
