[petsc-users] strange error using fgmres

Smith, Barry F. bsmith at mcs.anl.gov
Sun May 5 19:18:12 CDT 2019



  Even if you don't get failures on a smaller version of the code, it can still be worth running it under valgrind (when you can't run valgrind on the massive problem), because often the bug is present in the smaller problem as well, just less directly visible, and valgrind can still find it.
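
  For reference, one way to run the smaller problem under valgrind with MPI, with one log file per process, is something like the following (a sketch; substitute your own executable, process count, and options):

      mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./yourprog -your_options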


> [13]PETSC ERROR: Object is in wrong state
> [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector()

   You probably have a work vector obtained with DMGetGlobalVector() that you forgot to return with DMRestoreGlobalVector(), though I would expect this to reproduce at any problem size.
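
   The usual pattern looks like the sketch below (dm and work are placeholder names, not from your code); the error you quote fires when the DM is torn down while one of these work vectors is still checked out:

      Vec            work;
      PetscErrorCode ierr;

      ierr = DMGetGlobalVector(dm,&work);CHKERRQ(ierr);
      /* ... use work as temporary storage ... */
      ierr = DMRestoreGlobalVector(dm,&work);CHKERRQ(ierr);  /* omitting this leaves the vector checked out */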

   Barry


> On May 5, 2019, at 5:21 PM, Randall Mackie via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> In solving a nonlinear optimization problem, I was recently experimenting with fgmres using the following options:
> 
> -nlcg_ksp_type fgmres \
> -nlcg_pc_type ksp \
> -nlcg_ksp_ksp_type bcgs \
> -nlcg_ksp_pc_type jacobi \
> -nlcg_ksp_rtol 1e-6 \
> -nlcg_ksp_ksp_max_it 300 \
> -nlcg_ksp_max_it 200 \
> -nlcg_ksp_converged_reason \
> -nlcg_ksp_monitor_true_residual \
> 
> I sometimes randomly get an error like the following:
> 
> Residual norms for nlcg_ solve.
>   0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03
>   2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03
>   3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03
>   3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm           -nan ||r(i)||/||b||           -nan
> Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3
>                PC_FAILED due to SUBPC_ERROR 
>  
> This usually happens after a few nonlinear optimization iterations, meaning that everything has worked perfectly fine up to that point.
> How can the jacobi PC suddenly produce a NaN when it has worked without problems before?
> 
> Some other errors in the output log file are as follows, although I have no idea whether they result from the above error or not:
> 
> [13]PETSC ERROR: Object is in wrong state
> [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector()
> [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
>  
>  
> [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c
> [27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/mkl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --download-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXXOPTFLAGS="-O3 -xHost"
>  
>  
> #2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c
> [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c
> [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c
> [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c
> [72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c
> 
> 
> 
> This is a large run taking many hours to get to this problem. I will try to run in debug mode, but given that this seems to happen randomly (maybe 30% of the times I have used the fgmres option), there is no guarantee that it will show anything useful. Valgrind is obviously out of the question for a large run, and I have yet to reproduce this on a smaller run.
> 
> Anyone have any ideas as to what’s causing this?
> 
> Thanks in advance,
> 
> Randy M.


