[petsc-users] strange error using fgmres

Smith, Barry F. bsmith at mcs.anl.gov
Sun May 5 20:03:54 CDT 2019

  Run with -ksp_error_if_not_converged -info; this will provide more detail on exactly where the error occurred.
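
  A minimal sketch of one way to pass these flags, assuming the outer solver really does use the nlcg_ options prefix seen in the quoted options below (in which case the KSP flag would presumably need that prefix as well, while -info is global and takes none). The launch line is commented out because the executable name and process count are placeholders, not taken from the original message.

```shell
# Assumption: the outer KSP uses the "nlcg_" prefix, as in the quoted options.
# PETSc reads extra command-line options from the PETSC_OPTIONS environment variable.
export PETSC_OPTIONS="-nlcg_ksp_error_if_not_converged -info"
echo "PETSC_OPTIONS=${PETSC_OPTIONS}"

# Hypothetical launch line (placeholder process count and executable name):
# mpiexec -n 128 ./my_app -nlcg_ksp_type fgmres -nlcg_pc_type ksp
```

  Since -info is very verbose on a large run, it is probably worth redirecting the output to a file.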


> On May 5, 2019, at 5:21 PM, Randall Mackie via petsc-users <petsc-users at mcs.anl.gov> wrote:
> In solving a nonlinear optimization problem, I was recently experimenting with fgmres using the following options:
> -nlcg_ksp_type fgmres \
> -nlcg_pc_type ksp \
> -nlcg_ksp_ksp_type bcgs \
> -nlcg_ksp_pc_type jacobi \
> -nlcg_ksp_rtol 1e-6 \
> -nlcg_ksp_ksp_max_it 300 \
> -nlcg_ksp_max_it 200 \
> -nlcg_ksp_converged_reason \
> -nlcg_ksp_monitor_true_residual \
> I will sometimes, seemingly at random, get an error like the following:
> Residual norms for nlcg_ solve.
>   0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03
>   2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03
>   3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03
>   3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm           -nan ||r(i)||/||b||           -nan
> Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3
>                PC_FAILED due to SUBPC_ERROR 
> This usually happens after a few nonlinear optimization iterations, meaning that it’s worked perfectly fine until this point.
> How can using jacobi pc all of a sudden cause a NaN, if it’s worked perfectly fine before?
> Some other errors in the output log file are as follows, although I have no idea if they result from the above error or not:
> [13]PETSC ERROR: Object is in wrong state
> [13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector()
> [13]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019
> [27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c
> [27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/mkl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --download-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXXOPTFLAGS="-O3 -xHost"
> #2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c
> [72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c
> [72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c
> [72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c
> [72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c
> This is a large run taking many hours to get to this problem. I will try to run in debug mode, but given that this seems to be randomly happening (this has happened maybe 30% of the time I have used the fgmres option), there is no guarantee that will show anything useful. Valgrind is obviously out of the question for a large run, and I have yet to reproduce this on a smaller run.
> Anyone have any ideas as to what’s causing this?
> Thanks in advance,
> Randy M.

