<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">While solving a nonlinear optimization problem, I recently experimented with fgmres using the following options:<div class=""><br class=""></div><div class=""><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_type fgmres \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_pc_type ksp \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_ksp_type bcgs \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_pc_type jacobi \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_rtol 1e-6 \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_ksp_max_it 300 \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_max_it 200 \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_converged_reason \<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">-nlcg_ksp_monitor_true_residual \</div></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class="">Occasionally, and seemingly at random, I get an error like the following:</div><div style="margin: 0in 0in 
0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: Calibri, sans-serif;" class=""><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Residual norms for nlcg_ solve.<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> 0 KSP unpreconditioned resid norm 3.371606868500e+04 true resid norm 3.371606868500e+04 ||r(i)||/||b|| 1.000000000000e+00<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> 1 KSP unpreconditioned resid norm 2.322590778002e+02 true resid norm 2.322590778002e+02 ||r(i)||/||b|| 6.888676137487e-03<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> 2 KSP unpreconditioned resid norm 8.262440884758e+01 true resid norm 8.262440884758e+01 ||r(i)||/||b|| 2.450594392232e-03<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> 3 KSP unpreconditioned resid norm 3.660428333809e+01 true resid norm 3.660428333809e+01 ||r(i)||/||b|| 1.085662853522e-03<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> 3 KSP unpreconditioned resid norm 0.000000000000e+00 true resid norm -nan ||r(i)||/||b|| -nan<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Linear nlcg_ solve did not converge due to DIVERGED_PC_FAILED iterations 3<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""> PC_FAILED due to SUBPC_ERROR <o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class=""> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class="">This usually happens after a few nonlinear optimization iterations, meaning that it’s worked perfectly fine until this point.</o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" 
class="">How can the jacobi pc suddenly produce a NaN when it has worked perfectly fine up to that point?</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Some other errors from the output log file follow, although I do not know whether they result from the error above:</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[13]PETSC ERROR: Object is in wrong state<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[13]PETSC ERROR: Clearing DM of global vectors that has a global vector obtained with DMGetGlobalVector()<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[13]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html" style="color: rgb(149, 79, 114);" class="">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[13]PETSC ERROR: Petsc Release Version 3.11.1, Apr, 12, 2019<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class=""> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class=""> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[27]PETSC ERROR: #1 DMClearGlobalVectors() line 196 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dmget.c<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[27]PETSC ERROR: Configure options --with-clean=1 --with-scalar-type=complex --with-debugging=0 --with-fortran=1 --with-blaslapack-dir=/state/std2/intel_2018/m<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 
11pt;" class="">kl --with-mkl_pardiso-dir=/state/std2/intel_2018/mkl --with-mkl_cpardiso-dir=/state/std2/intel_2018/mkl --download-mumps=../external/mumps_v5.1.2-p1.tar.gz --d<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">ownload-scalapack=../external/scalapack-2.0.2.tgz --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicc --FOPTFLAGS="-O3 -xHost" --COPTFLAGS="-O3 -xHost" --CXX<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">OPTFLAGS="-O3 -xHost"<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class=""> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><o:p class=""> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">#2 DMDestroy() line 752 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/dm/interface/dm.c<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[72]PETSC ERROR: #3 PetscObjectDereference() line 624 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[72]PETSC ERROR: #4 PetscObjectListDestroy() line 156 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/olist.c<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[72]PETSC ERROR: #5 PetscHeaderDestroy_Private() line 122 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/sys/objects/inherit.c<o:p class=""></o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">[72]PETSC ERROR: #6 VecDestroy() line 412 in /state/std2/FEMI/PETSc/petsc-3.11.1/src/vec/vec/interface/vector.c</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div 
style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">This is a large run that takes many hours to reach this failure. I will try to run in debug mode, but since the failure appears to be random (it has occurred in roughly 30% of the runs where I used the fgmres option), there is no guarantee that a debug run will show anything useful. Valgrind is out of the question for a run of this size, and I have not yet been able to reproduce the problem on a smaller run.</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Does anyone have any ideas as to what’s causing this?</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Thanks in advance,</div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class=""><br class=""></div><div style="margin: 0in 0in 0.0001pt; font-size: 11pt;" class="">Randy M.</div></div></div></body></html>