[petsc-users] consider an error message as a petscinfo??

Barry Smith bsmith at mcs.anl.gov
Fri Feb 24 14:56:13 CST 2017


    Fande,

    Yes. Say one is doing a timestepping and using a direct solver. With time stepping we do not want to necessarily stop (in fact we almost never) on a failed solve (due to for example a failed factorization). We want the code to continue up the stack until it gets to the time stepper with the information that the linear solver failed; then the timestepper can decide what to do, decrease the time step and try again (common) or switch to another method etc. The same could be true for linear solvers called from within optimization routines. 

    The reason we do the funny business with putting infinity into the vector is so that errors that occur on any process get propagated to all processes during the next inner product or norm computation, otherwise we would need to have some "side channel" communication which MPI doesn't really provide to propagate errors to all processes, this would be a lot of performance crippling communication that would have to take place during the entire solution process. Plus our approach since it doesn't change the flow of control cannot result in lost memory etc. Note that C (and MPI) don't have any exception mechanism that could be used to handle these "exceptional cases" directly.

   Note that you can use options like -snes_error_if_not_converged or -ksp_error_if_not_converged to force the error to be set as soon as any problem with convergence or zero factorization is found (good for debugging, but not usually for production unless production is solving a linear system only).

   Barry

> On Feb 24, 2017, at 2:40 PM, Kong, Fande <fande.kong at inl.gov> wrote:
> 
> Hi All,
> 
> In MatSolve(), there is a piece of code:
> 
>   if (mat->errortype) {
>     ierr = PetscInfo1(mat,"MatFactorError %D\n",mat->errortype);CHKERRQ(ierr);
>     ierr = VecSetInf(x);CHKERRQ(ierr);
>   } else {
>     ierr = (*mat->ops->solve)(mat,b,x);CHKERRQ(ierr);
>   }
> 
> If a direct solver such as LU or superlu fails to create a factorization, why we do not stop here and throw out an error message here or earlier?  Now we just let solver keep doing garbage computation, and finally we have a solution like this:
> 
> Vec Object: 1 MPI processes
>   type: mpi
> Process [0]
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> inf.
> 
> Any particular reason to handle the thing in this way?
> 
> Fande,
> 
> 



More information about the petsc-users mailing list