[petsc-users] DIVERGED_NONLINEAR_SOLVE error

Jin, Shuangshuang Shuangshuang.Jin at pnnl.gov
Fri Aug 2 18:05:14 CDT 2013


Thank you. I'll definitely try this to make things easier. 

Shuangshuang

-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov] 
Sent: Friday, August 02, 2013 2:03 PM
To: Jin, Shuangshuang
Cc: Matthew Knepley; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] DIVERGED_NONLINEAR_SOLVE error


  Use two PETSC_ARCH values, PETSC_ARCH=arch-complex-debug and PETSC_ARCH=arch-complex-opt; then you can switch back and forth between them without rebuilding.
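
  As a rough sketch of the two-arch setup (not a tested recipe), the script below prints one configure command per PETSC_ARCH instead of running it. The shared options are copied from the configure line quoted later in this thread. The helper name configure_cmd and the variable COMMON_OPTS are made up for illustration, and --with-debugging=1 for the debug arch is an assumption mirroring the --with-debugging=0 in the reported build.

```shell
#!/bin/sh
# Dry run: prints the two configure commands instead of executing them.
# COMMON_OPTS is copied from the configure options quoted in this thread.
# configure_cmd is a hypothetical helper name used only in this sketch.
COMMON_OPTS="--with-scalar-type=complex --with-clanguage=C++ --with-fortran-kernels=generic --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental"

configure_cmd() {
    # $1 = PETSC_ARCH name, $2 = value for --with-debugging (1 = debug, 0 = optimized)
    echo "./configure PETSC_ARCH=$1 --with-debugging=$2 $COMMON_OPTS"
}

# One line per build; each arch gets its own directory under the PETSc source tree.
configure_cmd arch-complex-debug 1
configure_cmd arch-complex-opt 0
```

  Once both arches have been configured and built, the choice between them is made by the value of PETSC_ARCH in the environment (or on the make command line) when the application is compiled and linked, so only the application needs to be rebuilt when switching.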

   Barry

On Aug 2, 2013, at 3:41 PM, "Jin, Shuangshuang" <Shuangshuang.Jin at pnnl.gov> wrote:

> Is there a quick way to turn on debugging in my build, or do I have to run the following configure again?
>  
> Configure options --with-scalar-type=complex --with-clanguage=C++ 
> PETSC_ARCH=arch-complex --with-fortran-kernels=generic 
> --download-superlu_dist --download-mumps --download-scalapack 
> --download-parmetis --download-metis --download-elemental
>  
> It usually takes over an hour to reconfigure PETSc on my machine...
>  
> Thanks,
> Shuangshuang
>  
> From: Matthew Knepley [mailto:knepley at gmail.com]
> Sent: Friday, August 02, 2013 1:33 PM
> To: Jin, Shuangshuang
> Cc: petsc-users at mcs.anl.gov
> Subject: Re: [petsc-users] DIVERGED_NONLINEAR_SOLVE error
>  
> On Sat, Aug 3, 2013 at 4:22 AM, Jin, Shuangshuang <Shuangshuang.Jin at pnnl.gov> wrote:
> Hello,
>  
>      My code solves a linear system AX=B using superlu_dist in PETSc, and uses some of X's data to solve a DAE problem. I get a very strange error:
>  
>      When I use fewer than 8 processors to run the code, it runs just fine with correct results. When I use more than 8 processors, such as 16 or 32, I get an error and a lot of core.##### files are generated.
>  
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR:   !
> [0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Development GIT revision: a0a914e661bf6402b8edabe0f5a2dad46323f69f  GIT Date: 2013-06-05 14:18:39 -0500
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: dynSim on a arch-complex named node0055.local by d3m956 Fri Aug  2 11:56:10 2013
> [0]PETSC ERROR: Libraries linked from /pic/projects/ds/petsc-dev.6.06.13/arch-complex/lib
> [0]PETSC ERROR: Configure run at Fri Jul 26 14:32:37 2013
> [0]PETSC ERROR: Configure options --with-scalar-type=complex --with-clanguage=C++ PETSC_ARCH=arch-complex --with-fortran-kernels=generic --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental --with-debugging=0
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: TSStep() line 2515 in /pic/projects/ds/petsc-dev.6.06.13/src/ts/interface/ts.c
> [0]PETSC ERROR: TSSolve() line 2632 in /pic/projects/ds/petsc-dev.6.06.13/src/ts/interface/ts.c
> [0]PETSC ERROR: simu() line 566 in "unknowndirectory/"simulation.C
> [0]PETSC ERROR: runSimulation() line 99 in "unknowndirectory/"dynSim.h
> [node0055:32539] *** Process received signal ***
> [node0055:32535] *** Process received signal ***
> [node0055:32535] Signal: Aborted (6)
> [node0055:32535] Signal code:  (24153104)
> [node0055:32534] *** Process received signal ***
> [node0055:32534] Signal: Aborted (6)
> [node0055:32534] Signal code:  (24199552)
> [node0055:32539] Signal: Aborted (6)
> [node0055:32539] Signal code:  (24157648)
> [node0055:32537] *** Process received signal ***
> [node0055:32537] Signal: Aborted (6)
> [node0055:32537] Signal code:  (24546704)
> [node0055:32538] *** Process received signal ***
>  
> The PETSc error message says "TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery!", but I think the real cause is that superlu_dist computed an X that is all "nan", which is what I saw when I printed it out.
>  
> However, I don't understand why using 8 or 16 processors should make such a difference.
>  
> It sounds like you are computing a NaN somewhere, possibly in your residual evaluation. However, we should catch this when we evaluate the norm. Please turn on debugging in your build.
>  
>    Matt
>  
> Can anyone help me troubleshoot this?
>  
> Thanks,
> Shuangshuang
>  
> 
> 
>  
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener


