[petsc-users] DIVERGED_NONLINEAR_SOLVE error

Jin, Shuangshuang Shuangshuang.Jin at pnnl.gov
Fri Aug 2 18:04:37 CDT 2013


Thank you, Matt, the problem is resolved. It was a floating-point error, which I located after turning on the debugging option.

Thanks,
Shuangshuang

From: Matthew Knepley [mailto:knepley at gmail.com]
Sent: Friday, August 02, 2013 1:43 PM
To: Jin, Shuangshuang
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] DIVERGED_NONLINEAR_SOLVE error

On Sat, Aug 3, 2013 at 4:41 AM, Jin, Shuangshuang <Shuangshuang.Jin at pnnl.gov> wrote:
Is there a quick way to turn on debugging in my build, or do I have to run the following again?

Configure options --with-scalar-type=complex --with-clanguage=C++ PETSC_ARCH=arch-complex --with-fortran-kernels=generic --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental

It usually takes over an hour to reconfigure PETSc on my machine...

That is the way. I think it's time to upgrade your Commodore 64 :)

    Matt


Thanks,
Shuangshuang

From: Matthew Knepley [mailto:knepley at gmail.com]
Sent: Friday, August 02, 2013 1:33 PM
To: Jin, Shuangshuang
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] DIVERGED_NONLINEAR_SOLVE error

On Sat, Aug 3, 2013 at 4:22 AM, Jin, Shuangshuang <Shuangshuang.Jin at pnnl.gov> wrote:
Hello,

     My code solves a linear system AX=B using superlu_dist in PETSc, and uses some of X's data to solve a DAE problem. I get a very strange error:

     When I run the code on fewer than 8 processors, it runs fine and gives correct results. When I run it on more than 8 processors, such as 16 or 32, I get an error and a lot of generated core.##### files.

[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR:   !
[0]PETSC ERROR: TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery!
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Development GIT revision: a0a914e661bf6402b8edabe0f5a2dad46323f69f  GIT Date: 2013-06-05 14:18:39 -0500
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: dynSim on a arch-complex named node0055.local by d3m956 Fri Aug  2 11:56:10 2013
[0]PETSC ERROR: Libraries linked from /pic/projects/ds/petsc-dev.6.06.13/arch-complex/lib
[0]PETSC ERROR: Configure run at Fri Jul 26 14:32:37 2013
[0]PETSC ERROR: Configure options --with-scalar-type=complex --with-clanguage=C++ PETSC_ARCH=arch-complex --with-fortran-kernels=generic --download-superlu_dist --download-mumps --download-scalapack --download-parmetis --download-metis --download-elemental --with-debugging=0
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: TSStep() line 2515 in /pic/projects/ds/petsc-dev.6.06.13/src/ts/interface/ts.c
[0]PETSC ERROR: TSSolve() line 2632 in /pic/projects/ds/petsc-dev.6.06.13/src/ts/interface/ts.c
[0]PETSC ERROR: simu() line 566 in "unknowndirectory/"simulation.C
[0]PETSC ERROR: runSimulation() line 99 in "unknowndirectory/"dynSim.h
[node0055:32539] *** Process received signal ***
[node0055:32535] *** Process received signal ***
[node0055:32535] Signal: Aborted (6)
[node0055:32535] Signal code:  (24153104)
[node0055:32534] *** Process received signal ***
[node0055:32534] Signal: Aborted (6)
[node0055:32534] Signal code:  (24199552)
[node0055:32539] Signal: Aborted (6)
[node0055:32539] Signal code:  (24157648)
[node0055:32537] *** Process received signal ***
[node0055:32537] Signal: Aborted (6)
[node0055:32537] Signal code:  (24546704)
[node0055:32538] *** Process received signal ***

The error message from PETSc says "TSStep has failed due to DIVERGED_NONLINEAR_SOLVE, increase -ts_max_snes_failures or make negative to attempt recovery!", but I think the real cause is that superlu_dist computed an X that is all NaN, which I saw when I printed it out.

However, I don't understand why using 8 or 16 processors should make such a difference.

It sounds like you are computing a NaN somewhere, possibly your residual evaluation. However, we should
catch this when we evaluate the norm. Please turn on debugging in your build.

   Matt
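
One way to narrow down where a NaN first appears, independent of the debug build, is to scan the locally owned values right after the linear solve and again inside the residual evaluation. The following is only a minimal sketch in plain C: first_nonfinite and check_finite are hypothetical helper names, not PETSc functions, and the sketch assumes the local values are available as an array of real doubles (with a complex-scalar build like the one above, the real and imaginary parts would each need to be checked).

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of PETSc): return the index of the first
 * entry of v[0..n-1] that is NaN or infinite, or -1 if all are finite. */
long first_nonfinite(const double *v, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (!isfinite(v[i])) {
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical wrapper: print the stage label and the process rank along
 * with the first bad entry, so the failing stage and process can be
 * identified in a parallel run. Returns 1 if a bad value was found. */
int check_finite(const char *label, int rank, const double *v, size_t n)
{
    long bad = first_nonfinite(v, n);
    if (bad >= 0) {
        fprintf(stderr, "[%d] %s: non-finite value %g at local index %ld\n",
                rank, label, v[bad], bad);
        return 1;
    }
    return 0;
}
```

Calling check_finite on the right-hand side before the solve, on X after the solve, and on the residual inside the residual routine would show which stage produces the first bad value and on which rank, which may help explain why only the larger process counts fail.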

Can anyone give me some help with troubleshooting this?

Thanks,
Shuangshuang




--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
