[petsc-users] Snes behavior

Barry Smith bsmith at mcs.anl.gov
Sun Jan 10 22:22:00 CST 2010


    As I said in my previous mail, this means that the Jacobian being
computed before was not accurate enough. This will happen if your
function evaluation does NOT respect the stencil you provided when you
created the DA. For example, you told the DA to use a stencil width of
1 but somewhere your function evaluation grabs a value 2 cells away and
uses it in the computation, for instance in some boundary condition. Or
you told it the stencil was a star stencil but the function evaluation
grabbed a value from a diagonal neighbor, which only a box stencil
includes.
There is a 99% probability this is your problem.
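
To see why a stencil violation corrupts the Jacobian, here is a minimal
sketch in plain C (no PETSc calls). It assumes the Jacobian is built by
finite differences with coloring derived from the declared stencil, so
that all columns of one color are perturbed at once; that is an
assumption about the mechanism, not a copy of PETSc's code. The names
residual, jacobian_dense, jacobian_colored and max_abs_diff, the size N
and the step H are all made up for illustration. Row 0 of the residual
reads u[2], two cells away, while the coloring assumes stencil width 1.

```c
#include <assert.h>
#include <string.h>

#define N 8        /* grid points (illustrative size) */
#define H 1.0e-7   /* finite-difference step (illustrative) */

/* Residual F(u). Interior rows respect a width-1 stencil (i-1, i, i+1).
   Row 0 uses a one-sided difference that reads u[2], two cells away. */
void residual(const double u[N], double f[N])
{
    f[0] = u[0] - 2.0*u[1] + u[2];            /* violates stencil width 1 */
    for (int i = 1; i < N - 1; i++)
        f[i] = u[i-1] - 2.0*u[i] + u[i+1] + u[i]*u[i];
    f[N-1] = u[N-1] - 1.0;                    /* Dirichlet-type condition */
}

/* Reference Jacobian: perturb one unknown at a time (N residual calls). */
void jacobian_dense(const double u[N], double J[N][N])
{
    double f0[N], f1[N], up[N];
    residual(u, f0);
    for (int j = 0; j < N; j++) {
        memcpy(up, u, sizeof up);
        up[j] += H;
        residual(up, f1);
        for (int i = 0; i < N; i++)
            J[i][j] = (f1[i] - f0[i]) / H;
    }
}

/* Colored Jacobian: with stencil width 1, columns j and j+3 never share
   a row, so all columns with the same j % 3 are perturbed together and
   the change in f[i] is attributed to the one same-colored column among
   i-1, i, i+1. Only 3 residual calls, but correct only if the residual
   really has the declared sparsity. */
void jacobian_colored(const double u[N], double J[N][N])
{
    double f0[N], f1[N], up[N];
    assert(N >= 3);                           /* need one full stencil */
    memset(J, 0, sizeof(double) * N * N);
    residual(u, f0);
    for (int c = 0; c < 3; c++) {
        memcpy(up, u, sizeof up);
        for (int j = c; j < N; j += 3)
            up[j] += H;
        residual(up, f1);
        for (int i = 0; i < N; i++)
            for (int j = i - 1; j <= i + 1; j++)
                if (j >= 0 && j < N && j % 3 == c)
                    J[i][j] = (f1[i] - f0[i]) / H;
    }
}

/* Largest entrywise difference between two Jacobians. */
double max_abs_diff(double A[N][N], double B[N][N])
{
    double m = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double d = A[i][j] - B[i][j];
            if (d < 0.0) d = -d;
            if (d > m) m = d;
        }
    return m;
}
```

With u[i] = 0.1*(i+1), the two Jacobians agree everywhere except entry
(0,2): the dense version finds the coupling to u[2], while the colored
version drops it because no column of that color lies in row 0's
assumed stencil. A matrix-free product (F(u+h*v) - F(u))/h never relies
on the declared sparsity, which is consistent with -snes_mf_operator
behaving differently.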

   If you cannot find the bug then send your function evaluation and  
how you form the DA to petsc-maint at mcs.anl.gov and we'll find it.


    Barry


On Jan 10, 2010, at 9:13 PM, Ryan Yan wrote:

> Hi Barry,
> Here is the result for -snes_mf_operator:
>
> yy2250 at sci-m8n6 ~/test/1d $ srun -p sci-comp -N 2 -n 2 ./HeatProfile1D -snes_mf_operator -dmmg_grid_sequence -pc_type bjacobi -snes_rtol 1e-15 -snes_monitor -ksp_rtol 1.e-12 -ksp_converged_reason -snes_converged_reason > out.mf_operator &
>
>
> yy2250 at sci-m8n6 ~/test/1d $ cat out.mf_operator
>       0 SNES Function norm 1.411468156752e+08
> Linear solve converged due to CONVERGED_ITS iterations 1
>       1 SNES Function norm 1.396727866424e+08
> Linear solve converged due to CONVERGED_ITS iterations 1
> Nonlinear solve did not converge due to DIVERGED_LS_FAILURE
>     0 SNES Function norm 1.045985796073e+08
> Linear solve converged due to CONVERGED_RTOL iterations 44
>     1 SNES Function norm 7.912999174650e+07
> Linear solve converged due to CONVERGED_RTOL iterations 38
>     2 SNES Function norm 6.079225520436e+07
> Linear solve converged due to CONVERGED_RTOL iterations 66
>     3 SNES Function norm 4.610252725173e+07
> Linear solve converged due to CONVERGED_RTOL iterations 66
>     4 SNES Function norm 3.333587574896e+07
> Linear solve converged due to CONVERGED_RTOL iterations 46
>     5 SNES Function norm 2.326587775240e+07
> Linear solve converged due to CONVERGED_RTOL iterations 59
>     6 SNES Function norm 1.233942497170e+05
> Linear solve converged due to CONVERGED_RTOL iterations 38
>     7 SNES Function norm 2.309536978272e+03
> Linear solve converged due to CONVERGED_RTOL iterations 38
>     8 SNES Function norm 7.064055492974e-03
> Linear solve converged due to CONVERGED_RTOL iterations 38
>     9 SNES Function norm 3.419662119536e-06
> Nonlinear solve converged due to CONVERGED_PNORM_RELATIVE
>   0 SNES Function norm 3.420486720202e+09
> Linear solve converged due to CONVERGED_RTOL iterations 10
>   1 SNES Function norm 1.426355492941e+08
> Linear solve converged due to CONVERGED_RTOL iterations 36
>   2 SNES Function norm 1.429830907344e+06
> Linear solve converged due to CONVERGED_RTOL iterations 37
>   3 SNES Function norm 1.755275678702e+03
> Linear solve converged due to CONVERGED_RTOL iterations 37
>   4 SNES Function norm 1.216161531652e-03
> Linear solve converged due to CONVERGED_RTOL iterations 36
>   5 SNES Function norm 4.309776935609e-06
> Nonlinear solve converged due to CONVERGED_PNORM_RELATIVE
> Number of Newton iterations = 5
> Converged reason is 4
>
> This time the convergence is very aggressive, and it is quadratic
> from iteration 0 to iteration 4.
>
> The solution vector x from this run is the same as in the run without
> the option "-snes_mf_operator". Any hint as to why the convergence is
> now fully quadratic?
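
One rough way to read the -snes_monitor listings in this thread is to
look at the ratios of successive function norms: a roughly constant
ratio suggests linear convergence, while ratios that shrink at every
step suggest superlinear convergence. A small sketch in plain C; the
function and array names are made up for illustration, and the numbers
are copied from the listings (iterations 4 to 8 of the original run,
and iterations 0 to 4 of the last listing of the -snes_mf_operator run).

```c
#include <stddef.h>

/* Fill q[k] = r[k+1] / r[k] for k = 0 .. n-2. */
void contraction_ratios(const double *r, size_t n, double *q)
{
    for (size_t k = 0; k + 1 < n; k++)
        q[k] = r[k + 1] / r[k];
}

/* Iterations 4..8 of the original run (without -snes_mf_operator). */
const double slow_run[5] = {
    1.655878303433e+03, 3.746498305706e+02, 8.317435704773e+01,
    1.857639969641e+01, 4.149691057773e+00
};

/* Iterations 0..4 of the last listing with -snes_mf_operator. */
const double mf_run[5] = {
    3.420486720202e+09, 1.426355492941e+08, 1.429830907344e+06,
    1.755275678702e+03, 1.216161531652e-03
};
```

For the original run the ratios stay near 0.22, which is what linear
convergence looks like. For the -snes_mf_operator run they fall from
about 4e-2 to about 7e-7 over four steps, so each step reduces the norm
by a larger factor than the one before.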
>
> Thank you very much,
>
> Yan
>
>
> On Sun, Jan 10, 2010 at 7:25 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>
> On Jan 10, 2010, at 6:09 PM, Ryan Yan wrote:
>
> Hi Barry,
>
> I ran the code on another machine.
>
> yy2250 at sci-m8n6 ~/test/1d $ srun -p sci-comp -N 2 -n 2 ./HeatProfile1D -snes_type test > out.snes_test
> [0]PETSC ERROR: SNESSolve() line 2221 in src/snes/interface/snes.c
> [1]PETSC ERROR: SNESSolve() line 2221 in src/snes/interface/snes.c
> [0]PETSC ERROR: DMMGSolveSNES() line 510 in src/snes/utils/damgsnes.c
> [0]PETSC ERROR: DMMGSolve() line 372 in src/snes/utils/damg.c
> [0]PETSC ERROR: main() line 270 in /home/vyan2000/local/PPETSc/petsc-2.3.3-p15/src/ksp/ksp/examples/tutorials/NCprojectHeatProfile1D.c
> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 0
> [1]PETSC ERROR: DMMGSolveSNES() line 510 in src/snes/utils/damgsnes.c
> [1]PETSC ERROR: DMMGSolve() line 372 in src/snes/utils/damg.c
> [1]PETSC ERROR: main() line 270 in /home/vyan2000/local/PPETSc/petsc-2.3.3-p15/src/ksp/ksp/examples/tutorials/NCprojectHeatProfile1D.c
> In: PMI_Abort(73, application called MPI_Abort(MPI_COMM_WORLD, 73) -  
> process 0)
> application called MPI_Abort(MPI_COMM_WORLD, 73) - process 1
> In: PMI_Abort(73, application called MPI_Abort(MPI_COMM_WORLD, 73) -  
> process 1)
> srun: error: task 1: Exited with exit code 73
> srun: error: task 0: Exited with exit code 73
>
> I was using a 3-level grid hierarchy. The error occurs during the SNES
> solve on the finest grid. The output in the file out.snes_test
> is:
> yy2250 at sci-m8n6 ~/test/1d $ cat out.snes_test
> Testing hand-coded Jacobian, if the ratio is
> O(1.e-8), the hand-coded Jacobian is probably correct.
> Run with -snes_test_display to show difference
> of hand-coded and finite difference Jacobian.
> Norm of matrix ratio 1.21314e-10 difference 1.41447
> Norm of matrix ratio 4.30115e-06 difference 1.41427
> Norm of matrix ratio 4.30108e-06 difference 1.41427
>
>
>   Ok, Jacobians being used are probably ok.
>
>
>
> For the -snes_mf_operator,
> yy2250 at sci-m8n6 ~/test/1d $ srun -p sci-comp -N 2 -n 2 ./HeatProfile1D -snes_mf_operator
> the program is still running. I suspect that the program has hung.
>
>  It's not hanging. It is just running. Use -snes_monitor -ksp_rtol  
> 1.e-12 -ksp_converged_reason -snes_converged_reason and run again
>
>   Barry
>
>
>
> Is there any useful information in the file "out.snes_test"?
>
> BTW, I did not hand-code any Jacobian, since I passed 0 for the  
> Jacobian evaluation subroutine in the DMMGSetSNESLocal().
>
> Thank you very much,
>
> Yan
>
>
>
> On Sun, Jan 10, 2010 at 6:11 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>
>  That message ain't from PETSc. Something is likely killing the job.
>
>  Barry
>
>
> On Jan 10, 2010, at 5:09 PM, Ryan Yan wrote:
>
> Hi Barry,
> I got the following result:
>
> vyan2000 at vyan2000-linux ~/NCproject/general $ mpirun -np 1 ./HeatProfile1D -snes_mf_operator
> Alarm clock
> vyan2000 at vyan2000-linux ~/NCproject/general $ mpirun -np 1 ./HeatProfile1D -snes_type test
> Alarm clock
>
> Are they normal responses and what do they indicate?
>
> Thanks a lot,
>
> Yan
>
>
> On Sun, Jan 10, 2010 at 5:57 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>
>  JUST run with -snes_mf_operator and then, in a separate run, with
> -snes_type test. NOT together.
>
>  Barry
>
>
> On Jan 10, 2010, at 4:42 PM, Ryan Yan wrote:
>
> Hi Barry,
> Please see reply below,
>
> On Sun, Jan 10, 2010 at 4:57 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>
> On Jan 10, 2010, at 3:52 PM, Ryan Yan wrote:
>
> Hi Barry,
> Yes, exactly. The original multi-component system scales quite
> unevenly. I will try to rescale it.
> Could rescaling help recover quadratic convergence?
>
>  I wouldn't be concerned about "quadratic convergence". I'd only be
> concerned that it is converging to the correct answer and that you
> are getting close enough to the correct answer.
>
> Yes, I agree.
>
>  You can run with -snes_mf_operator and -snes_type test to verify if  
> the Jacobian being computed is accurate. Perhaps in your function  
> evaluation you are not using the stencil that you set with the DA,  
> this would cause the wrong Jacobian to be computed.
>
>
> After passing in -snes_mf_operator and -snes_type test as follows:
> vyan2000 at vyan2000-linux ~/NCproject/general : mpirun -np 2 ./HeatProfile1D -snes_mf_operator -snes_type test
> I got errors, as expected:
>
> [0]PETSC ERROR: --------------------- Error Message  
> ------------------------------------
> [0]PETSC ERROR: Invalid argument!
> [0]PETSC ERROR: Cannot test with alternative preconditioner!
> [0]PETSC ERROR:  
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.0.0, Patch 5, Mon Apr 13  
> 09:15:37 CDT 2009
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR:  
> ------------------------------------------------------------------------
> [0]PETSC ERROR: ./HeatProfile1D on a O-hypre-p named vyan2000-linux  
> by vyan2000 Sun Jan 10 17:16:00 2010
> [0]PETSC ERROR: Libraries linked from /home/vyan2000/local/PPETSc/petsc-3.0.0-p5/O-hypre-prometheus/lib
> [0]PETSC ERROR: Configure run at Thu Jun 25 13:49:36 2009
> [0]PETSC ERROR: Configure options --download-mpich=1 --with-debugger=gdb --download-hypre=1 --download-parmetis=1 --download-prometheus=1 --with-shared=0
> [0]PETSC ERROR:  
> ------------------------------------------------------------------------
> [0]PETSC ERROR: SNESSolve_Test() line 28 in src/snes/impls/test/snestest.c
> [0]PETSC ERROR: SNESSolve() line 2221 in src/snes/interface/snes.c
> [0]PETSC ERROR: DMMGSolveSNES() line 510 in src/snes/utils/damgsnes.c
> [0]PETSC ERROR: DMMGSolve() line 372 in src/snes/utils/damg.c
> [0]PETSC ERROR: main() line 270 in /home/vyan2000/local/PPETSc/petsc-2.3.3-p15/src/ksp/ksp/examples/tutorials/NCprojectHeatProfile1D.c
> application called MPI_Abort(MPI_COMM_WORLD, 62) - process 0[cli_0]:  
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 62) - process 0
>
> The error makes sense, since I did not pass in the analytical
> Jacobian for the preconditioner matrix. Instead, I was using
> DMMGSetSNESLocal(dmmg,FormFunctionLocal,0,ad_FormFunctionLocal,admf_FormFunctionLocal).
> I am going to change the code a bit, pass in the analytical Jacobian,
> and do the -snes_type test.
>
> I will bear your suggestion in mind during the test.
>
> Thanks a lot,
>
> Yan
>
>
>
>  Barry
>
>
>
> Thanks a lot,
>
> Yan
>
> On Sun, Jan 10, 2010 at 4:35 PM, Barry Smith <bsmith at mcs.anl.gov>  
> wrote:
>
>  You already got a 10^16 drop in the residual norm. It is not  
> realistic to expect to get much more than that for double precision  
> calculations. Perhaps your original F() has some funky scaling of  
> different components that you can fix.
>
>
>
>  Barry
>
>
> On Jan 10, 2010, at 2:55 PM, Ryan Yan wrote:
>
> Hi All,
> I am solving a nonlinear system using snes. The -snes_monitor option  
> has the following output:
>
>  0 SNES Function norm 2.640163923729e+09
>  1 SNES Function norm 1.047643565314e+08
>  2 SNES Function norm 1.712732074788e+06
>  3 SNES Function norm 1.002169173269e+04
>  4 SNES Function norm 1.655878303433e+03
>  5 SNES Function norm 3.746498305706e+02
>  6 SNES Function norm 8.317435704773e+01
>  7 SNES Function norm 1.857639969641e+01
>  8 SNES Function norm 4.149691057773e+00
>  9 SNES Function norm 9.265604042412e-01
>  10 SNES Function norm 2.069527103214e-01
>  11 SNES Function norm 4.624186491082e-02
>  12 SNES Function norm 1.035558432688e-02
>  13 SNES Function norm 2.341362958811e-03
>  14 SNES Function norm 5.507445427277e-04
>  15 SNES Function norm 1.485123568354e-04
>  16 SNES Function norm 5.180043781814e-05
>  17 SNES Function norm 2.341966514486e-05
>  18 SNES Function norm 1.344936158651e-05
>  19 SNES Function norm 1.054812641176e-05
> Number of Newton iterations = 19
> Converged reason is 4
>
> It looks like the iterate never falls into a quadratic convergence  
> region before it converges. Is there any hint to understand this  
> behavior?
>
> Thanks a lot,
>
> Yan


