[petsc-users] Krylov Method Takes Too Long to Solve
Matthew Knepley
knepley at gmail.com
Fri Apr 22 22:56:17 CDT 2016
On Fri, Apr 22, 2016 at 10:50 PM, Jie Cheng <chengj5 at rpi.edu> wrote:
> Hi Jed and Barry
>
> Thanks for your help. After I reconfigured PETSc without debugging, it did
> become much faster, but it is still not fast enough. The question remains:
> what could be slowing the solver down? In this example, only 4662 nodes
> and 2920 elements are used. I typically use 3D hexahedral elements, and
> the number of degrees of freedom in the mixed formulation I use is
> 3*4662+2920 = 16906. At this point, my code runs only in serial; I am
> planning to parallelize it and then work with much finer meshes in the
> future. If, as Barry said, the number of degrees of freedom is too small
> for iterative solvers, then which direct solver do you recommend?
>
SuperLU or MUMPS. Iterative solvers usually do not win until you have more than
50K unknowns, or maybe 100K, depending on your system.
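For a serial problem of this size, switching to a direct solve is just a matter
of run-time options. A minimal sketch, assuming your PETSc build was configured
with --download-superlu or --download-mumps, would be

  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu

or, with MUMPS,

  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps

(-pc_factor_mat_solver_package is the option name in PETSc 3.6; use whichever
package your build actually has.)
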
Also, the solver you showed last time made no sense: you had GMRES/ILU(0)
preconditioning GMRES. You should remove the outer solver and use a better
preconditioner, since ILU(0) is quite weak.
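If you want to keep a single iterative solve for now, a rough illustration of
collapsing the nested solver and strengthening the factorization would be

  -ksp_type gmres -pc_type ilu -pc_factor_levels 1

where the fill level is only an example to experiment with.
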
Thanks,
Matt
> Here is the log from the run without debugging. It is still not fast
> enough, because this is only 1 step and there are 99 more to go.
>
> Thank you.
> Jie
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./main on a arch-darwin-c-opt named jiecheng.local with 1 processor, by
> Jie Fri Apr 22 22:36:38 2016
> Using Petsc Release Version 3.6.3, Dec, 03, 2015
>
> Max Max/Min Avg Total
> Time (sec): 1.520e+02 1.00000 1.520e+02
> Objects: 6.800e+01 1.00000 6.800e+01
> Flops: 2.590e+11 1.00000 2.590e+11 2.590e+11
> Flops/sec: 1.704e+09 1.00000 1.704e+09 1.704e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total      Avg     %Total   counts   %Total      Avg       %Total   counts   %Total
>  0:      Main Stage: 1.5195e+02 100.0%  2.5896e+11 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot 49906 1.0 8.4904e+00 1.0 2.61e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 6 10 0 0 0 6 10 0 0 0 3074
> VecNorm 51599 1.0 1.0644e+00 1.0 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1639
> VecScale 51587 1.0 4.5893e-01 1.0 8.72e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1900
> VecCopy 1682 1.0 2.9097e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1781 1.0 1.9067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 3324 1.0 5.8797e-02 1.0 1.12e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1911
> VecAYPX 12 1.0 3.0708e-04 1.0 4.06e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1321
> VecMAXPY 51581 1.0 7.2805e+00 1.0 2.78e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 5 11 0 0 0 5 11 0 0 0 3817
> VecAssemblyBegin 18 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 18 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 51581 1.0 1.5588e+00 1.0 2.62e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1678
> MatMult 51555 1.0 6.2301e+01 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 41 39 0 0 0 41 39 0 0 0 1622
> MatSolve 51561 1.0 5.8966e+01 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 39 39 0 0 0 39 39 0 0 0 1714
> MatLUFactorNum 6 1.0 1.4240e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1569
> MatILUFactorSym 1 1.0 1.0796e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 6 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 6 1.0 6.2988e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 5.9605e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 5.3310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 5.8441e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPGMRESOrthog 49906 1.0 1.5348e+01 1.0 5.22e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 10 20 0 0 0 10 20 0 0 0 3401
> KSPSetUp 7 1.0 1.5509e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 1.3919e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 92100 0 0 0 92100 0 0 0 1861
> PCSetUp 6 1.0 1.5490e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1442
> PCApply 20 1.0 1.3901e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 91100 0 0 0 91100 0 0 0 1861
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 56 56 7659456 0
> Matrix 2 2 24271340 0
> Krylov Solver 2 2 36720 0
> Preconditioner 2 2 1832 0
> Index Set 5 5 237080 0
> Viewer 1 0 0 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 2.14577e-07
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-debugging=no
>
> On Apr 22, 2016, at 6:30 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Allow me to call your attention to some fine print:
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Please always use "reply-all" so that your messages go to the list.
> This is standard mailing list etiquette. It is important to preserve
> threading for people who find this discussion later and so that we do
> not waste our time re-answering the same questions that have already
> been answered in private side-conversations. You'll likely get an
> answer faster that way too.
>
> Jie Cheng <chengj5 at rpi.edu> writes:
>
> Hello Jed
>
> Thanks for your reply. Here is the log. (The first couple of lines are
> printed by my own code during a time step: “Iteration” is the index of the
> Newton iteration, “GMRES_Iteration” is the number of GMRES iterations taken
> in that particular Newton iteration, and Err1 and Err2 are my convergence
> criteria.)
>
>
> Number of nonzeros allocated: 988540
> Step = 1, LoadFactor = 1.0000e-02
> REASON: 2
> Iteration = 1 GMRES_Iteration = 2 Err1 = 1.0000e+00,
> Err2 = 1.8375e-11
> REASON: 2
> Iteration = 2 GMRES_Iteration = 4 Err1 = 4.1151e-02,
> Err2 = 5.9467e-11
> REASON: 2
> Iteration = 3 GMRES_Iteration = 2 Err1 = 1.0265e-02,
> Err2 = 2.1268e-12
> REASON: 2
> Iteration = 4 GMRES_Iteration = 2 Err1 = 8.9824e-04,
> Err2 = 1.2622e-14
> REASON: 2
> Iteration = 5 GMRES_Iteration = 2 Err1 = 1.5741e-04,
> Err2 = 8.1248e-16
> REASON: 2
> Iteration = 6 GMRES_Iteration = 2 Err1 = 7.1515e-06,
> Err2 = 1.6605e-16
>
> ************************************************************************************************************************
> ***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./main on a arch-darwin-c-debug named jiecheng.local with 1 processor, by
> Jie Fri Apr 22 18:06:33 2016
> Using Petsc Release Version 3.6.3, Dec, 03, 2015
>
> Max Max/Min Avg Total
> Time (sec): 4.119e+02 1.00000 4.119e+02
> Objects: 6.800e+01 1.00000 6.800e+01
> Flops: 2.590e+11 1.00000 2.590e+11 2.590e+11
> Flops/sec: 6.287e+08 1.00000 6.287e+08 6.287e+08
> Memory: 3.308e+07 1.00000 3.308e+07
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total      Avg     %Total   counts   %Total      Avg       %Total   counts   %Total
>  0:      Main Stage: 4.1191e+02 100.0%  2.5896e+11 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Event Count Time (sec) Flops
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot 49906 1.0 2.1200e+01 1.0 2.61e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 5 10 0 0 0 5 10 0 0 0 1231
> VecNorm 51599 1.0 1.5052e+00 1.0 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 1 0 0 0 0 1 0 0 0 1159
> VecScale 51587 1.0 5.6397e+00 1.0 8.72e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 155
> VecCopy 1682 1.0 5.0184e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1781 1.0 3.5445e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 3324 1.0 4.0115e-01 1.0 1.12e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 280
> VecAYPX 12 1.0 1.0931e-03 1.0 4.06e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 371
> VecMAXPY 51581 1.0 2.4876e+01 1.0 2.78e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 6 11 0 0 0 6 11 0 0 0 1117
> VecAssemblyBegin 18 1.0 1.6809e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 18 1.0 1.2112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 51581 1.0 8.1458e+00 1.0 2.62e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 321
> MatMult 51555 1.0 1.5226e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 37 39 0 0 0 37 39 0 0 0 664
> MatSolve 51561 1.0 1.7415e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 42 39 0 0 0 42 39 0 0 0 580
> MatLUFactorNum 6 1.0 6.3280e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 353
> MatILUFactorSym 1 1.0 2.1535e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 6 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 6 1.0 1.7512e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 6.8307e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.3239e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 9.4421e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPGMRESOrthog 49906 1.0 4.6091e+01 1.0 5.22e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 11 20 0 0 0 11 20 0 0 0 1133
> KSPSetUp 7 1.0 3.3112e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 3.8880e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 94100 0 0 0 94100 0 0 0 666
> PCSetUp 6 1.0 6.8447e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 326
> PCApply 20 1.0 3.8806e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 94100 0 0 0 94100 0 0 0 667
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 56 56 7659456 0
> Matrix 2 2 24271340 0
> Krylov Solver 2 2 36720 0
> Preconditioner 2 2 1832 0
> Index Set 5 5 237080 0
> Viewer 1 0 0 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-mpi-dir=/usr/local
> -----------------------------------------
> Libraries compiled on Tue Mar 22 14:05:36 2016 on
> calcium-68.dynamic2.rpi.edu
> Machine characteristics: Darwin-15.4.0-x86_64-i386-64bit
> Using PETSc directory: /usr/local/petsc/petsc-3.6.3
> Using PETSc arch: arch-darwin-c-debug
> -----------------------------------------
>
> Using C compiler: /usr/local/bin/mpicc -fPIC -Wall -Wwrite-strings
> -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /usr/local/bin/mpif90 -fPIC -Wall
> -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0
> ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths:
> -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include
> -I/usr/local/petsc/petsc-3.6.3/include
> -I/usr/local/petsc/petsc-3.6.3/include
> -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include
> -I/usr/local/include
> -----------------------------------------
>
> Using C linker: /usr/local/bin/mpicc
> Using Fortran linker: /usr/local/bin/mpif90
> Using libraries:
> -Wl,-rpath,/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib
> -L/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib -lpetsc -llapack
> -lblas -Wl,-rpath,/usr/local/lib -L/usr/local/lib
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin
> -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran
> -Wl,-rpath,/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2
> -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2
> -Wl,-rpath,/usr/local/gfortran/lib -L/usr/local/gfortran/lib -lgfortran
> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpi_cxx -lc++
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -lclang_rt.osx -Wl,-rpath,/usr/local/lib -L/usr/local/lib -ldl -lmpi
> -lSystem
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -lclang_rt.osx -ldl
> -----------------------------------------
>
> On Apr 22, 2016, at 5:12 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Always send the output with -log_summary when asking about performance.
>
> Jie Cheng <chengj5 at rpi.edu> writes:
>
> Hi
>
> I’m implementing a finite element method for nonlinear solid mechanics.
> The main part of my code that involves PETSc is that, in each step, the
> tangent stiffness matrix A is formed and the increment of the nodal
> degrees of freedom is solved for; this is a standard Newton iteration.
> The problem is that when I use Krylov methods to solve the linear system,
> the KSPSolve call takes too long, even though only 2 or 3 iterations are
> needed.
>
> The finite element formulation is a mixed displacement/pressure
> formulation, which I believe is symmetric and positive-definite. However,
> if I pick the conjugate gradient method with ICC preconditioning, PETSc
> gives me a converged reason of -8, which indicates a non-positive-definite
> matrix. After some trial and error, the only combination that works is
> GMRES plus PCKSP. But as I said, the KSPSolve function takes too much
> time.
>
> A typical problem I’m solving has 16906 rows and 16906 columns. The output
> printed by -ksp_view is as follows:
>
> KSP Object: 1 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: ksp
> KSP and PC on KSP preconditioner follow
> ---------------------------------
> KSP Object: (ksp_) 1 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (ksp_) 1 MPI processes
> type: ilu
> ILU: out-of-place factorization
> 0 levels of fill
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 1, needed 1
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> package used to perform factorization: petsc
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
> ---------------------------------
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
>
> Could anyone give me any suggestions, please?
>
> Thanks
> Jie Cheng
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener