[petsc-users] Krylov Method Takes Too Long to Solve
Jed Brown
jed at jedbrown.org
Fri Apr 22 17:30:32 CDT 2016
Allow me to call your attention to some fine print:
##########################################################
#                                                        #
#                       WARNING!!!                       #
#                                                        #
#   This code was compiled with a debugging option,      #
#   To get timing results run ./configure                #
#   using --with-debugging=no, the performance will      #
#   be generally two or three times faster.              #
#                                                        #
##########################################################
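In other words, the timings below come from a debug build, so a reconfigure along these lines should make them considerably more representative. This is only a sketch: the optimized PETSC_ARCH name is hypothetical, while the MPI prefix and PETSc directory are taken from the log below.

    ./configure PETSC_ARCH=arch-darwin-c-opt --with-mpi-dir=/usr/local \
        --with-debugging=no COPTFLAGS=-O3 FOPTFLAGS=-O3
    make PETSC_DIR=/usr/local/petsc/petsc-3.6.3 PETSC_ARCH=arch-darwin-c-opt all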
Please always use "reply-all" so that your messages go to the list.
This is standard mailing list etiquette. It is important to preserve
threading for people who find this discussion later and so that we do
not waste our time re-answering the same questions that have already
been answered in private side-conversations. You'll likely get an
answer faster that way too.
Jie Cheng <chengj5 at rpi.edu> writes:
> Hello Jed
>
> Thanks for your reply. Here is the log. (The first few lines are printed by my own code during a time step: “Iteration” is the Newton iteration count, “GMRES_Iteration” is the number of GMRES iterations taken within that Newton iteration, and Err1 and Err2 are my convergence criteria.)
>
>
> Number of nonzeros allocated: 988540
> Step = 1, LoadFactor = 1.0000e-02
> REASON: 2
> Iteration = 1 GMRES_Iteration = 2 Err1 = 1.0000e+00, Err2 = 1.8375e-11
> REASON: 2
> Iteration = 2 GMRES_Iteration = 4 Err1 = 4.1151e-02, Err2 = 5.9467e-11
> REASON: 2
> Iteration = 3 GMRES_Iteration = 2 Err1 = 1.0265e-02, Err2 = 2.1268e-12
> REASON: 2
> Iteration = 4 GMRES_Iteration = 2 Err1 = 8.9824e-04, Err2 = 1.2622e-14
> REASON: 2
> Iteration = 5 GMRES_Iteration = 2 Err1 = 1.5741e-04, Err2 = 8.1248e-16
> REASON: 2
> Iteration = 6 GMRES_Iteration = 2 Err1 = 7.1515e-06, Err2 = 1.6605e-16
> ************************************************************************************************************************
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./main on a arch-darwin-c-debug named jiecheng.local with 1 processor, by Jie Fri Apr 22 18:06:33 2016
> Using Petsc Release Version 3.6.3, Dec, 03, 2015
>
> Max Max/Min Avg Total
> Time (sec): 4.119e+02 1.00000 4.119e+02
> Objects: 6.800e+01 1.00000 6.800e+01
> Flops: 2.590e+11 1.00000 2.590e+11 2.590e+11
> Flops/sec: 6.287e+08 1.00000 6.287e+08 6.287e+08
> Memory: 3.308e+07 1.00000 3.308e+07
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 2N flops
> and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
> Avg %Total Avg %Total counts %Total Avg %Total counts %Total
> 0: Main Stage: 4.1191e+02 100.0% 2.5896e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this phase
> %M - percent messages in this phase %L - percent message lengths in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> #                                                        #
> #                       WARNING!!!                       #
> #                                                        #
> #   This code was compiled with a debugging option,      #
> #   To get timing results run ./configure                #
> #   using --with-debugging=no, the performance will      #
> #   be generally two or three times faster.              #
> #                                                        #
> ##########################################################
>
>
> Event Count Time (sec) Flops --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot 49906 1.0 2.1200e+01 1.0 2.61e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 10 0 0 0 5 10 0 0 0 1231
> VecNorm 51599 1.0 1.5052e+00 1.0 1.74e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1159
> VecScale 51587 1.0 5.6397e+00 1.0 8.72e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 155
> VecCopy 1682 1.0 5.0184e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1781 1.0 3.5445e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 3324 1.0 4.0115e-01 1.0 1.12e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 280
> VecAYPX 12 1.0 1.0931e-03 1.0 4.06e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 371
> VecMAXPY 51581 1.0 2.4876e+01 1.0 2.78e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 11 0 0 0 6 11 0 0 0 1117
> VecAssemblyBegin 18 1.0 1.6809e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 18 1.0 1.2112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 51581 1.0 8.1458e+00 1.0 2.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 321
> MatMult 51555 1.0 1.5226e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 39 0 0 0 37 39 0 0 0 664
> MatSolve 51561 1.0 1.7415e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 39 0 0 0 42 39 0 0 0 580
> MatLUFactorNum 6 1.0 6.3280e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 353
> MatILUFactorSym 1 1.0 2.1535e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 6 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 6 1.0 1.7512e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 6.8307e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.3239e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 9.4421e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPGMRESOrthog 49906 1.0 4.6091e+01 1.0 5.22e+10 1.0 0.0e+00 0.0e+00 0.0e+00 11 20 0 0 0 11 20 0 0 0 1133
> KSPSetUp 7 1.0 3.3112e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 3.8880e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 94100 0 0 0 94100 0 0 0 666
> PCSetUp 6 1.0 6.8447e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 326
> PCApply 20 1.0 3.8806e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00 0.0e+00 94100 0 0 0 94100 0 0 0 667
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 56 56 7659456 0
> Matrix 2 2 24271340 0
> Krylov Solver 2 2 36720 0
> Preconditioner 2 2 1832 0
> Index Set 5 5 237080 0
> Viewer 1 0 0 0
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-mpi-dir=/usr/local
> -----------------------------------------
> Libraries compiled on Tue Mar 22 14:05:36 2016 on calcium-68.dynamic2.rpi.edu
> Machine characteristics: Darwin-15.4.0-x86_64-i386-64bit
> Using PETSc directory: /usr/local/petsc/petsc-3.6.3
> Using PETSc arch: arch-darwin-c-debug
> -----------------------------------------
>
> Using C compiler: /usr/local/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /usr/local/bin/mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0 ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths: -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include -I/usr/local/petsc/petsc-3.6.3/include -I/usr/local/petsc/petsc-3.6.3/include -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include -I/usr/local/include
> -----------------------------------------
>
> Using C linker: /usr/local/bin/mpicc
> Using Fortran linker: /usr/local/bin/mpif90
> Using libraries: -Wl,-rpath,/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib -L/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib -lpetsc -llapack -lblas -Wl,-rpath,/usr/local/lib -L/usr/local/lib -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -Wl,-rpath,/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2 -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2 -Wl,-rpath,/usr/local/gfortran/lib -L/usr/local/gfortran/lib -lgfortran -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpi_cxx -lc++ -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin -lclang_rt.osx -Wl,-rpath,/usr/local/lib -L/usr/local/lib -ldl -lmpi -lSystem -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin -lclang_rt.osx -ldl
> -----------------------------------------
>
>> On Apr 22, 2016, at 5:12 PM, Jed Brown <jed at jedbrown.org> wrote:
>>
>> Always send the output with -log_summary when asking about performance.
>>
>> Jie Cheng <chengj5 at rpi.edu> writes:
>>
>>> Hi
>>>
>>> I’m implementing a finite element method for nonlinear solid mechanics. The part of my code that involves PETSc is the standard Newton iteration: in each step the tangent stiffness matrix A is formed and the linear system is solved for the increment of the nodal degrees of freedom. The problem is that when I use Krylov methods to solve the linear system, KSPSolve takes too long, even though only 2 or 3 iterations are needed.
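(A minimal sketch of that kind of Newton update with the standard KSP interface; the handles ksp, A, rhs, du, and u are hypothetical and error checking is omitted:)

    KSP ksp;  Mat A;  Vec rhs, du, u;  PetscInt its;
    /* ... create ksp, assemble A and rhs, create du and u ... */
    KSPSetOperators(ksp, A, A);        /* A serves as operator and preconditioning matrix */
    KSPSetFromOptions(ksp);            /* honor -ksp_type / -pc_type from the command line */
    KSPSolve(ksp, rhs, du);            /* solve A*du = rhs for the Newton increment */
    KSPGetIterationNumber(ksp, &its);  /* Krylov iterations used in this Newton step */
    VecAXPY(u, 1.0, du);               /* Newton update: u <- u + du */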
>>>
>>> The finite element formulation is a mixed displacement/pressure formulation, which I believe is symmetric and positive definite. However, if I pick the conjugate gradient method with an ICC preconditioner, PETSc gives me converged reason -8 (KSP_DIVERGED_INDEFINITE_PC), which indicates an indefinite preconditioner. After some trial and error, the only pair that works is GMRES plus PCKSP. But as I said, KSPSolve takes too much time.
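(If the assembled matrix really is symmetric positive definite, a minimal sketch of requesting CG with ICC looks like the following; ksp and A are hypothetical handles, and the same choice can be made at run time with -ksp_type cg -pc_type icc -ksp_converged_reason:)

    PC pc;
    MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);
    MatSetOption(A, MAT_SPD, PETSC_TRUE);   /* assert symmetric positive definiteness */
    KSPSetType(ksp, KSPCG);                 /* conjugate gradients */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCICC);                   /* incomplete Cholesky */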
>>>
>>> A typical problem I’m trying has 16906 rows and 16906 columns. The output printed by -ksp_view is as follows:
>>>
>>> KSP Object: 1 MPI processes
>>> type: gmres
>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>> GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: 1 MPI processes
>>> type: ksp
>>> KSP and PC on KSP preconditioner follow
>>> ---------------------------------
>>> KSP Object: (ksp_) 1 MPI processes
>>> type: gmres
>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>> GMRES: happy breakdown tolerance 1e-30
>>> maximum iterations=10000, initial guess is zero
>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>> left preconditioning
>>> using PRECONDITIONED norm type for convergence test
>>> PC Object: (ksp_) 1 MPI processes
>>> type: ilu
>>> ILU: out-of-place factorization
>>> 0 levels of fill
>>> tolerance for zero pivot 2.22045e-14
>>> matrix ordering: natural
>>> factor fill ratio given 1, needed 1
>>> Factored matrix follows:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=16906, cols=16906
>>> package used to perform factorization: petsc
>>> total: nonzeros=988540, allocated nonzeros=988540
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 7582 nodes, limit used is 5
>>> linear system matrix = precond matrix:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=16906, cols=16906
>>> total: nonzeros=988540, allocated nonzeros=988540
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 7582 nodes, limit used is 5
>>> ---------------------------------
>>> linear system matrix = precond matrix:
>>> Mat Object: 1 MPI processes
>>> type: seqaij
>>> rows=16906, cols=16906
>>> total: nonzeros=988540, allocated nonzeros=988540
>>> total number of mallocs used during MatSetValues calls =0
>>> using I-node routines: found 7582 nodes, limit used is 5
>>>
>>> Could anyone give me any suggestion please?
>>>
>>> Thanks
>>> Jie Cheng