[petsc-users] Krylov Method Takes Too Long to Solve
Matthew Knepley
knepley at gmail.com
Fri Apr 22 22:56:17 CDT 2016
On Fri, Apr 22, 2016 at 10:50 PM, Jie Cheng <chengj5 at rpi.edu> wrote:
> Hi Jed and Barry
>
> Thanks for your help. After I reconfigured PETSc without debugging, it did
> become much faster, but it is still not fast enough. The question remains:
> what could be slowing the solver down? In this example, only 4662 nodes
> and 2920 elements are used. I typically use 3D hexahedral elements, and
> the number of degrees of freedom in the mixed formulation I use is
> 3*4662+2920 = 16906. At this point, my code runs only in serial; I am
> planning to parallelize it and then work with much finer meshes in the
> future. If, as Barry said, the number of degrees of freedom is too small
> for iterative solvers, then which direct solver do you recommend?
>
SuperLU or MUMPS. Iterative solvers usually do not win until you have more than
50K unknowns, or maybe 100K, depending on your system.
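For a serial problem of this size, switching to a direct solve is just a matter
of run-time options. A minimal sketch, assuming your PETSc build was configured
with --download-superlu or --download-mumps, would be

  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu

or, with MUMPS,

  -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package mumps

(-pc_factor_mat_solver_package is the option name in PETSc 3.6; use whichever
package your build actually has.)
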
Also, the solver you showed last time made no sense: you had GMRES/ILU(0)
preconditioning GMRES. You should remove the outer solver and use a better
preconditioner, since ILU(0) is quite weak.
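If you want to keep a single iterative solve for now, a rough illustration of
collapsing the nested solver and strengthening the factorization would be

  -ksp_type gmres -pc_type ilu -pc_factor_levels 1

where the fill level is only an example to experiment with.
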
Thanks,
Matt
> Here is the log from the run without debugging. It is still not fast
> enough, because this is only 1 step and there are 99 more to go.
>
> Thank you.
> Jie
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./main on a arch-darwin-c-opt named jiecheng.local with 1 processor, by
> Jie Fri Apr 22 22:36:38 2016
> Using Petsc Release Version 3.6.3, Dec, 03, 2015
>
> Max Max/Min Avg Total
> Time (sec): 1.520e+02 1.00000 1.520e+02
> Objects: 6.800e+01 1.00000 6.800e+01
> Flops: 2.590e+11 1.00000 2.590e+11 2.590e+11
> Flops/sec: 1.704e+09 1.00000 1.704e+09 1.704e+09
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total      Avg     %Total   counts   %Total      Avg       %Total   counts   %Total
>  0:      Main Stage: 1.5195e+02 100.0%  2.5896e+11 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event Count Time (sec) Flops
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot 49906 1.0 8.4904e+00 1.0 2.61e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 6 10 0 0 0 6 10 0 0 0 3074
> VecNorm 51599 1.0 1.0644e+00 1.0 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1639
> VecScale 51587 1.0 4.5893e-01 1.0 8.72e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1900
> VecCopy 1682 1.0 2.9097e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1781 1.0 1.9067e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 3324 1.0 5.8797e-02 1.0 1.12e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1911
> VecAYPX 12 1.0 3.0708e-04 1.0 4.06e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1321
> VecMAXPY 51581 1.0 7.2805e+00 1.0 2.78e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 5 11 0 0 0 5 11 0 0 0 3817
> VecAssemblyBegin 18 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 18 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 51581 1.0 1.5588e+00 1.0 2.62e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 1 0 0 0 1 1 0 0 0 1678
> MatMult 51555 1.0 6.2301e+01 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 41 39 0 0 0 41 39 0 0 0 1622
> MatSolve 51561 1.0 5.8966e+01 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 39 39 0 0 0 39 39 0 0 0 1714
> MatLUFactorNum 6 1.0 1.4240e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1569
> MatILUFactorSym 1 1.0 1.0796e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 6 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 6 1.0 6.2988e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 5.9605e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 5.3310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 5.8441e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPGMRESOrthog 49906 1.0 1.5348e+01 1.0 5.22e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 10 20 0 0 0 10 20 0 0 0 3401
> KSPSetUp 7 1.0 1.5509e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 1.3919e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 92100 0 0 0 92100 0 0 0 1861
> PCSetUp 6 1.0 1.5490e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 1442
> PCApply 20 1.0 1.3901e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 91100 0 0 0 91100 0 0 0 1861
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 56 56 7659456 0
> Matrix 2 2 24271340 0
> Krylov Solver 2 2 36720 0
> Preconditioner 2 2 1832 0
> Index Set 5 5 237080 0
> Viewer 1 0 0 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 2.14577e-07
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-debugging=no
>
> On Apr 22, 2016, at 6:30 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Allow me to call your attention to some fine print:
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Please always use "reply-all" so that your messages go to the list.
> This is standard mailing list etiquette. It is important to preserve
> threading for people who find this discussion later and so that we do
> not waste our time re-answering the same questions that have already
> been answered in private side-conversations. You'll likely get an
> answer faster that way too.
>
> Jie Cheng <chengj5 at rpi.edu> writes:
>
> Hello Jed
>
> Thanks for your reply. Here is the log. (The first couple of lines are
> printed by my own code during a time step: “Iteration” is the index of the
> Newton iteration, “GMRES_Iteration” is the number of GMRES iterations taken
> in that particular Newton iteration, and Err1 and Err2 are my convergence
> criteria.)
>
>
> Number of nonzeros allocated: 988540
> Step = 1, LoadFactor = 1.0000e-02
> REASON: 2
> Iteration = 1 GMRES_Iteration = 2 Err1 = 1.0000e+00,
> Err2 = 1.8375e-11
> REASON: 2
> Iteration = 2 GMRES_Iteration = 4 Err1 = 4.1151e-02,
> Err2 = 5.9467e-11
> REASON: 2
> Iteration = 3 GMRES_Iteration = 2 Err1 = 1.0265e-02,
> Err2 = 2.1268e-12
> REASON: 2
> Iteration = 4 GMRES_Iteration = 2 Err1 = 8.9824e-04,
> Err2 = 1.2622e-14
> REASON: 2
> Iteration = 5 GMRES_Iteration = 2 Err1 = 1.5741e-04,
> Err2 = 8.1248e-16
> REASON: 2
> Iteration = 6 GMRES_Iteration = 2 Err1 = 7.1515e-06,
> Err2 = 1.6605e-16
>
> ************************************************************************************************************************
> ***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> ./main on a arch-darwin-c-debug named jiecheng.local with 1 processor, by
> Jie Fri Apr 22 18:06:33 2016
> Using Petsc Release Version 3.6.3, Dec, 03, 2015
>
> Max Max/Min Avg Total
> Time (sec): 4.119e+02 1.00000 4.119e+02
> Objects: 6.800e+01 1.00000 6.800e+01
> Flops: 2.590e+11 1.00000 2.590e+11 2.590e+11
> Flops/sec: 6.287e+08 1.00000 6.287e+08 6.287e+08
> Memory: 3.308e+07 1.00000 3.308e+07
> MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:  ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total      Avg     %Total   counts   %Total      Avg       %Total   counts   %Total
>  0:      Main Stage: 4.1191e+02 100.0%  2.5896e+11 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> Avg. len: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over
> all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run ./configure #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
> Event Count Time (sec) Flops
> --- Global --- --- Stage --- Total
> Max Ratio Max Ratio Max Ratio Mess Avg len
> Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot 49906 1.0 2.1200e+01 1.0 2.61e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 5 10 0 0 0 5 10 0 0 0 1231
> VecNorm 51599 1.0 1.5052e+00 1.0 1.74e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 1 0 0 0 0 1 0 0 0 1159
> VecScale 51587 1.0 5.6397e+00 1.0 8.72e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 1 0 0 0 0 1 0 0 0 0 155
> VecCopy 1682 1.0 5.0184e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 1781 1.0 3.5445e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 3324 1.0 4.0115e-01 1.0 1.12e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 280
> VecAYPX 12 1.0 1.0931e-03 1.0 4.06e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 371
> VecMAXPY 51581 1.0 2.4876e+01 1.0 2.78e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 6 11 0 0 0 6 11 0 0 0 1117
> VecAssemblyBegin 18 1.0 1.6809e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAssemblyEnd 18 1.0 1.2112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecNormalize 51581 1.0 8.1458e+00 1.0 2.62e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00 2 1 0 0 0 2 1 0 0 0 321
> MatMult 51555 1.0 1.5226e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 37 39 0 0 0 37 39 0 0 0 664
> MatSolve 51561 1.0 1.7415e+02 1.0 1.01e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 42 39 0 0 0 42 39 0 0 0 580
> MatLUFactorNum 6 1.0 6.3280e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 353
> MatILUFactorSym 1 1.0 2.1535e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyBegin 6 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatAssemblyEnd 6 1.0 1.7512e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetRowIJ 1 1.0 6.8307e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering 1 1.0 1.3239e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 6 1.0 9.4421e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPGMRESOrthog 49906 1.0 4.6091e+01 1.0 5.22e+10 1.0 0.0e+00 0.0e+00
> 0.0e+00 11 20 0 0 0 11 20 0 0 0 1133
> KSPSetUp 7 1.0 3.3112e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 6 1.0 3.8880e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 94100 0 0 0 94100 0 0 0 666
> PCSetUp 6 1.0 6.8447e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 326
> PCApply 20 1.0 3.8806e+02 1.0 2.59e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 94100 0 0 0 94100 0 0 0 667
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
> Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Vector 56 56 7659456 0
> Matrix 2 2 24271340 0
> Krylov Solver 2 2 36720 0
> Preconditioner 2 2 1832 0
> Index Set 5 5 237080 0
> Viewer 1 0 0 0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> #PETSc Option Table entries:
> -log_summary
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --with-mpi-dir=/usr/local
> -----------------------------------------
> Libraries compiled on Tue Mar 22 14:05:36 2016 on
> calcium-68.dynamic2.rpi.edu
> Machine characteristics: Darwin-15.4.0-x86_64-i386-64bit
> Using PETSc directory: /usr/local/petsc/petsc-3.6.3
> Using PETSc arch: arch-darwin-c-debug
> -----------------------------------------
>
> Using C compiler: /usr/local/bin/mpicc -fPIC -Wall -Wwrite-strings
> -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0 ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: /usr/local/bin/mpif90 -fPIC -Wall
> -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -g -O0
> ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
>
> Using include paths:
> -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include
> -I/usr/local/petsc/petsc-3.6.3/include
> -I/usr/local/petsc/petsc-3.6.3/include
> -I/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/include
> -I/usr/local/include
> -----------------------------------------
>
> Using C linker: /usr/local/bin/mpicc
> Using Fortran linker: /usr/local/bin/mpif90
> Using libraries:
> -Wl,-rpath,/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib
> -L/usr/local/petsc/petsc-3.6.3/arch-darwin-c-debug/lib -lpetsc -llapack
> -lblas -Wl,-rpath,/usr/local/lib -L/usr/local/lib
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/7.3.0/lib/darwin
> -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran
> -Wl,-rpath,/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2
> -L/usr/local/gfortran/lib/gcc/x86_64-apple-darwin14/4.9.2
> -Wl,-rpath,/usr/local/gfortran/lib -L/usr/local/gfortran/lib -lgfortran
> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpi_cxx -lc++
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -lclang_rt.osx -Wl,-rpath,/usr/local/lib -L/usr/local/lib -ldl -lmpi
> -lSystem
> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/7.3.0/lib/darwin
> -lclang_rt.osx -ldl
> -----------------------------------------
>
> On Apr 22, 2016, at 5:12 PM, Jed Brown <jed at jedbrown.org> wrote:
>
> Always send the output with -log_summary when asking about performance.
>
> Jie Cheng <chengj5 at rpi.edu> writes:
>
> Hi
>
> I’m implementing a finite element method for nonlinear solid mechanics.
> The main part of my code that involves PETSc is that, in each step, the
> tangent stiffness matrix A is formed and the increment of the nodal
> degrees of freedom is solved for; this is a standard Newton iteration.
> The problem is that when I use Krylov methods to solve the linear system,
> the KSPSolve call takes too long, even though only 2 or 3 iterations are
> needed.
>
> The finite element formulation is a mixed displacement/pressure
> formulation, which I believe is symmetric and positive-definite. However,
> if I pick the conjugate gradient method with ICC preconditioning, PETSc
> gives me a converged reason of -8, which indicates a non-positive-definite
> matrix. After some trial and error, the only combination that works is
> GMRES plus PCKSP. But as I said, the KSPSolve function takes too much
> time.
>
> A typical problem I’m solving has 16906 rows and 16906 columns. The output
> printed by -ksp_view is as follows:
>
> KSP Object: 1 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
> type: ksp
> KSP and PC on KSP preconditioner follow
> ---------------------------------
> KSP Object: (ksp_) 1 MPI processes
> type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
> left preconditioning
> using PRECONDITIONED norm type for convergence test
> PC Object: (ksp_) 1 MPI processes
> type: ilu
> ILU: out-of-place factorization
> 0 levels of fill
> tolerance for zero pivot 2.22045e-14
> matrix ordering: natural
> factor fill ratio given 1, needed 1
> Factored matrix follows:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> package used to perform factorization: petsc
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
> ---------------------------------
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
> type: seqaij
> rows=16906, cols=16906
> total: nonzeros=988540, allocated nonzeros=988540
> total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 7582 nodes, limit used is 5
>
> Could anyone give me any suggestions, please?
>
> Thanks
> Jie Cheng
>
>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener