[petsc-users] Performance of Fieldsplit PC

Bernardo Rocha bernardomartinsrocha at gmail.com
Tue Nov 7 09:55:19 CST 2017


Thanks for the reply.

1) This is block-Jacobi, why not use PCBJACOBI? Is it because you want to
> select rows?
>

I'm only using it to understand the performance behavior of PCFieldSplit,
since I'm also seeing the same issue in a larger and more complex problem.

> 2) We cannot tell anything without knowing how many iterates were used
>
  -ksp_monitor_true_residual -ksp_converged_reason
> -pc_fieldsplit_[0,1]_ksp_monitor_true_residual
>
> 3) We cannot say anything about performance without seeing the log for
> both runs
>   -log_view
>

I'm sending you the log files with the recommended command-line arguments
for the three cases:

1. scalar case
2. PCFieldSplit (as we were initially running, with CG/Jacobi in each block)
3. PCFieldSplit with preonly/Jacobi in each block, as suggested by Patrick
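
For quick reference, these are the relevant option table entries from the
attached logs:

  case 1: -ksp_type gmres -ksp_rtol 1e-8 -pc_type jacobi
  case 2: -pc_type fieldsplit -pc_fieldsplit_type additive
          -fieldsplit_X_ksp_type cg -fieldsplit_X_pc_type jacobi -fieldsplit_X_ksp_rtol 1e-8
          (and the same options with the fieldsplit_Y_ prefix)
  case 3: same as case 2, but with -fieldsplit_X_ksp_type preonly and
          -fieldsplit_Y_ksp_type preonly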

As Patrick pointed out, with Preonly/Jacobi the behavior is closer to what
I expected.

Please note that each log was taken over 100 calls to KSPSolve; I have just
simplified the output shown here.

What would be the proper way of creating this block preconditioner?
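
In case it helps to see what I mean, here is a minimal sketch of the setup,
consistent with the attached -ksp_view output (two splits defined by index
sets, additive composition). isX, isY, A, b and x are placeholders and error
checking is omitted:

  KSP ksp;
  PC  pc;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);                /* assembled 19906x19906 matrix */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  PCFieldSplitSetType(pc, PC_COMPOSITE_ADDITIVE);
  PCFieldSplitSetIS(pc, "X", isX);           /* split 0, option prefix fieldsplit_X_ */
  PCFieldSplitSetIS(pc, "Y", isY);           /* split 1, option prefix fieldsplit_Y_ */
  KSPSetFromOptions(ksp);                    /* picks up e.g. -fieldsplit_X_ksp_type */
  KSPSolve(ksp, b, x);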

As you can see, the timing with PCFieldSplit is larger, and it is largest for
case 3. For case 2 it is nearly 2x the scalar case, as I expected (I don't
know if that expectation makes sense).

So for case 2, is the large timing due to the inner/outer solves?
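Looking at the case 2 log (times over 100 KSPSolve calls), the inner solves
do seem to dominate:

  PCApply (the inner CG solves): 4.72 s of the 5.08 s in KSPSolve, ~93% of the time
  PCApply flops:                 7.24e9 of 7.59e9 total,           ~95% of the flops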

Does the "machinery" behind PCFieldSplit for a block preconditioner result in
some performance overhead, neglecting the efficiency of the PC itself? For
instance, the case 3 log shows 12000 VecScatterBegin calls, and PCApply there
runs at 125 Mflop/s versus 490 Mflop/s in the scalar case.

Best regards,
Bernardo
-------------- next part: case 1 log (scalar, -pc_type jacobi) --------------
  0 KSP preconditioned resid norm 9.909609586673e+01 true resid norm 6.621816260761e+10 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.273902381670e+01 true resid norm 1.094099204442e+09 ||r(i)||/||b|| 1.652264516800e-02
  2 KSP preconditioned resid norm 6.460292016408e+00 true resid norm 2.813270803973e+08 ||r(i)||/||b|| 4.248488168788e-03
  3 KSP preconditioned resid norm 4.267523086682e+00 true resid norm 1.227432629606e+08 ||r(i)||/||b|| 1.853619280982e-03
  4 KSP preconditioned resid norm 2.956794930503e+00 true resid norm 5.892889289089e+07 ||r(i)||/||b|| 8.899203869502e-04
  5 KSP preconditioned resid norm 1.553867540080e+00 true resid norm 1.629422769996e+07 ||r(i)||/||b|| 2.460688587286e-04
  6 KSP preconditioned resid norm 5.863411243068e-01 true resid norm 2.339364215736e+06 ||r(i)||/||b|| 3.532813541805e-05
  7 KSP preconditioned resid norm 2.949598244316e-01 true resid norm 6.042774335918e+05 ||r(i)||/||b|| 9.125554225548e-06
  8 KSP preconditioned resid norm 1.810861505194e-01 true resid norm 2.303685149906e+05 ||r(i)||/||b|| 3.478932454766e-06
  9 KSP preconditioned resid norm 1.063228930690e-01 true resid norm 7.738363702513e+04 ||r(i)||/||b|| 1.168616493993e-06
 10 KSP preconditioned resid norm 5.539338985670e-02 true resid norm 1.753917253023e+04 ||r(i)||/||b|| 2.648695137339e-07
 11 KSP preconditioned resid norm 2.897182710946e-02 true resid norm 2.729986729636e+03 ||r(i)||/||b|| 4.122715916799e-08
 12 KSP preconditioned resid norm 1.695869301131e-02 true resid norm 4.766670897136e+02 ||r(i)||/||b|| 7.198434250408e-09
 13 KSP preconditioned resid norm 9.226255542270e-03 true resid norm 2.547340821913e+02 ||r(i)||/||b|| 3.846891429181e-09
 14 KSP preconditioned resid norm 4.999664022085e-03 true resid norm 2.173100659726e+02 ||r(i)||/||b|| 3.281729021392e-09
 15 KSP preconditioned resid norm 2.822889124856e-03 true resid norm 1.227719812726e+02 ||r(i)||/||b|| 1.854052973353e-09
 16 KSP preconditioned resid norm 1.612223553945e-03 true resid norm 7.400379602761e+01 ||r(i)||/||b|| 1.117575497619e-09
 17 KSP preconditioned resid norm 8.796911180671e-04 true resid norm 7.103508796600e+01 ||r(i)||/||b|| 1.072743265121e-09
 18 KSP preconditioned resid norm 4.819064182097e-04 true resid norm 7.368434157044e+01 ||r(i)||/||b|| 1.112751225175e-09
 19 KSP preconditioned resid norm 2.758550257952e-04 true resid norm 8.624440250822e+01 ||r(i)||/||b|| 1.302428202656e-09
 20 KSP preconditioned resid norm 1.527969474414e-04 true resid norm 9.706581744817e+01 ||r(i)||/||b|| 1.465848849104e-09
 21 KSP preconditioned resid norm 8.423993997128e-05 true resid norm 1.055048065519e+02 ||r(i)||/||b|| 1.593291060900e-09
 22 KSP preconditioned resid norm 4.630697601290e-05 true resid norm 1.164992856073e+02 ||r(i)||/||b|| 1.759325251860e-09
 23 KSP preconditioned resid norm 2.582157010506e-05 true resid norm 1.251285547265e+02 ||r(i)||/||b|| 1.889640995748e-09
 24 KSP preconditioned resid norm 1.439945755021e-05 true resid norm 1.468650092478e+02 ||r(i)||/||b|| 2.217896170241e-09
 25 KSP preconditioned resid norm 7.649560254453e-06 true resid norm 1.686932293751e+02 ||r(i)||/||b|| 2.547537151926e-09
 26 KSP preconditioned resid norm 4.116629854365e-06 true resid norm 1.862186959283e+02 ||r(i)||/||b|| 2.812199683519e-09
 27 KSP preconditioned resid norm 2.293702349842e-06 true resid norm 2.098309185181e+02 ||r(i)||/||b|| 3.168781951283e-09
 28 KSP preconditioned resid norm 1.302840266880e-06 true resid norm 2.200686132505e+02 ||r(i)||/||b|| 3.323387490447e-09
 29 KSP preconditioned resid norm 7.703427859997e-07 true resid norm 1.626698090949e+02 ||r(i)||/||b|| 2.456573886818e-09
Linear solve converged due to CONVERGED_RTOL iterations 29
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: jacobi
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=9953, cols=9953
    total: nonzeros=132617, allocated nonzeros=298590
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
 Number of iterations: 29
 Residual norm: 7.70343e-07
 Total time: 2.67914
Writing data file
Done
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov  7 13:00:48 2017
Using Petsc Release Version 3.5.4, May, 23, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           3.104e+00      1.00000   3.104e+00
Objects:              6.042e+03      1.00000   6.042e+03
Flops:                4.430e+09      1.00000   4.430e+09  4.430e+09
Flops/sec:            1.427e+09      1.00000   1.427e+09  1.427e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.1044e+00 100.0%  4.4303e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             5900 1.0 1.1548e+00 1.0 1.51e+09 1.0 0.0e+00 0.0e+00 0.0e+00 37 34  0  0  0  37 34  0  0  0  1304
MatAssemblyBegin       1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView              100 1.0 4.6666e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             2900 1.0 3.2593e-01 1.0 8.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 20  0  0  0  10 20  0  0  0  2657
VecNorm             6001 1.0 5.5606e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  2148
VecScale            3000 1.0 1.6580e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1801
VecCopy             3100 1.0 3.1643e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecSet              9141 1.0 5.4358e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAXPY             3000 1.0 2.2286e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2680
VecAYPX             3000 1.0 4.8099e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   621
VecMAXPY            5900 1.0 6.5193e-01 1.0 1.79e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 40  0  0  0  21 40  0  0  0  2745
VecPointwiseMult    3000 1.0 5.8018e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   515
VecNormalize        3000 1.0 4.7506e-02 1.0 8.96e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0  1886
KSPGMRESOrthog      2900 1.0 6.4069e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39  0  0  0  21 39  0  0  0  2703
KSPSetUp             100 1.0 2.6917e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             100 1.0 2.6619e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 0.0e+00 86100  0  0  0  86100  0  0  0  1664
PCSetUp                1 1.0 7.1526e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3000 1.0 6.0910e-02 1.0 2.99e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   490
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              0            0     0
              Vector  6038           6036    489688608     0
       Krylov Solver     1              1        18616     0
      Preconditioner     1              1          856     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_rtol 1e-8
-ksp_type gmres
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_type jacobi
#End of PETSc Option Table entries
-------------- next part: case 2 log (PCFieldSplit, CG/Jacobi in each block) --------------
  0 KSP preconditioned resid norm 9.515173597913e+01 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.204643328070e-06 true resid norm 2.672963301097e+00 ||r(i)||/||b|| 2.854308246619e-11
  2 KSP preconditioned resid norm 2.804824521054e-13 true resid norm 2.770364057767e-04 ||r(i)||/||b|| 2.958317075650e-15
Linear solve converged due to CONVERGED_RTOL iterations 2
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with ADDITIVE composition: total splits = 2
    Solver info for each split is in the following KSP objects:
    Split number 0 Defined by IS
    KSP Object:    (fieldsplit_X_)     1 MPI processes
      type: cg
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
      left preconditioning
      using PRECONDITIONED norm type for convergence test
    PC Object:    (fieldsplit_X_)     1 MPI processes
      type: jacobi
      linear system matrix = precond matrix:
      Mat Object:      (fieldsplit_X_)       1 MPI processes
        type: seqaij
        rows=9953, cols=9953
        total: nonzeros=132617, allocated nonzeros=132617
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
    Split number 1 Defined by IS
    KSP Object:    (fieldsplit_Y_)     1 MPI processes
      type: cg
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
      left preconditioning
      using PRECONDITIONED norm type for convergence test
    PC Object:    (fieldsplit_Y_)     1 MPI processes
      type: jacobi
      linear system matrix = precond matrix:
      Mat Object:      (fieldsplit_Y_)       1 MPI processes
        type: seqaij
        rows=9953, cols=9953
        total: nonzeros=132617, allocated nonzeros=132617
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=19906, cols=19906
    total: nonzeros=265234, allocated nonzeros=1.19436e+06
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
 Number of iterations: 2
 Residual norm: 2.80482e-13
 Total time: 5.1269
Writing data file
Done
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov  7 13:01:56 2017
Using Petsc Release Version 3.5.4, May, 23, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           5.609e+00      1.00000   5.609e+00
Objects:              6.370e+02      1.00000   6.370e+02
Flops:                7.589e+09      1.00000   7.589e+09  7.589e+09
Flops/sec:            1.353e+09      1.00000   1.353e+09  1.353e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.6092e+00 100.0%  7.5885e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult            19300 1.0 3.5085e+00 1.0 5.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00 63 67  0  0  0  63 67  0  0  0  1441
MatAssemblyBegin       3 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 3.1419e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 2.2733e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView              300 1.0 1.8467e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              200 1.0 7.8726e-03 1.0 1.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1517
VecTDot            37600 1.0 3.4746e-01 1.0 7.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6 10  0  0  0   6 10  0  0  0  2154
VecNorm            20100 1.0 1.8720e-01 1.0 4.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  5  0  0  0   3  5  0  0  0  2212
VecScale             300 1.0 2.8970e-03 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2061
VecCopy             1600 1.0 2.0661e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1925 1.0 2.7724e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            37900 1.0 3.0721e-01 1.0 7.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5 10  0  0  0   5 10  0  0  0  2475
VecAYPX            18500 1.0 2.6606e-01 1.0 3.68e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   5  5  0  0  0  1384
VecMAXPY             500 1.0 1.6370e-02 1.0 3.18e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1946
VecPointwiseMult   19400 1.0 2.5479e-01 1.0 1.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   5  3  0  0  0   758
VecScatterBegin     1200 1.0 2.5261e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize         300 1.0 9.1097e-03 1.0 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1967
KSPGMRESOrthog       200 1.0 1.4125e-02 1.0 2.39e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1691
KSPSetUp             102 1.0 3.4118e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             100 1.0 5.0754e+00 1.0 7.59e+09 1.0 0.0e+00 0.0e+00 0.0e+00 90100  0  0  0  90100  0  0  0  1495
PCSetUp                3 1.0 2.5232e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply              300 1.0 4.7249e+00 1.0 7.24e+09 1.0 0.0e+00 0.0e+00 0.0e+00 84 95  0  0  0  84 95  0  0  0  1532
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              2      3506688     0
              Vector   621            619     98550000     0
      Vector Scatter     2              2         1288     0
       Krylov Solver     3              3        21064     0
      Preconditioner     3              3         2720     0
              Viewer     1              0            0     0
           Index Set     4              2         1568     0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-fieldsplit_X_ksp_rtol 1e-8
-fieldsplit_X_ksp_type cg
-fieldsplit_X_pc_type jacobi
-fieldsplit_Y_ksp_rtol 1e-8
-fieldsplit_Y_ksp_type cg
-fieldsplit_Y_pc_type jacobi
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_fieldsplit_[0,1]_ksp_monitor_true_residual
-pc_fieldsplit_type additive
-pc_type fieldsplit
#End of PETSc Option Table entries
-------------- next part: case 3 log (PCFieldSplit, preonly/Jacobi in each block) --------------
  0 KSP preconditioned resid norm 1.401430427530e+02 true resid norm 9.364662363510e+10 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.801570025298e+01 true resid norm 1.547289933503e+09 ||r(i)||/||b|| 1.652264516799e-02
  2 KSP preconditioned resid norm 9.136232586494e+00 true resid norm 3.978565725596e+08 ||r(i)||/||b|| 4.248488168777e-03
  3 KSP preconditioned resid norm 6.035189026926e+00 true resid norm 1.735851871679e+08 ||r(i)||/||b|| 1.853619280971e-03
  4 KSP preconditioned resid norm 4.181539491874e+00 true resid norm 8.333803954090e+07 ||r(i)||/||b|| 8.899203869392e-04
  5 KSP preconditioned resid norm 2.197500549312e+00 true resid norm 2.304351780071e+07 ||r(i)||/||b|| 2.460688587183e-04
  6 KSP preconditioned resid norm 8.292115701718e-01 true resid norm 3.308360600270e+06 ||r(i)||/||b|| 3.532813540786e-05
  7 KSP preconditioned resid norm 4.171361840663e-01 true resid norm 8.545773410120e+05 ||r(i)||/||b|| 9.125554214767e-06
  8 KSP preconditioned resid norm 2.560944900224e-01 true resid norm 3.257902772942e+05 ||r(i)||/||b|| 3.478932444630e-06
  9 KSP preconditioned resid norm 1.503632773689e-01 true resid norm 1.094369879940e+05 ||r(i)||/||b|| 1.168616483392e-06
 10 KSP preconditioned resid norm 7.833808320116e-02 true resid norm 2.480413464487e+04 ||r(i)||/||b|| 2.648695028400e-07
 11 KSP preconditioned resid norm 4.097235082492e-02 true resid norm 3.860783217718e+03 ||r(i)||/||b|| 4.122714805780e-08
 12 KSP preconditioned resid norm 2.398321365671e-02 true resid norm 6.741080801947e+02 ||r(i)||/||b|| 7.198423755473e-09
 13 KSP preconditioned resid norm 1.304789571780e-02 true resid norm 3.602473867603e+02 ||r(i)||/||b|| 3.846880675208e-09
 14 KSP preconditioned resid norm 7.070592667344e-03 true resid norm 3.073218548774e+02 ||r(i)||/||b|| 3.281718474708e-09
 15 KSP preconditioned resid norm 3.992168085451e-03 true resid norm 1.736249159923e+02 ||r(i)||/||b|| 1.854043522902e-09
 16 KSP preconditioned resid norm 2.280028415568e-03 true resid norm 1.046566169144e+02 ||r(i)||/||b|| 1.117569570071e-09
 17 KSP preconditioned resid norm 1.244071109869e-03 true resid norm 1.004589356606e+02 ||r(i)||/||b|| 1.072744876014e-09
 18 KSP preconditioned resid norm 6.815185924237e-04 true resid norm 1.042054174147e+02 ||r(i)||/||b|| 1.112751462570e-09
 19 KSP preconditioned resid norm 3.901179187272e-04 true resid norm 1.219681423290e+02 ||r(i)||/||b|| 1.302429682935e-09
 20 KSP preconditioned resid norm 2.160875153605e-04 true resid norm 1.372716915795e+02 ||r(i)||/||b|| 1.465847739630e-09
 21 KSP preconditioned resid norm 1.191332656017e-04 true resid norm 1.492063885795e+02 ||r(i)||/||b|| 1.593291704364e-09
 22 KSP preconditioned resid norm 6.548795351426e-05 true resid norm 1.647547652308e+02 ||r(i)||/||b|| 1.759324136157e-09
 23 KSP preconditioned resid norm 3.651721464750e-05 true resid norm 1.769585591874e+02 ||r(i)||/||b|| 1.889641637021e-09
 24 KSP preconditioned resid norm 2.036390815968e-05 true resid norm 2.076983894694e+02 ||r(i)||/||b|| 2.217895118981e-09
 25 KSP preconditioned resid norm 1.081811185654e-05 true resid norm 2.385683116238e+02 ||r(i)||/||b|| 2.547537779401e-09
 26 KSP preconditioned resid norm 5.821793768980e-06 true resid norm 2.633529192793e+02 ||r(i)||/||b|| 2.812198764426e-09
 27 KSP preconditioned resid norm 3.243784970505e-06 true resid norm 2.967457868685e+02 ||r(i)||/||b|| 3.168782550290e-09
 28 KSP preconditioned resid norm 1.842494373206e-06 true resid norm 3.112239423017e+02 ||r(i)||/||b|| 3.323386687324e-09
 29 KSP preconditioned resid norm 1.089429216198e-06 true resid norm 2.300499177590e+02 ||r(i)||/||b|| 2.456574608129e-09
Linear solve converged due to CONVERGED_RTOL iterations 29
KSP Object: 1 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with ADDITIVE composition: total splits = 2
    Solver info for each split is in the following KSP objects:
    Split number 0 Defined by IS
    KSP Object:    (fieldsplit_X_)     1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (fieldsplit_X_)     1 MPI processes
      type: jacobi
      linear system matrix = precond matrix:
      Mat Object:      (fieldsplit_X_)       1 MPI processes
        type: seqaij
        rows=9953, cols=9953
        total: nonzeros=132617, allocated nonzeros=132617
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
    Split number 1 Defined by IS
    KSP Object:    (fieldsplit_Y_)     1 MPI processes
      type: preonly
      maximum iterations=10000, initial guess is zero
      tolerances:  relative=1e-08, absolute=1e-50, divergence=10000
      left preconditioning
      using NONE norm type for convergence test
    PC Object:    (fieldsplit_Y_)     1 MPI processes
      type: jacobi
      linear system matrix = precond matrix:
      Mat Object:      (fieldsplit_Y_)       1 MPI processes
        type: seqaij
        rows=9953, cols=9953
        total: nonzeros=132617, allocated nonzeros=132617
        total number of mallocs used during MatSetValues calls =0
          not using I-node routines
  linear system matrix = precond matrix:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=19906, cols=19906
    total: nonzeros=265234, allocated nonzeros=1.19436e+06
    total number of mallocs used during MatSetValues calls =0
      not using I-node routines
 Number of iterations: 29
 Residual norm: 1.08943e-06
 Total time: 6.58937
0
0
Writing data file
Done
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./poisson on a arch-linux2-c-debug named localhost.localdomain with 1 processor, by joventino Tue Nov  7 13:04:06 2017
Using Petsc Release Version 3.5.4, May, 23, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           7.098e+00      1.00000   7.098e+00
Objects:              6.060e+03      1.00000   6.060e+03
Flops:                8.865e+09      1.00000   8.865e+09  8.865e+09
Flops/sec:            1.249e+09      1.00000   1.249e+09  1.249e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 7.0984e+00 100.0%  8.8646e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             5900 1.0 2.8474e+00 1.0 3.01e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 34  0  0  0  40 34  0  0  0  1058
MatAssemblyBegin       3 1.0 1.6689e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         3 1.0 2.9294e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 2.2478e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView              300 1.0 1.9589e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot             2900 1.0 7.7983e-01 1.0 1.73e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 20  0  0  0  11 20  0  0  0  2221
VecNorm             6100 1.0 1.1267e-01 1.0 2.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  2155
VecScale            3000 1.0 3.1897e-02 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1872
VecCopy             3100 1.0 7.6415e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecSet             18148 1.0 2.6946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecAXPY             3000 1.0 5.7204e-02 1.0 1.19e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2088
VecAYPX             3000 1.0 1.0830e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   551
VecMAXPY            5900 1.0 1.5695e+00 1.0 3.58e+09 1.0 0.0e+00 0.0e+00 0.0e+00 22 40  0  0  0  22 40  0  0  0  2280
VecPointwiseMult    6000 1.0 1.0145e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   589
VecScatterBegin    12000 1.0 1.8266e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
VecNormalize        3000 1.0 9.1894e-02 1.0 1.79e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1950
KSPGMRESOrthog      2900 1.0 1.5169e+00 1.0 3.46e+09 1.0 0.0e+00 0.0e+00 0.0e+00 21 39  0  0  0  21 39  0  0  0  2283
KSPSetUp             102 1.0 2.6441e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             100 1.0 6.5351e+00 1.0 8.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 92100  0  0  0  92100  0  0  0  1356
PCSetUp                3 1.0 2.5163e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             3000 1.0 4.7949e-01 1.0 5.97e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7  1  0  0  0   7  1  0  0  0   125
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              2      3506688     0
              Vector  6044           6042    970785840     0
      Vector Scatter     2              2         1288     0
       Krylov Solver     3              3        20936     0
      Preconditioner     3              3         2720     0
              Viewer     1              0            0     0
           Index Set     4              2         1568     0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-fieldsplit_X_ksp_rtol 1e-8
-fieldsplit_X_ksp_type preonly
-fieldsplit_X_pc_type jacobi
-fieldsplit_Y_ksp_rtol 1e-8
-fieldsplit_Y_ksp_type preonly
-fieldsplit_Y_pc_type jacobi
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_view
-log_view
-m /home/joventino/Downloads/russa.xml
-pc_fieldsplit_[0,1]_ksp_monitor_true_residual
-pc_fieldsplit_type additive
-pc_type fieldsplit
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-suitesparse --download-hypre --download-mpich --with-debugging=0
-----------------------------------------
Libraries compiled on Fri Aug  4 16:15:03 2017 on localhost.localdomain 
Machine characteristics: Linux-4.11.9-200.fc25.x86_64-x86_64-with-fedora-25-Twenty_Five
Using PETSc directory: /home/joventino/source/petsc-3.5.4
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------

Using C compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90  -fPIC  -Wall -Wno-unused-variable -ffree-line-length-0 -O  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/include -I/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/include
-----------------------------------------

Using C linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpicc
Using Fortran linker: /home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/bin/mpif90
Using libraries: -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lHYPRE -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -lmpichcxx -lstdc++ -lflapack -lfblas -lpthread -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -L/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -L/usr/lib/gcc/x86_64-redhat-linux/6.3.1 -ldl -Wl,-rpath,/home/joventino/source/petsc-3.5.4/arch-linux2-c-debug/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl  
-----------------------------------------

