[petsc-users] superlinear scale-up with hypre

Barry Smith bsmith at mcs.anl.gov
Wed Mar 10 13:29:46 CST 2010


Christian,

    The multiply, the triangular solves, and the preconditioner
application are all getting superlinear speedup. My guess is that this
is due to cache effects: since the working set on each process is
smaller, more of it stays in cache more of the time, so the run time
depends less on memory-access time, hence the superlinear speedup.

    If you use a nonzero initial guess, the stopping criterion for the
Krylov solvers is, by default, a reduction in the 2-norm of the
residual RELATIVE to the RIGHT-HAND SIDE, not to the initial residual.
Hence it converges "sooner than you expect". You can use the option
-ksp_converged_use_initial_residual_norm to have the decrease measured
relative to the initial residual instead, but I think the default is
best for time-dependent problems. If you use a zero initial guess, I
cannot explain why it seems to converge "early". You can run with
-ksp_converged_reason to have it print why it stops, or in the
debugger put a breakpoint in KSPDefaultConverged() to see what is
going on with the test.
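
    As a minimal sketch (the helper name CheckConvergence and where you
call it are only illustrative, not taken from your code), you can query
the same information programmatically right after KSPSolve():

#include "petscksp.h"

/* Sketch: after KSPSolve(), report why the solver stopped and how many
   iterations it took. Assumes "ksp" was created and configured
   elsewhere in the application; errors checked with CHKERRQ as usual. */
PetscErrorCode CheckConvergence(KSP ksp)
{
  KSPConvergedReason reason;
  PetscInt           its;
  PetscErrorCode     ierr;

  ierr = KSPGetConvergedReason(ksp,&reason);CHKERRQ(ierr);
  ierr = KSPGetIterationNumber(ksp,&its);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
           "KSP stopped after %D iterations, reason %D\n",
           its,(PetscInt)reason);CHKERRQ(ierr);
  return 0;
}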

    Barry


On Mar 10, 2010, at 1:03 PM, Christian Klettner wrote:

Dear Barry,

Below is the performance on 32 and 64 cores, respectively. I ran my
case for 19 time steps, and in each time step there are 4 parabolic
equations to be solved (Step 1 (u,v) and Step 3 (u,v)) and 1 elliptic
equation (Step 2). This is why there are 95 KSPSolves.
The biggest difference I can see is in KSPSolve, but I'm guessing this
is made up of other functions?
Also, as you can see, I set "-poeq_ksp_rtol 0.000000001" for the
Poisson solve; however, when I print it out it says

Residual norms for poeq_ solve.
  0 KSP Residual norm 7.862045205096e-02
  1 KSP Residual norm 1.833734529269e-02
  2 KSP Residual norm 9.243822053526e-04
  3 KSP Residual norm 1.534786635844e-04
  4 KSP Residual norm 2.032435231176e-05
  5 KSP Residual norm 3.201182258546e-06

so the tolerance has not been reached. Should I set the tolerance with a
different command?
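
For reference, this is roughly how I attach the prefix, and how I
understand the tolerance could be set directly in code instead of via
the option (a simplified sketch; ksp_poisson is just a placeholder name,
not my actual variable):

#include "petscksp.h"

/* Simplified sketch: give the Poisson KSP the "poeq_" prefix so that
   -poeq_ksp_rtol, -poeq_ksp_type, etc. apply to it, or set the relative
   tolerance directly with KSPSetTolerances().                          */
PetscErrorCode ConfigurePoissonKSP(KSP ksp_poisson)
{
  PetscErrorCode ierr;

  ierr = KSPSetOptionsPrefix(ksp_poisson,"poeq_");CHKERRQ(ierr);
  /* equivalent to -poeq_ksp_rtol 1e-9; other tolerances left at defaults */
  ierr = KSPSetTolerances(ksp_poisson,1.0e-9,PETSC_DEFAULT,
                          PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp_poisson);CHKERRQ(ierr);
  return 0;
}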

Thanks for any advice,
Christian


> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex115 on a linux-gnu named node-c47 with 32 processors, by ucemckl Wed Mar 10 02:12:45 2010
> Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
>
>                         Max       Max/Min        Avg      Total
> Time (sec):           5.424e+02      1.00012   5.423e+02
> Objects:              2.860e+02      1.00000   2.860e+02
> Flops:                1.675e+10      1.02726   1.635e+10  5.232e+11
> Flops/sec:            3.088e+07      1.02726   3.015e+07  9.647e+08
> MPI Messages:         3.603e+03      2.00278   3.447e+03  1.103e+05
> MPI Message Lengths:  8.272e+06      1.90365   2.285e+03  2.520e+08
> MPI Reductions:       4.236e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N --> 2N flops
>                            and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 5.4232e+02 100.0%  5.2317e+11 100.0%  1.103e+05 100.0%  2.285e+03      100.0%  4.056e+03  95.8%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>      %T - percent time in this phase         %F - percent flops in this phase
>      %M - percent messages in this phase     %L - percent message lengths in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMin                19 1.0 9.5495e-02 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecDot              1362 1.0 1.0272e+01 1.4 1.38e+09 1.0 0.0e+00 0.0e+00 1.4e+03  2  8  0  0 32   2  8  0  0 34  4212
> VecMDot              101 1.0 1.3028e+00 1.0 3.44e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  2  0  0  2   0  2  0  0  2  8241
> VecNorm              972 1.0 1.0458e+01 1.6 9.88e+08 1.0 0.0e+00 0.0e+00 9.7e+02  1  6  0  0 23   1  6  0  0 24  2952
> VecScale             139 1.0 4.4759e-01 1.1 7.07e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4932
> VecCopy              133 1.0 6.7746e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1136 1.0 4.2686e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAXPY             1666 1.0 1.0439e+01 1.0 1.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0  5069
> VecAYPX              681 1.0 4.1510e+00 1.1 6.92e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  5211
> VecAXPBYCZ            38 1.0 3.5104e-01 1.1 7.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  6877
> VecMAXPY             120 1.0 1.7512e+00 1.0 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  7963
> VecAssemblyBegin     290 1.0 1.4337e+0164.9 0.00e+00 0.0 3.6e+03 1.0e+03 8.7e+02  2  0  3  1 21   2  0  3  1 21     0
> VecAssemblyEnd       290 1.0 8.1372e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult     280 1.0 2.5121e+00 1.1 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1770
> VecScatterBegin     1373 1.0 5.1618e-02 1.7 0.00e+00 0.0 7.7e+04 1.3e+03 0.0e+00  0  0 70 40  0   0  0 70 40  0     0
> VecScatterEnd       1373 1.0 6.2953e-0118.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize         120 1.0 1.1371e+00 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 1.2e+02  0  1  0  0  3   0  1  0  0  3  5028
> MatMult             1048 1.0 5.6495e+01 1.1 6.86e+09 1.0 6.5e+04 1.3e+03 0.0e+00 10 41 59 34  0  10 41 59 34  0  3793
> MatMultTranspose      57 1.0 3.4194e+00 1.1 4.02e+08 1.0 3.5e+03 1.3e+03 0.0e+00  1  2  3  2  0   1  2  3  2  0  3673
> MatSolve             553 1.0 4.6169e+01 1.1 3.62e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 22  0  0  0   8 22  0  0  0  2448
> MatLUFactorNum         2 1.0 7.9745e-01 1.2 2.78e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1088
> MatILUFactorSym        2 1.0 2.7597e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatCopy              133 1.0 4.7596e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatConvert            27 1.0 1.7435e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     263 1.0 1.3145e+0132.9 0.00e+00 0.0 2.4e+04 3.7e+03 5.3e+02  2  0 22 36 12   2  0 22 36 13     0
> MatAssemblyEnd       263 1.0 9.1696e+00 1.0 0.00e+00 0.0 2.5e+02 3.3e+02 6.6e+01  2  0  0  0  2   2  0  0  0  2     0
> MatGetRow         901474 1.5 2.9092e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         2 1.0 7.2280e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries       160 1.0 3.0731e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> KSPGMRESOrthog       101 1.0 2.6510e+00 1.0 6.87e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  4  0  0  2   0  4  0  0  2  8100
> KSPSetup              78 1.0 1.4449e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              95 1.0 3.0155e+02 1.0 1.49e+10 1.0 5.4e+04 1.3e+03 2.4e+03 56 89 49 28 58  56 89 49 28 60  1540
> PCSetUp                6 1.0 6.2894e+00 1.0 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  0   1  0  0  0  0   138
> PCSetUpOnBlocks       57 1.0 1.0523e+00 1.2 2.78e+07 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0   824
> PCApply              972 1.0 2.1798e+02 1.0 3.76e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 22  0  0  0  40 22  0  0  0   539
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>   Application Order     4              4  142960400     0
>           Index Set    42             42   11937496     0
>   IS L to G Mapping    18             18   39700456     0
>                 Vec   131            131  335147648     0
>         Vec Scatter    31             31      26412     0
>              Matrix    47             47  1003139256     0
>       Krylov Solver     6              6      22376     0
>      Preconditioner     6              6       4256     0
>              Viewer     1              1        544     0
> ========================================================================================================================
> Average time to get PetscTime(): 2.86102e-07
> Average time for MPI_Barrier(): 1.27792e-05
> Average time for zero size MPI_Send(): 1.71363e-06
> #PETSc Option Table entries:
> -log_summary
> -moeq_ksp_rtol 0.000000001
> -moeq_ksp_type cg
> -moeq_pc_type jacobi
> -poeq_ksp_monitor
> -poeq_ksp_rtol 0.000000001
> -poeq_ksp_type gmres
> -poeq_pc_hypre_type boomeramg
> -poeq_pc_type hypre
> -ueq_ksp_rtol 0.000000001
> -ueq_ksp_type cg
> -veq_ksp_rtol 0.000000001
> -veq_ksp_type cg
> #End o PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8
> Configure run at: Fri Jan 29 15:15:03 2010
> Configure options: --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpiCC --with-blas-lapack-dir=/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t/ --download-triangle --download-hypre --with-debugging=0 COPTFLAGS=" -03 -ffast-math -finline-functions" CXXOPTFLAGS=" -03 -ffast-math -finline-functions" --with-shared=0
> -----------------------------------------
> Libraries compiled on Fri Jan 29 15:17:56 GMT 2010 on login01
> Machine characteristics: Linux login01 2.6.9-89.el4_lustre.1.6.7.2ddn1 #11 SMP Wed Sep 9 18:48:21 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
> Using PETSc directory: /shared/home/ucemckl/petsc-3.0.0-p10
> Using PETSc arch: linux-gnu-c-opt
> -----------------------------------------
> Using C compiler: mpicc
> Using Fortran compiler: mpif90 -O
> -----------------------------------------
> Using include paths:
> -I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
> -I/shared/home/ucemckl/petsc-3.0.0-p10/include
> -I/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/include
> -I/usr/X11R6/include
> ------------------------------------------
> Using C linker: mpicc
> Using Fortran linker: mpif90 -O
> Using libraries:
> -Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
> -L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib
> -L/shared/home/ucemckl/petsc-3.0.0-p10/linux-gnu-c-opt/lib -ltriangle
> -L/usr/X11R6/lib64 -lX11 -lHYPRE -lstdc++
> -Wl,-rpath,/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t
> -L/cvos/shared/apps/intel/mkl/10.0.2.018/lib/em64t -lmkl_lapack -lmkl
> -lguide -lpthread -lnsl -laio -lrt -lPEPCF90
> -L/cvos/shared/apps/infinipath/2.1/mpi/lib64 -ldl -lmpich
> -L/cvos/shared/apps/intel/cce/10.1.008/lib
> -L/usr/lib/gcc/x86_64-redhat-linux/3.4.6 -limf -lsvml -lipgo -lirc -lgcc_s
> -lirc_s -lmpichf90nc -lmpichabiglue_intel9
> -L/cvos/shared/apps/intel/fce/10.1.008/lib -lifport -lifcore -lm -lm
> -lstdc++ -lstdc++ -ldl -lmpich -limf -lsvml -lipgo -lirc -lgcc_s -lirc_s
> -ldl
> ------------------------------------------
>
>
> ////////////////////////////////////////////////////////////////////////
> /////////////////////////////////////////////////////////////////////////
> ////////////////////////////////////////////////////////////////////////
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
> ************************************************************************************************************************
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./ex115 on a linux-gnu named node-f56 with 64 processors, by ucemckl Wed Mar 10 04:33:32 2010
> Using Petsc Release Version 3.0.0, Patch 10, Tue Nov 24 16:38:09 CST 2009
>
>                         Max       Max/Min        Avg      Total
> Time (sec):           2.394e+02      1.00022   2.394e+02
> Objects:              2.860e+02      1.00000   2.860e+02
> Flops:                8.606e+09      1.04191   8.283e+09  5.301e+11
> Flops/sec:            3.595e+07      1.04196   3.461e+07  2.215e+09
> MPI Messages:         3.627e+03      1.98414   3.565e+03  2.282e+05
> MPI Message Lengths:  7.563e+06      1.99911   2.009e+03  4.584e+08
> MPI Reductions:       4.269e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                            e.g., VecAXPY() for real vectors of length N --> 2N flops
>                            and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
> 0:      Main Stage: 2.3936e+02 100.0%  5.3013e+11 100.0%  2.282e+05 100.0%  2.009e+03      100.0%  4.089e+03  95.8%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>   Count: number of times phase was executed
>   Time and Flops: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>   Mess: number of messages sent
>   Avg. len: average message length
>   Reduct: number of global reductions
>   Global: entire computation
>   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>      %T - percent time in this phase         %F - percent flops in this phase
>      %M - percent messages in this phase     %L - percent message lengths in this phase
>      %R - percent reductions in this phase
>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMin                19 1.0 4.7353e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecDot              1380 1.0 5.3245e+00 1.7 7.11e+08 1.0 0.0e+00 0.0e+00 1.4e+03  2  8  0  0 32   2  8  0  0 34  8224
> VecMDot              104 1.0 6.9024e-01 1.0 1.84e+08 1.0 0.0e+00 0.0e+00 1.0e+02  0  2  0  0  2   0  2  0  0  3 16458
> VecNorm              984 1.0 5.8349e+00 1.7 5.07e+08 1.0 0.0e+00 0.0e+00 9.8e+02  2  6  0  0 23   2  6  0  0 24  5351
> VecScale             142 1.0 1.5187e-01 1.7 3.66e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 14835
> VecCopy              133 1.0 3.9400e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1148 1.0 2.0722e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> VecAXPY             1684 1.0 5.1021e+00 1.1 8.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 10  0  0  0   2 10  0  0  0 10473
> VecAYPX              690 1.0 1.9134e+00 1.1 3.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0 11443
> VecAXPBYCZ            38 1.0 1.7525e-01 1.1 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 13761
> VecMAXPY             123 1.0 8.9613e-01 1.1 2.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 16359
> VecAssemblyBegin     290 1.0 6.6559e+0015.4 0.00e+00 0.0 7.3e+03 1.0e+03 8.7e+02  2  0  3  2 20   2  0  3  2 21     0
> VecAssemblyEnd       290 1.0 1.5714e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult     280 1.0 1.2558e+00 1.1 7.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  3538
> VecScatterBegin     1385 1.0 4.7455e-02 1.8 0.00e+00 0.0 1.6e+05 1.3e+03 0.0e+00  0  0 69 45  0   0  0 69 45  0     0
> VecScatterEnd       1385 1.0 4.8537e-0115.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize         123 1.0 6.2763e-01 1.1 9.50e+07 1.0 0.0e+00 0.0e+00 1.2e+02  0  1  0  0  3   0  1  0  0  3  9328
> MatMult             1060 1.0 2.4949e+01 1.1 3.51e+09 1.0 1.3e+05 1.3e+03 0.0e+00 10 41 59 38  0  10 41 59 38  0  8678
> MatMultTranspose      57 1.0 1.4921e+00 1.2 2.04e+08 1.0 7.2e+03 1.3e+03 0.0e+00  1  2  3  2  0   1  2  3  2  0  8409
> MatSolve             562 1.0 2.1214e+01 1.1 1.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 22  0  0  0   8 22  0  0  0  5409
> MatLUFactorNum         2 1.0 3.7373e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2320
> MatILUFactorSym        2 1.0 1.2428e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatCopy              133 1.0 2.3860e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> MatConvert            27 1.0 8.3217e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyBegin     263 1.0 8.3536e+0040.7 0.00e+00 0.0 5.0e+04 3.7e+03 5.3e+02  3  0 22 40 12   3  0 22 40 13     0
> MatAssemblyEnd       263 1.0 4.4723e+00 1.1 0.00e+00 0.0 5.0e+02 3.3e+02 6.6e+01  2  0  0  0  2   2  0  0  0  2     0
> MatGetRow         453796 1.5 1.8176e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ            4 1.0 5.0068e-06 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering         2 1.0 3.0140e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries       160 1.0 1.5786e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
> KSPGMRESOrthog       104 1.0 1.3677e+00 1.0 3.69e+08 1.0 0.0e+00 0.0e+00 1.0e+02  1  4  0  0  2   1  4  0  0  3 16612
> KSPSetup              78 1.0 4.9393e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              95 1.0 1.3637e+02 1.0 7.65e+09 1.0 1.1e+05 1.3e+03 2.5e+03 57 89 49 32 58  57 89 49 32 61  3457
> PCSetUp                6 1.0 2.7957e+00 1.0 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00  1  0  0  0  0   1  0  0  0  0   310
> PCSetUpOnBlocks       57 1.0 5.0076e-01 1.2 1.41e+07 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0  1732
> PCApply              984 1.0 9.8020e+01 1.0 1.93e+09 1.0 0.0e+00 0.0e+00 0.0e+00 41 22  0  0  0  41 22  0  0  0  1216
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>   Application Order     4              4  134876056     0
>           Index Set    42             42    5979736     0
>   IS L to G Mapping    18             18   19841256     0
>                 Vec   131            131  167538256     0
>         Vec Scatter    31             31      26412     0
>              Matrix    47             47  501115544     0
>       Krylov Solver     6              6      22376     0
>      Preconditioner     6              6       4256     0
>              Viewer     1              1        544     0
> ========================================================================================================================
> Average time to get PetscTime(): 1.90735e-07
> Average time for MPI_Barrier(): 1.35899e-05
> Average time for zero size MPI_Send(): 1.79559e-06
> #PETSc Option Table entries:
> -log_summary
> -moeq_ksp_rtol 0.000000001
> -moeq_ksp_type cg
> -moeq_pc_type jacobi
> -poeq_ksp_monitor
> -poeq_ksp_rtol 0.000000001
> -poeq_ksp_type gmres
> -poeq_pc_hypre_type boomeramg
> -poeq_pc_type hypre
> -ueq_ksp_rtol 0.000000001


