[petsc-users] Need help improving working PETSC code

Ganesh Vijayakumar ganesh.iitm at gmail.com
Fri Jul 10 12:34:53 CDT 2015


Hello,

On Thu, Jul 9, 2015 at 7:32 PM Barry Smith <bsmith at mcs.anl.gov> wrote:

>
>   Ok, it is block Jacobi with ICC on each block (one per process) so
> -ksp_type cg -pc_type bjacobi -sub_pc_type icc with PETSc should give
> similar results to what they get.
>
> >
> > Where is all the data? It should list all the events and the time it spends
> in each. Did you use PetscOptionsSetValue() to provide -log_summary? That
> won't work; you need to pass it on the command line, in the PETSC_OPTIONS
> environment variable, or in a file called petscrc.
>
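
Passing -log_summary on the command line (rather than through PetscOptionsSetValue()) did the trick. For reference, here is a minimal sketch of the kind of driver I am using so that the command-line options are seen; the toy tridiagonal system is purely illustrative (not my actual matrix), and error checking is omitted:

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat      A;
      Vec      x, b;
      KSP      ksp;
      PetscInt i, Istart, Iend, N = 100;

      /* Passing argc/argv here is what lets PETSc see command-line
         options such as -ksp_type, -pc_type, -sub_pc_type, -log_summary. */
      PetscInitialize(&argc, &argv, NULL, NULL);

      /* Toy SPD system (1D Laplacian) just so there is something to solve. */
      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
      MatSetFromOptions(A);
      MatSetUp(A);
      MatGetOwnershipRange(A, &Istart, &Iend);
      for (i = Istart; i < Iend; i++) {
        if (i > 0)   MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);
        if (i < N-1) MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      VecCreate(PETSC_COMM_WORLD, &b);
      VecSetSizes(b, PETSC_DECIDE, N);
      VecSetFromOptions(b);
      VecSet(b, 1.0);
      VecDuplicate(b, &x);

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);
      KSPSetFromOptions(ksp);   /* honours -ksp_type cg -pc_type bjacobi -sub_pc_type icc */
      KSPSolve(ksp, b, x);

      KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b); MatDestroy(&A);
      PetscFinalize();          /* -log_summary output is printed here */
      return 0;
    }

Run as, e.g., mpirun -np <nproc> ./solver -ksp_type cg -pc_type bjacobi -sub_pc_type icc -log_summary (the executable name is just a placeholder). Here is the full -log_summary output from my actual run: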

Using Petsc Release Version 3.5.3, Jan, 31, 2015

                         Max       Max/Min        Avg      Total
Time (sec):           9.756e+01      1.00369   9.726e+01
Objects:              4.500e+01      1.00000   4.500e+01
Flops:                1.256e+08      1.17291   1.184e+08  3.031e+10
Flops/sec:            1.292e+06      1.17364   1.217e+06  3.116e+08
MPI Messages:         3.956e+03     21.50000   1.167e+03  2.986e+05
MPI Message Lengths:  6.769e+06      4.61934   3.787e+03  1.131e+09
MPI Reductions:       3.120e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.7259e+01 100.0%  3.0311e+10 100.0%  2.986e+05 100.0%  3.787e+03      100.0%  3.110e+02  99.7%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot               75 1.0 1.1153e-01 2.0 2.28e+07 1.0 0.0e+00 0.0e+00 7.5e+01  0 19  0  0 24   0 19  0  0 24 51865
VecNorm              105 1.0 2.6864e-01 1.1 1.02e+07 1.0 0.0e+00 0.0e+00 1.0e+02  0  8  0  0 34   0  8  0  0 34  9580
VecScale              90 1.0 2.2329e-02 6.7 4.35e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 49394
VecSet               121 1.0 1.1327e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               15 1.0 1.6739e-03 1.2 1.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 219629
VecWAXPY              15 1.0 2.1994e-03 1.9 7.25e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 83578
VecMAXPY              90 1.0 2.7625e-02 1.8 3.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 25  0  0  0   0 25  0  0  0 275924
VecAssemblyBegin      30 1.0 1.2747e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01  0  0  0  0 29   0  0  0  0 29     0
VecAssemblyEnd        30 1.0 5.1475e-04 26.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       90 1.0 1.4369e-02 5.4 0.00e+00 0.0 2.9e+05 3.9e+03 0.0e+00  0  0 98 99  0   0  0 98 99  0     0
VecScatterEnd         90 1.0 4.2581e-02 11.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               90 1.0 1.6290e-01 1.6 5.63e+07 1.5 2.9e+05 3.9e+03 0.0e+00  0 42 98 99  0   0 42 98 99  0 77813
MatConvert             5 1.0 4.0061e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      10 1.0 1.2128e-01 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  6   0  0  0  0  6     0
MatAssemblyEnd        10 1.0 5.6291e-02 1.0 0.00e+00 0.0 6.5e+03 9.6e+02 8.0e+00  0  0  2  1  3   0  0  2  1  3     0
MatGetRowIJ           10 1.0 9.0599e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         5 1.0 5.0242e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                5 1.0 3.0882e-03 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  2   0  0  0  0  2     0
KSPGMRESOrthog        75 1.0 1.2176e-01 1.9 4.56e+07 1.0 0.0e+00 0.0e+00 7.5e+01  0 38  0  0 24   0 38  0  0 24 95014
KSPSetUp               1 1.0 2.6391e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              15 1.0 9.3209e+01 1.0 1.26e+08 1.2 2.9e+05 3.9e+03 1.8e+02 96 100 98 99 59  96 100 98 99 59   325
PCSetUp                5 1.0 7.7425e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 80  0  0  0  1  80  0  0  0  1     0
PCApply               75 1.0 1.5272e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  16  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    36             34     12361096     0
      Vector Scatter     1              1         1060     0
              Matrix     3              3     10175808     0
       Krylov Solver     1              1        18960     0
      Preconditioner     1              1         1096     0
              Viewer     1              0            0     0
           Index Set     2              2        38884     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.7786e-05
Average time for zero size MPI_Send(): 0.000176195
#PETSc Option Table entries:
-info blah
-log_summary
-mat_view ::ascii_info
-parallel
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-x=0 -with-pic
--with-external-packages-dir=/opt/apps/intel13/mvapich2_1_9/petsc/3.5/externalpackages
--with-mpi-compilers=1 --with-mpi-dir=/opt/apps/intel13/mvapich2/1.9
--with-scalar-type=real --with-shared-libraries=1 --with-precision=double
--with-hypre=1 --download-hypre --with-ml=1 --download-ml --with-ml=1
--download-ml --with-superlu_dist=1 --download-superlu_dist
--with-superlu=1 --download-superlu --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-spai=1 --download-spai
--with-mumps=1 --download-mumps --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-scalapack=1 --download-scalapack
--with-blacs=1 --download-blacs --with-spooles=1 --download-spooles
--with-hdf5=1 --with-hdf5-dir=/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9
--with-debugging=no
--with-blas-lapack-dir=/opt/apps/intel/13/composer_xe_2013.2.146/mkl
--with-mpiexec=mpirun_rsh --COPTFLAGS= --FOPTFLAGS= --CXXOPTFLAGS=
-----------------------------------------
Libraries compiled on Thu Apr  2 10:06:57 2015 on
staff.stampede.tacc.utexas.edu
Machine characteristics:
Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /opt/apps/intel13/mvapich2_1_9/petsc/3.5
Using PETSc arch: sandybridge
-----------------------------------------

Using C compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpicc  -fPIC -wd1572
${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpif90  -fPIC
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths:
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/include
-I/opt/apps/intel13/mvapich2/1.9/include
-----------------------------------------
Using C linker: /opt/apps/intel13/mvapich2/1.9/bin/mpicc
Using Fortran linker: /opt/apps/intel13/mvapich2/1.9/bin/mpif90
Using libraries:
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib
-L/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib -lpetsc
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib
-L/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib -lsuperlu_4.3
-lHYPRE -Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lmpichcxx -lml -lmpichcxx -lspai
-lsuperlu_dist_3.3 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
-lpord -lscalapack
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis
-lmetis -lpthread -lssl -lcrypto
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/lib
-L/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/lib -lhdf5hl_fortran
-lhdf5_fortran -lhdf5_hl -lhdf5 -lmpichf90 -lifport -lifcore -lm -lmpichcxx
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib -ldl
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib -lmpich -lopa -lmpl -libmad -lrdmacm
-libumad -libverbs -lrt -llimic2 -lpthread -Wl,-rpath,/opt/ofed/lib64
-L/opt/ofed/lib64 -Wl,-rpath,/opt/apps/limic2/0.5.5/lib
-L/opt/apps/limic2/0.5.5/lib -Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -Wl,-rpath,/opt/ofed/lib64
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib -limf -lsvml -lirng -lipgo
-ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -ldl
-----------------------------------------

Finalising parallel run


> > 1. How do I tell PETSc that my matrix is symmetric? I tried setting my
> > matrix as follows... but am apprehensive about it.
> >
> > MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
> > nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
> >
> > Could I still use MatSetValue on both the upper and lower triangular parts
> > of the matrix? Will PETSc understand that it's redundant?
>
>   Yes, run with -mat_ignore_lower_triangular or call
> MatSetOption(mat,MAT_IGNORE_LOWER_TRIANGULAR,PETSC_TRUE)
>

This is very useful, thanks.
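
If I understand this correctly, the setup keeps block size 1 and just gains the option; a minimal sketch of what I plan to do (nCellsCurProc and nTotalCells are variables from my code, and the 10/5 preallocation numbers are just my current guesses):

    Mat A;
    MatCreateSBAIJ(PETSC_COMM_WORLD, 1,            /* block size 1 */
                   nCellsCurProc, nCellsCurProc,   /* local rows and columns */
                   nTotalCells, nTotalCells,       /* global rows and columns */
                   10, NULL, 5, NULL, &A);
    /* Entries passed to MatSetValue() in the lower triangle are then
       silently dropped instead of triggering an error. */
    MatSetOption(A, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE);
    /* ... MatSetValue() on both halves as before ... */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);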

I have a question on setting block sizes. Should I create one block per
processor? If so, what do I set d_nz and o_nz to? Right now I allocate memory
for 10 non-zero elements per row that are local to the processor and 5
non-zero elements per row that are not, so my understanding was that

MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);

should become

MatCreateSBAIJ(PETSC_COMM_WORLD, nCellsCurProc, nCellsCurProc,
nCellsCurProc, nTotalCells, nTotalCells, 10*nCellsCurProc, NULL,
5*nCellsCurProc, NULL, &A);


But PETSc doesn't seem to like this: it complains that it's out of memory and
throws a whole lot of error messages. Clearly something's wrong. Could you
please tell me what it is?

> 2. Do I need PCFactorSetShiftType(pc,MAT_SHIFT_POSITIVE_DEFINITE); ?
>
>   I hope not. But you might.
>

OK. I tried with and without it, and it doesn't seem to make a difference, so
it's off for now. I'll turn it on if necessary.
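
If I do end up needing it, my understanding is that with block Jacobi the shift has to go on each local ICC factorization, either via -sub_pc_factor_shift_type positive_definite on the command line or programmatically along these lines (a sketch; it assumes the sub-KSPs already exist, i.e. it runs after KSPSetUp()):

    KSP      *subksp;
    PC       pc, subpc;
    PetscInt nlocal, first, i;

    KSPGetPC(ksp, &pc);
    KSPSetUp(ksp);                    /* block Jacobi creates its sub-KSPs here */
    PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);
    for (i = 0; i < nlocal; i++) {
      KSPGetPC(subksp[i], &subpc);
      PCFactorSetShiftType(subpc, MAT_SHIFT_POSITIVE_DEFINITE);
    }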

> 3. What does KSPSetReusePreconditioner(ksp, PETSC_TRUE) do? Should I use
> it?
>
>   Not at first. What it does is not build a new preconditioner for each
> solve. If the matrix is changing "slowly" you can often get away with
> setting this for some number of linear solves, then set it back to false
> for the next solve, then set it to true again for some number of linear
> solves. You could try it with hypre, say keeping it the same for 10, 50,
> 100 solves, and see what happens time-wise.
>

This was most useful. I did two things. First, I shifted the creation of the
KSP object to the initialization stage, so KSP objects are no longer created
and destroyed every step. Second, I set ReusePreconditioner to false when the
matrix changes and to true when it doesn't. All of this got my execution time
down from 250s to about 103s! I think that's great. Thanks again.
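
In code, the new flow is roughly the following (a condensed sketch; matrixChanged is my own flag, set wherever the coefficients are updated):

    /* Initialization, done once: */
    KSP ksp;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetFromOptions(ksp);

    /* Every outer iteration / time step: */
    KSPSetOperators(ksp, A, A);
    /* Rebuild the preconditioner only when the matrix has actually changed. */
    KSPSetReusePreconditioner(ksp, matrixChanged ? PETSC_FALSE : PETSC_TRUE);
    KSPSolve(ksp, b, x);

    /* Finalization, done once: */
    KSPDestroy(&ksp);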

ganesh