[petsc-users] Need help improving working PETSC code
Ganesh Vijayakumar
ganesh.iitm at gmail.com
Fri Jul 10 12:34:53 CDT 2015
Hello,
On Thu, Jul 9, 2015 at 7:32 PM Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Ok, it is block Jacobi with ICC on each block (one per process) so
> -ksp_type cg -pc_type bjacobi -sub_pc_type icc with PETSc should give
> similar results to what they get.
>
> >
>    Where is all the data? It should list all the events and the time it spends
> in each. Did you use PetscOptionsSetValue() to provide -log_summary? That
> won't work; you need to pass it on the command line, in the PETSC_OPTIONS
> environment variable, or in a file called petscrc.
>
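I am now passing the options outside the code rather than through
PetscOptionsSetValue(); the run looks roughly like this (launcher, executable
name, and process count are just placeholders for my actual setup):

  mpirun -np 256 ./mysolver -parallel -info blah -mat_view ::ascii_info -log_summary

and here is the resulting -log_summary output: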
Using Petsc Release Version 3.5.3, Jan, 31, 2015
                         Max       Max/Min        Avg      Total
Time (sec):           9.756e+01      1.00369   9.726e+01
Objects:              4.500e+01      1.00000   4.500e+01
Flops:                1.256e+08      1.17291   1.184e+08  3.031e+10
Flops/sec:            1.292e+06      1.17364   1.217e+06  3.116e+08
MPI Messages:         3.956e+03     21.50000   1.167e+03  2.986e+05
MPI Message Lengths:  6.769e+06      4.61934   3.787e+03  1.131e+09
MPI Reductions:       3.120e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----   --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.7259e+01 100.0%  3.0311e+10 100.0%  2.986e+05 100.0%  3.787e+03      100.0%  3.110e+02  99.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot               75 1.0 1.1153e-01  2.0 2.28e+07 1.0 0.0e+00 0.0e+00 7.5e+01  0 19  0  0 24   0 19  0  0 24  51865
VecNorm              105 1.0 2.6864e-01  1.1 1.02e+07 1.0 0.0e+00 0.0e+00 1.0e+02  0  8  0  0 34   0  8  0  0 34   9580
VecScale              90 1.0 2.2329e-02  6.7 4.35e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  49394
VecSet               121 1.0 1.1327e-02  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecAXPY               15 1.0 1.6739e-03  1.2 1.45e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 219629
VecWAXPY              15 1.0 2.1994e-03  1.9 7.25e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  83578
VecMAXPY              90 1.0 2.7625e-02  1.8 3.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0 25  0  0  0   0 25  0  0  0 275924
VecAssemblyBegin      30 1.0 1.2747e-02  1.7 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+01  0  0  0  0 29   0  0  0  0 29      0
VecAssemblyEnd        30 1.0 5.1475e-04 26.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecScatterBegin       90 1.0 1.4369e-02  5.4 0.00e+00 0.0 2.9e+05 3.9e+03 0.0e+00  0  0 98 99  0   0  0 98 99  0      0
VecScatterEnd         90 1.0 4.2581e-02 11.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatMult               90 1.0 1.6290e-01  1.6 5.63e+07 1.5 2.9e+05 3.9e+03 0.0e+00  0 42 98 99  0   0 42 98 99  0  77813
MatConvert             5 1.0 4.0061e-02  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatAssemblyBegin      10 1.0 1.2128e-01  3.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  6   0  0  0  0  6      0
MatAssemblyEnd        10 1.0 5.6291e-02  1.0 0.00e+00 0.0 6.5e+03 9.6e+02 8.0e+00  0  0  2  1  3   0  0  2  1  3      0
MatGetRowIJ           10 1.0 9.0599e-06  0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatZeroEntries         5 1.0 5.0242e-03  1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatView                5 1.0 3.0882e-03  3.5 0.00e+00 0.0 0.0e+00 0.0e+00 5.0e+00  0  0  0  0  2   0  0  0  0  2      0
KSPGMRESOrthog        75 1.0 1.2176e-01  1.9 4.56e+07 1.0 0.0e+00 0.0e+00 7.5e+01  0 38  0  0 24   0 38  0  0 24  95014
KSPSetUp               1 1.0 2.6391e-03  2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
KSPSolve              15 1.0 9.3209e+01  1.0 1.26e+08 1.2 2.9e+05 3.9e+03 1.8e+02 96 100 98 99 59  96 100 98 99 59    325
PCSetUp                5 1.0 7.7425e+01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 80  0  0  0  1  80  0  0  0  1      0
PCApply               75 1.0 1.5272e+01  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 16  0  0  0  0  16  0  0  0  0      0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:

Object Type          Creations   Destructions       Memory   Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector        36             34     12361096          0
      Vector Scatter         1              1         1060          0
              Matrix         3              3     10175808          0
       Krylov Solver         1              1        18960          0
      Preconditioner         1              1         1096          0
              Viewer         1              0            0          0
           Index Set         2              2        38884          0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 1.7786e-05
Average time for zero size MPI_Send(): 0.000176195
#PETSc Option Table entries:
-info blah
-log_summary
-mat_view ::ascii_info
-parallel
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-x=0 -with-pic
--with-external-packages-dir=/opt/apps/intel13/mvapich2_1_9/petsc/3.5/externalpackages
--with-mpi-compilers=1 --with-mpi-dir=/opt/apps/intel13/mvapich2/1.9
--with-scalar-type=real --with-shared-libraries=1 --with-precision=double
--with-hypre=1 --download-hypre --with-ml=1 --download-ml --with-ml=1
--download-ml --with-superlu_dist=1 --download-superlu_dist
--with-superlu=1 --download-superlu --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-spai=1 --download-spai
--with-mumps=1 --download-mumps --with-parmetis=1 --download-parmetis
--with-metis=1 --download-metis --with-scalapack=1 --download-scalapack
--with-blacs=1 --download-blacs --with-spooles=1 --download-spooles
--with-hdf5=1 --with-hdf5-dir=/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9
--with-debugging=no
--with-blas-lapack-dir=/opt/apps/intel/13/composer_xe_2013.2.146/mkl
--with-mpiexec=mpirun_rsh --COPTFLAGS= --FOPTFLAGS= --CXXOPTFLAGS=
-----------------------------------------
Libraries compiled on Thu Apr 2 10:06:57 2015 on
staff.stampede.tacc.utexas.edu
Machine characteristics:
Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.6-Final
Using PETSc directory: /opt/apps/intel13/mvapich2_1_9/petsc/3.5
Using PETSc arch: sandybridge
-----------------------------------------
Using C compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpicc -fPIC -wd1572
${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/apps/intel13/mvapich2/1.9/bin/mpif90 -fPIC
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/include
-I/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/include
-I/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/include
-I/opt/apps/intel13/mvapich2/1.9/include
-----------------------------------------
Using C linker: /opt/apps/intel13/mvapich2/1.9/bin/mpicc
Using Fortran linker: /opt/apps/intel13/mvapich2/1.9/bin/mpif90
Using libraries:
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib
-L/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib -lpetsc
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib
-L/opt/apps/intel13/mvapich2_1_9/petsc/3.5/sandybridge/lib -lsuperlu_4.3
-lHYPRE -Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lmpichcxx -lml -lmpichcxx -lspai
-lsuperlu_dist_3.3 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
-lpord -lscalapack
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/mkl/lib/intel64
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lparmetis
-lmetis -lpthread -lssl -lcrypto
-Wl,-rpath,/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/lib
-L/opt/apps/intel13/mvapich2_1_9/phdf5/1.8.9/lib -lhdf5hl_fortran
-lhdf5_fortran -lhdf5_hl -lhdf5 -lmpichf90 -lifport -lifcore -lm -lmpichcxx
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib -ldl
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib -lmpich -lopa -lmpl -libmad -lrdmacm
-libumad -libverbs -lrt -llimic2 -lpthread -Wl,-rpath,/opt/ofed/lib64
-L/opt/ofed/lib64 -Wl,-rpath,/opt/apps/limic2/0.5.5/lib
-L/opt/apps/limic2/0.5.5/lib -Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -Wl,-rpath,/opt/ofed/lib64
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib -limf -lsvml -lirng -lipgo
-ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lirc_s
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/ofed/lib64 -L/opt/ofed/lib64
-Wl,-rpath,/opt/apps/limic2/0.5.5/lib -L/opt/apps/limic2/0.5.5/lib
-Wl,-rpath,/opt/apps/intel13/mvapich2/1.9/lib
-L/opt/apps/intel13/mvapich2/1.9/lib
-Wl,-rpath,/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-L/opt/apps/intel/13/composer_xe_2013.2.146/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -ldl
-----------------------------------------
Finalising parallel run
> > 1. How do I tell PETSc that my matrix is symmetric? I tried setting up my
> > matrix as follows... but am apprehensive of it.
> >
> > MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
> > nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
> >
> > Could I still use MatSetValue on both the upper and lower triangular parts
> > of the matrix? Will PETSc understand that it's redundant?
>
>    Yes, run with -mat_ignore_lower_triangular or call
> MatSetOption(mat, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE)
>
This is very useful, thanks.
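For my own notes, this is roughly what the matrix setup looks like now (just
the relevant lines, error checking omitted; nCellsCurProc and nTotalCells come
from my mesh decomposition as before):

  Mat A;
  MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
                 nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
  MatSetOption(A, MAT_IGNORE_LOWER_TRIANGULAR, PETSC_TRUE);
  /* With this option set, insertions into the lower triangle are ignored, so
     calling MatSetValue on both halves of the symmetric matrix is harmless. */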
I have a question on setting block sizes. Should I create one block per
processor? If so, what do I set d_nz and o_nz to? Right now I allocate
memory for 10 non-zero elements per row that are local to the processor and
5 non-zero elements per row that are non-local. So my understanding was that
MatCreateSBAIJ(PETSC_COMM_WORLD, 1, nCellsCurProc, nCellsCurProc,
nTotalCells, nTotalCells, 10, NULL, 5, NULL, &A);
should become
MatCreateSBAIJ(PETSC_COMM_WORLD, nCellsCurProc, nCellsCurProc,
nCellsCurProc, nTotalCells, nTotalCells, 10*nCellsCurProc, NULL,
5*nCellsCurProc, NULL, &A);
But PETSc doesn't seem to like this: it complains that it is out of memory
and throws a whole lot of error messages. Clearly something's wrong. Could
you please tell me what it is?
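To make the question concrete, here is the bs = 1 call again with my reading
of each argument spelled out (this reading may well be where I am going wrong):

  Mat A;
  MatCreateSBAIJ(PETSC_COMM_WORLD,
                 1,              /* bs: block size */
                 nCellsCurProc,  /* m: number of local rows on this processor */
                 nCellsCurProc,  /* n: number of local columns on this processor */
                 nTotalCells,    /* M: global number of rows */
                 nTotalCells,    /* N: global number of columns */
                 10, NULL,       /* d_nz, d_nnz: preallocation for the local (diagonal) portion */
                 5,  NULL,       /* o_nz, o_nnz: preallocation for the non-local (off-diagonal) portion */
                 &A);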
> > 2. Do I need PCFactorSetShiftType(pc, MAT_SHIFT_POSITIVE_DEFINITE)?
>
> I hope not. But you might.
>
Ok. I tried with and without it... it doesn't seem to make a difference, so
it is off for now. I will turn it on if necessary.
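If it does come to that, it could presumably also be switched on from the
command line for the block Jacobi sub-solvers, something like the following
(assuming I have the sub-solver option prefix right):

  -sub_pc_factor_shift_type positive_definite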
> > 3. What does KSPSetReusePreconditioner(ksp, PETSC_TRUE) do? Should I use
> > it?
>
>    Not at first. What it does is not build a new preconditioner for each
> solve. If the matrix is changing "slowly" you can often get away with
> setting this for some number of linear solves, then set it back to false
> for the next solve, then set it to true again for some number of linear
> solves. You could try it with hypre, say keeping it the same for 10, 50, or
> 100 solves, and see what happens time-wise.
>
This was most useful. I did two things. First, I moved the creation of the
KSP object to the initialization stage, so there is no more repeated
creation and destruction of KSP objects. Second, I set ReusePreconditioner
to true when the matrix changes and to false when it doesn't. All of this
got my execution time down from 250s to about 103s! I think that's great.
Thanks again.
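In case it helps anyone else reading this, the overall pattern I ended up
with is roughly the following (a sketch only; reusePC is a placeholder for
whatever logic decides whether the preconditioner should be rebuilt for a
given solve, and A, b, x are the usual operator and vectors):

  /* Initialization: create the solver once and keep it for the whole run. */
  KSP ksp;
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetFromOptions(ksp);                    /* -ksp_type, -pc_type, etc. still apply */

  /* Every solve: */
  KSPSetOperators(ksp, A, A);
  KSPSetReusePreconditioner(ksp, reusePC);   /* PETSC_TRUE keeps the existing PC,
                                                PETSC_FALSE lets it be rebuilt */
  KSPSolve(ksp, b, x);

  /* Finalization: */
  KSPDestroy(&ksp);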
ganesh