[petsc-users] Slow speed when using PETSc multigrid
TAY wee-beng
zonexo at gmail.com
Wed Jun 6 15:04:07 CDT 2012
Hi,
I use 3 KSP solvers: 2 for the momentum equations and 1 for the multigrid Poisson solve. For the multigrid KSP I call

call KSPSetOptionsPrefix(ksp,"mg_",ierr)

I run with

-log_summary -mg_ksp_view

so as to single out the multigrid KSP, but I'm not sure if it's really working...
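For reference, here is a minimal sketch of how I understand the prefix to work (the handle names ksp_mg and da below are placeholders, not taken verbatim from my code): once the prefix is set, this KSP picks up only options that start with -mg_, e.g. -mg_ksp_view or -mg_pc_type mg.

call KSPCreate(PETSC_COMM_WORLD,ksp_mg,ierr)
call KSPSetOptionsPrefix(ksp_mg,"mg_",ierr)   ! this KSP now reads only -mg_* options
call KSPSetDM(ksp_mg,da,ierr)                 ! da is the DMDA, as in ex29
call KSPSetFromOptions(ksp_mg,ierr)           ! picks up e.g. -mg_ksp_view, -mg_pc_type mg, -mg_pc_mg_levels 3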
Here's the output:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.2 named n12-50 with 4 processors, by wtay Wed Jun  6 21:57:33 2012
Using Petsc Development HG revision: c76fb3cac2a4ad0dfc9436df80f678898c867e86  HG Date: Thu May 31 00:33:26 2012 -0500
                         Max       Max/Min        Avg      Total
Time (sec):           1.064e+01      1.00000   1.064e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                4.756e+08      1.00811   4.744e+08  1.897e+09
Flops/sec:            4.468e+07      1.00811   4.457e+07  1.783e+08
MPI Messages:         4.080e+02      2.00000   3.060e+02  1.224e+03
MPI Message Lengths:  2.328e+06      2.00000   5.706e+03  6.984e+06
MPI Reductions:       8.750e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0644e+01 100.0%  1.8975e+09 100.0%  1.224e+03 100.0%  5.706e+03      100.0%  8.740e+02  99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %f - percent flops in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)      Flops                              --- Global ---   --- Stage ---    Total
                   Max Ratio  Max      Ratio   Max   Ratio  Mess    Avg len Reduct  %T %f %M %L %R   %T %f %M %L %R  Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult              202 1.0 5.5096e-01 1.0  1.38e+08 1.0 1.2e+03 5.7e+03 0.0e+00  5 29 99 100  0   5 29 99 100  0    998
MatSolve             252 1.0 6.9136e-01 1.1  1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6 36  0   0  0   6 36  0   0  0    986
MatLUFactorNum        50 1.0 4.6002e-01 1.0  7.31e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 15  0   0  0   4 15  0   0  0    634
MatILUFactorSym        1 1.0 9.5899e-03 1.1  0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0   0  0   0  0  0   0  0      0
MatAssemblyBegin      50 1.0 1.6270e-03 1.1  0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0   0 11   0  0  0   0 11      0
MatAssemblyEnd        50 1.0 1.0896e-01 1.0  0.00e+00 0.0 1.2e+01 1.4e+03 8.0e+00  1  0  1   0  1   1  0  1   0  1      0
MatGetRowIJ            1 1.0 2.8610e-06 1.5  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0      0
MatGetOrdering         1 1.0 7.2002e-04 1.0  0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0   0  0   0  0  0   0  0      0
KSPSetUp             100 1.0 2.9130e-03 1.0  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0      0
KSPSolve              50 1.0 2.0737e+00 1.0  4.76e+08 1.0 1.2e+03 5.7e+03 4.6e+02 19 100 99 100 52 19 100 99 100 53    915
VecDot               202 1.0 7.3588e-02 1.1  1.63e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  3  0   0 23   1  3  0   0 23    885
VecDotNorm2          101 1.0 3.9155e-02 1.7  1.63e+07 1.0 0.0e+00 0.0e+00 1.0e+02  0  3  0   0 12   0  3  0   0 12   1664
VecNorm              151 1.0 5.8769e-02 1.7  1.22e+07 1.0 0.0e+00 0.0e+00 1.5e+02  0  3  0   0 17   0  3  0   0 17    829
VecCopy              100 1.0 2.3459e-02 1.0  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0      0
VecSet               403 1.0 5.9994e-02 1.0  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0   0  0   1  0  0   0  0      0
VecAXPBYCZ           202 1.0 6.6376e-02 1.0  3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  7  0   0  0   1  7  0   0  0   1963
VecWAXPY             202 1.0 6.9311e-02 1.0  1.63e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0   0  0   1  3  0   0  0    940
VecAssemblyBegin     100 1.0 4.0355e-02 14.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02  0  0  0   0 34   0  0  0   0 34      0
VecAssemblyEnd       100 1.0 5.0378e-04 1.1  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0      0
VecScatterBegin      202 1.0 6.2275e-03 1.5  0.00e+00 0.0 1.2e+03 5.7e+03 0.0e+00  0  0 99 100  0   0  0 99 100  0      0
VecScatterEnd        202 1.0 2.0878e-02 1.4  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0   0  0   0  0  0   0  0      0
PCSetUp              100 1.0 4.7225e-01 1.0  7.31e+07 1.0 0.0e+00 0.0e+00 5.0e+00  4 15  0   0  1   4 15  0   0  1    617
PCSetUpOnBlocks       50 1.0 4.7191e-01 1.0  7.31e+07 1.0 0.0e+00 0.0e+00 3.0e+00  4 15  0   0  0   4 15  0   0  0    618
PCApply              252 1.0 7.3425e-01 1.1  1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7 36  0   0  0   7 36  0   0  0    928
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 16900896 0
Krylov Solver 2 2 2168 0
Vector 12 12 2604080 0
Vector Scatter 1 1 1060 0
Index Set 5 5 167904 0
Preconditioner 2 2 1800 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 4.00543e-06
Average time for zero size MPI_Send(): 1.22786e-05
#PETSc Option Table entries:
-log_summary
-mg_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu May 31 09:53:43 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/
--with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/
--with-debugging=0 --download-hypre=1
--prefix=/home/wtay/Lib/petsc-3.2-dev_shared_rel --known-mpi-shared=1
--with-shared-libraries
-----------------------------------------
Libraries compiled on Thu May 31 09:53:43 2012 on hpc12
Machine characteristics:
Linux-2.6.32-220.2.1.el6.x86_64-x86_64-with-centos-6.2-Final
Using PETSc directory: /home/wtay/Codes/petsc-dev
Using PETSc arch: petsc-3.2-dev_shared_rel
-----------------------------------------
Using C compiler: /opt/openmpi-1.5.3/bin/mpicc -fPIC -wd1572
-Qoption,cpp,--extended_float_type -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/openmpi-1.5.3/bin/mpif90 -fPIC -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include
-I/opt/openmpi-1.5.3/include
-----------------------------------------
Using C linker: /opt/openmpi-1.5.3/bin/mpicc
Using Fortran linker: /opt/openmpi-1.5.3/bin/mpif90
Using libraries:
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lpetsc -lX11
-lpthread
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lHYPRE
-lmpi_cxx -Wl,-rpath,/opt/openmpi-1.5.3/lib
-Wl,-rpath,/opt/intelcpro-11.1.059/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lstdc++
-Wl,-rpath,/opt/intelcpro-11.1.059/mkl/lib/em64t
-L/opt/intelcpro-11.1.059/mkl/lib/em64t -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl
-L/opt/openmpi-1.5.3/lib -lmpi -lnsl -lutil
-L/opt/intelcpro-11.1.059/lib/intel64 -limf
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lsvml -lipgo -ldecimal -lgcc_s
-lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lm -lm -lifport -lifcore
-lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lnsl
-lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------
Yours sincerely,
TAY wee-beng
On 5/6/2012 1:34 AM, Barry Smith wrote:
> Also run with -ksp_view to see exactly what solver options it is using: for example, the number of levels, the smoother on each level, etc. My guess is that the run below is using only one level (because I don't see you supplying options to control the number of levels etc.).
>
> Barry
>
> On Jun 4, 2012, at 4:15 PM, Jed Brown wrote:
>
>> Always send -log_summary when asking about performance.
>>
>> On Mon, Jun 4, 2012 at 4:11 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>> Hi,
>>
>> I tried using PETSc multigrid in my 2D CFD code. I converted the KSP tutorial example ex29 to Fortran and added it to my code to solve the Poisson equation.
>>
>> The main subroutines are:
>>
>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>> call DMDACreate2d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,i3,i3,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da,ierr)
>> call DMSetFunction(da,ComputeRHS,ierr)
>> call DMSetJacobian(da,ComputeMatrix,ierr)
>> call KSPSetDM(ksp,da,ierr)
>>
>> call KSPSetFromOptions(ksp,ierr)
>> call KSPSetUp(ksp,ierr)
>> call KSPSolve(ksp,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)
>> call KSPGetSolution(ksp,x,ierr)
>> call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> call KSPDestroy(ksp,ierr)
>> call DMDestroy(da,ierr)
>> call PetscFinalize(ierr)
>>
>>
>> Since the LHS matrix doesn't change, I only set it up at the 1st time step; thereafter I only call ComputeRHS at every time step.
>>
>> Previously I was using HYPRE's geometric multigrid, and the speed was much faster.
>>
>> What other options can I tweak to improve the speed? Or should I call the subroutines above at every time step?
>>
>> Thanks!
>>
>>
>> --
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>>
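
P.S. In case it is useful for the discussion, this is the kind of option set I understand Barry to mean for controlling the multigrid hierarchy, written with my "mg_" prefix (the level count of 4 is only an illustration, not something I have run):

-mg_pc_type mg -mg_pc_mg_levels 4 -mg_ksp_view -log_summary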