[petsc-users] Slow speed when using PETSc multigrid
TAY wee-beng
zonexo at gmail.com
Thu Jun 7 16:56:59 CDT 2012
On 7/6/2012 1:20 PM, Matthew Knepley wrote:
> On Thu, Jun 7, 2012 at 3:35 PM, TAY wee-beng <zonexo at gmail.com
> <mailto:zonexo at gmail.com>> wrote:
>
>
> On 7/6/2012 4:33 AM, Jed Brown wrote:
>> On Wed, Jun 6, 2012 at 4:21 PM, TAY wee-beng <zonexo at gmail.com
>> <mailto:zonexo at gmail.com>> wrote:
>>
>> *call PCMGSetLevels(pc,mg_lvl,MPI_COMM_WORLD,ierr)*
>>
>>
>> The third arguments is an array of length mg_lvl, not a single
>> communicator. You can pass PETSC_NULL_OBJECT just like the man
>> page says to use the default.
>>
>>
> I changed it, but the same segmentation fault still occurs:
>
>
> Look, programming is a skill. It demands you learn to use certain
> tools. A message like "Segmentation still occurs"
> is USELESS since we are not looking at your code or running it.
> Sending in a stack trace from gdb is much more
> informative and means you will get help sooner.
>
> Matt
I have tried to troubleshoot and found the problem. Now, after adding
*PCMGSetLevels* with mg_lvl = 1 and running with *-log_summary -mg_ksp_view*
(together with call KSPSetOptionsPrefix(ksp,"mg_",ierr)), I got the output
below. I looked at the manual, but I'm not sure how to get better
performance. Also, what are the more common options to start with? Is there
an appropriate C example? Some of the options are:

*PCMGSetLevels* - how many levels are appropriate?
PCMGSetCycleType - which cycle type (V or W)?
PCMGSetNumberSmoothUp/Down - how many smoothing steps? etc.
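For reference, here is a minimal Fortran sketch of how those knobs can be set
programmatically. It is only a sketch against the petsc-3.2/dev Fortran
bindings: it assumes ksp already exists and the usual finclude headers (e.g.
finclude/petscksp.h and finclude/petscpc.h) are included, and the level and
smoothing counts below are placeholders, not recommendations.

      PC             pc
      PetscInt       mg_lvl, nsmooth
      PetscErrorCode ierr

      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCMG,ierr)
!     With mg_lvl = 1, PCMG reduces to a single-level solve; true multigrid
!     needs mg_lvl > 1 plus interpolation operators for the levels
!     (e.g. PCMGSetInterpolation, or built from a DM such as a DMDA).
      mg_lvl = 2
      call PCMGSetLevels(pc,mg_lvl,PETSC_NULL_OBJECT,ierr)
!     V-cycle is the default; PC_MG_CYCLE_W selects a W-cycle instead.
!     (If PC_MG_CYCLE_V is not defined by the Fortran includes in this
!     version, the cycle type can be set from the options database instead.)
      call PCMGSetCycleType(pc,PC_MG_CYCLE_V,ierr)
!     Number of pre-/post-smoothing steps (placeholder values).
      nsmooth = 2
      call PCMGSetNumberSmoothDown(pc,nsmooth,ierr)
      call PCMGSetNumberSmoothUp(pc,nsmooth,ierr)
      call KSPSetFromOptions(ksp,ierr)

With the "mg_" prefix used above, the same settings can also be changed at
run time with options such as -mg_pc_mg_levels, -mg_pc_mg_cycle_type,
-mg_pc_mg_smoothup and -mg_pc_mg_smoothdown, which makes it easier to
experiment without recompiling.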
************************************************************************************************************************
***    WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./a.out on a petsc-3.2 named n12-58 with 4 processors, by wtay Thu Jun  7 23:00:37 2012
Using Petsc Development HG revision: c76fb3cac2a4ad0dfc9436df80f678898c867e86  HG Date: Thu May 31 00:33:26 2012 -0500
Max Max/Min Avg Total
Time (sec): 8.522e+01 1.00001 8.522e+01
Objects: 2.700e+01 1.00000 2.700e+01
Flops: 4.756e+08 1.00811 4.744e+08 1.897e+09
Flops/sec: 5.580e+06 1.00812 5.566e+06 2.227e+07
Memory: 2.075e+07 1.00333 8.291e+07
MPI Messages: 4.080e+02 2.00000 3.060e+02 1.224e+03
MPI Message Lengths: 2.328e+06 2.00000 5.706e+03 6.984e+06
MPI Reductions: 3.057e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.5219e+01 100.0%  1.8975e+09 100.0%  1.224e+03 100.0%  5.706e+03      100.0%  3.056e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run ./configure #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult              202 1.0 3.0738e+00 1.2 1.38e+08 1.0 1.2e+03 5.7e+03 0.0e+00  3 29 99100  0   3 29 99100  0   179
MatSolve             252 1.0 1.7658e+00 1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 36  0  0  0   2 36  0  0  0   386
MatLUFactorNum        50 1.0 2.3908e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 15  0  0  0   3 15  0  0  0   122
MatILUFactorSym        1 1.0 2.5288e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      50 1.0 1.6280e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        50 1.0 4.1831e-01 1.0 0.00e+00 0.0 1.2e+01 1.4e+03 2.2e+02  0  0  1  0  7   0  0  1  0  7     0
MatGetRowIJ            1 1.0 4.0531e-06 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.6429e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp             100 1.0 4.1475e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              50 1.0 1.8577e+01 1.0 4.76e+08 1.0 1.2e+03 5.7e+03 1.9e+03 22100 99100 63  22100 99100 63   102
VecDot               202 1.0 1.0362e+00 1.4 1.63e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  3  0  0  7   1  3  0  0  7    63
VecDotNorm2          101 1.0 1.7485e+00 2.6 1.63e+07 1.0 0.0e+00 0.0e+00 1.0e+02  1  3  0  0  3   1  3  0  0  3    37
VecNorm              151 1.0 1.6854e+00 1.1 1.22e+07 1.0 0.0e+00 0.0e+00 1.5e+02  2  3  0  0  5   2  3  0  0  5    29
VecCopy              100 1.0 7.1418e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               403 1.0 1.7004e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPBYCZ           202 1.0 3.0207e-01 1.5 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   431
VecWAXPY             202 1.0 3.2482e-01 1.4 1.63e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   201
VecAssemblyBegin     100 1.0 3.3056e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02  2  0  0  0 10   2  0  0  0 10     0
VecAssemblyEnd       100 1.0 9.0289e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      202 1.0 3.1948e-02 2.6 0.00e+00 0.0 1.2e+03 5.7e+03 0.0e+00  0  0 99100  0   0  0 99100  0     0
VecScatterEnd        202 1.0 9.4827e-01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCSetUp              100 1.0 2.4949e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 8.0e+00  3 15  0  0  0   3 15  0  0  0   117
PCSetUpOnBlocks       50 1.0 2.4723e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 4.0e+00  3 15  0  0  0   3 15  0  0  0   118
PCApply              252 1.0 3.7255e+00 1.4 1.71e+08 1.0 0.0e+00 0.0e+00 5.0e+02  4 36  0  0 16   4 36  0  0 16   183
------------------------------------------------------------------------------------------------------------------------
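As a quick sanity check on the 'Total Mflop/s' formula quoted above the table: for KSPSolve,
(sum of flops over all processors) / (max time over all processors) = 1.8975e+09 / 1.8577e+01 ≈ 1.02e+08 flop/s,
i.e. the 102 Mflop/s reported in the last column, so the whole solve runs at roughly 100 Mflop/s
across the 4 processes.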
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 16900896 0
Krylov Solver 2 2 2168 0
Vector 12 12 2604080 0
Vector Scatter 1 1 1060 0
Index Set 5 5 167904 0
Preconditioner 2 2 1800 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 5.57899e-06
Average time for zero size MPI_Send(): 2.37226e-05
#PETSc Option Table entries:
-log_summary
-mg_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu May 31 10:24:12 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/
--with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/
--with-debugging=1 --download-hypre=1
--prefix=/home/wtay/Lib/petsc-3.2-dev_shared_debug --known-mpi-shared=1
--with-shared-libraries
-----------------------------------------
Libraries compiled on Thu May 31 10:24:12 2012 on hpc12
Machine characteristics:
Linux-2.6.32-220.2.1.el6.x86_64-x86_64-with-centos-6.2-Final
Using PETSc directory: /home/wtay/Codes/petsc-dev
Using PETSc arch: petsc-3.2-dev_shared_debug
-----------------------------------------
Using C compiler: /opt/openmpi-1.5.3/bin/mpicc -fPIC -wd1572
-Qoption,cpp,--extended_float_type -g ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/openmpi-1.5.3/bin/mpif90 -fPIC -g
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/include
-I/opt/openmpi-1.5.3/include
-----------------------------------------
Using C linker: /opt/openmpi-1.5.3/bin/mpicc
Using Fortran linker: /opt/openmpi-1.5.3/bin/mpif90
Using libraries:
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib -lpetsc
-lX11 -lpthread
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib -lHYPRE
-lmpi_cxx -Wl,-rpath,/opt/openmpi-1.5.3/lib
-Wl,-rpath,/opt/intelcpro-11.1.059/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lstdc++
-Wl,-rpath,/opt/intelcpro-11.1.059/mkl/lib/em64t
-L/opt/intelcpro-11.1.059/mkl/lib/em64t -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl
-L/opt/openmpi-1.5.3/lib -lmpi -lnsl -lutil
-L/opt/intelcpro-11.1.059/lib/intel64 -limf
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lsvml -lipgo -ldecimal -lgcc_s
-lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lm -lm -lifport -lifcore
-lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lnsl
-lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------
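Regarding the WARNING box in the log: this build was configured with --with-debugging=1
(see the configure options above), so the timings include debug overhead. For timing runs
the banner asks for an optimized build; a sketch of such a configure line, reusing the paths
shown above (the --prefix here is only a hypothetical name for a separate optimized install):

./configure --with-mpi-dir=/opt/openmpi-1.5.3/ \
    --with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ \
    --with-debugging=no --download-hypre=1 \
    --prefix=/home/wtay/Lib/petsc-3.2-dev_shared_opt \
    --known-mpi-shared=1 --with-shared-libraries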
> call KSPCreate(MPI_COMM_WORLD,ksp,ierr)
>
> call KSPGetPC(ksp,pc,ierr)
>
> call PCSetType(pc_uv,PCMG,ierr)
>
> mg_lvl = 1 (or 2)
>
> call PCMGSetLevels(pc,mg_lvl,PETSC_NULL_OBJECT,ierr)
>
> call
> DMDACreate2d(MPI_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,size_x,size_y,1,num_procs,i1,i1,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da,ierr)
>
> ...
>
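(One detail worth double-checking in the snippet above: KSPGetPC returns the preconditioner
into pc, PCSetType is called on pc_uv, and PCMGSetLevels is then called on pc. Assuming both
names are meant to refer to the same preconditioner of this KSP, a consistent sequence would
be the sketch below; otherwise PCMGSetLevels acts on a PC whose type was never set to PCMG.)

      call KSPCreate(MPI_COMM_WORLD,ksp,ierr)
      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCMG,ierr)
      mg_lvl = 1                ! or 2, as above
      call PCMGSetLevels(pc,mg_lvl,PETSC_NULL_OBJECT,ierr)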
> Btw, I tried to look at
> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex42.c.html
> but I think there's some error in the page formatting.
>
>>
>> However, I get the error:
>>
>> Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>>
>> after calling *PCMGSetLevels*
>>
>> What's the problem? Are there any examples which I can follow?
>>
>>
>> I believe the other examples that use this routine are in C or
>> just tests (not tutorial-style) examples.
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener