[petsc-users] Slow speed when using PETSc multigrid

TAY wee-beng zonexo at gmail.com
Thu Jun 7 16:56:59 CDT 2012


On 7/6/2012 1:20 PM, Matthew Knepley wrote:
> On Thu, Jun 7, 2012 at 3:35 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>
>
>     On 7/6/2012 4:33 AM, Jed Brown wrote:
>>     On Wed, Jun 6, 2012 at 4:21 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>         *call PCMGSetLevels(pc,mg_lvl,MPI_COMM_WORLD,ierr)*
>>
>>
>>     The third argument is an array of length mg_lvl, not a single
>>     communicator. You can pass PETSC_NULL_OBJECT to use the default,
>>     just as the man page says.
>>
>>
>     I changed it, but the same segmentation fault still occurs:
>
>
> Look, programming is a skill. It demands you learn to use certain 
> tools. A message like "Segmentation still occurs"
> is USELESS since we are not looking at your code or running it. 
> Sending in a stack trace from gdb is much more
> informative and means you will get help sooner.
>
>    Matt

I have tried to troubleshoot and found the problem. After adding 
*PCMGSetLevels* with mg_lvl = 1 and running with *-log_summary -mg_ksp_view* 
(together with call KSPSetOptionsPrefix(ksp,"mg_",ierr)), I got the output 
below. I looked at the manual but I'm not sure how to get better performance. 
Also, what are the more common options to start with? Is there an appropriate 
C example? Some options are listed below, followed by a rough sketch of how I 
think they fit together:

*PCMGSetLevels* - how many levels are appropriate?

*PCMGSetCycleType* - V or W cycles?

*PCMGSetNumberSmoothUp* / *PCMGSetNumberSmoothDown* - how many smoothing steps? etc.
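
Here is a rough, untested sketch of how I think these calls would fit
together in Fortran, reusing the ksp and pc variables from my code quoted
at the end of this mail (the usual PETSc Fortran include files are assumed,
mg_lvl and nsmooth are just placeholder values, and I am assuming
PCMGSetCycleType and PCMGSetNumberSmoothUp/Down have Fortran interfaces):

      PetscInt mg_lvl,nsmooth

      mg_lvl  = 2       ! number of multigrid levels - what is appropriate here?
      nsmooth = 1       ! smoothing steps before/after each coarse-grid correction

      call KSPCreate(MPI_COMM_WORLD,ksp,ierr)
      call KSPSetOptionsPrefix(ksp,"mg_",ierr)
      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCMG,ierr)
      call PCMGSetLevels(pc,mg_lvl,PETSC_NULL_OBJECT,ierr)
      call PCMGSetCycleType(pc,PC_MG_CYCLE_V,ierr)        ! or PC_MG_CYCLE_W
      call PCMGSetNumberSmoothUp(pc,nsmooth,ierr)
      call PCMGSetNumberSmoothDown(pc,nsmooth,ierr)
      call KSPSetFromOptions(ksp,ierr)                    ! pick up any -mg_... runtime options

If I understand the options prefix correctly, the same settings could
probably also be given at run time, e.g. -mg_pc_mg_levels 2
-mg_pc_mg_cycle_type v, instead of being hard-coded.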




************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.2 named n12-58 with 4 processors, by wtay Thu Jun  7 23:00:37 2012
Using Petsc Development HG revision: c76fb3cac2a4ad0dfc9436df80f678898c867e86  HG Date: Thu May 31 00:33:26 2012 -0500

                          Max       Max/Min        Avg      Total
Time (sec):           8.522e+01      1.00001   8.522e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                4.756e+08      1.00811   4.744e+08  1.897e+09
Flops/sec:            5.580e+06      1.00812   5.566e+06  2.227e+07
Memory:               2.075e+07      1.00333              8.291e+07
MPI Messages:         4.080e+02      2.00000   3.060e+02  1.224e+03
MPI Message Lengths:  2.328e+06      2.00000   5.706e+03  6.984e+06
MPI Reductions:       3.057e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.5219e+01 100.0%  1.8975e+09 100.0%  1.224e+03 100.0%  5.706e+03      100.0%  3.056e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
       ##########################################################
       #                                                        #
       #                          WARNING!!!                    #
       #                                                        #
       #   This code was compiled with a debugging option,      #
       #   To get timing results run ./configure                #
       #   using --with-debugging=no, the performance will      #
       #   be generally two or three times faster.              #
       #                                                        #
       ##########################################################


Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              202 1.0 3.0738e+00 1.2 1.38e+08 1.0 1.2e+03 5.7e+03 0.0e+00  3 29 99100  0   3 29 99100  0   179
MatSolve             252 1.0 1.7658e+00 1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 36  0  0  0   2 36  0  0  0   386
MatLUFactorNum        50 1.0 2.3908e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 15  0  0  0   3 15  0  0  0   122
MatILUFactorSym        1 1.0 2.5288e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      50 1.0 1.6280e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  3   0  0  0  0  3     0
MatAssemblyEnd        50 1.0 4.1831e-01 1.0 0.00e+00 0.0 1.2e+01 1.4e+03 2.2e+02  0  0  1  0  7   0  0  1  0  7     0
MatGetRowIJ            1 1.0 4.0531e-06 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.6429e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp             100 1.0 4.1475e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              50 1.0 1.8577e+01 1.0 4.76e+08 1.0 1.2e+03 5.7e+03 1.9e+03 22100 99100 63  22100 99100 63   102
VecDot               202 1.0 1.0362e+00 1.4 1.63e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  3  0  0  7   1  3  0  0  7    63
VecDotNorm2          101 1.0 1.7485e+00 2.6 1.63e+07 1.0 0.0e+00 0.0e+00 1.0e+02  1  3  0  0  3   1  3  0  0  3    37
VecNorm              151 1.0 1.6854e+00 1.1 1.22e+07 1.0 0.0e+00 0.0e+00 1.5e+02  2  3  0  0  5   2  3  0  0  5    29
VecCopy              100 1.0 7.1418e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               403 1.0 1.7004e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPBYCZ           202 1.0 3.0207e-01 1.5 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   431
VecWAXPY             202 1.0 3.2482e-01 1.4 1.63e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   201
VecAssemblyBegin     100 1.0 3.3056e+00 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02  2  0  0  0 10   2  0  0  0 10     0
VecAssemblyEnd       100 1.0 9.0289e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      202 1.0 3.1948e-02 2.6 0.00e+00 0.0 1.2e+03 5.7e+03 0.0e+00  0  0 99100  0   0  0 99100  0     0
VecScatterEnd        202 1.0 9.4827e-01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
PCSetUp              100 1.0 2.4949e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 8.0e+00  3 15  0  0  0   3 15  0  0  0   117
PCSetUpOnBlocks       50 1.0 2.4723e+00 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 4.0e+00  3 15  0  0  0   3 15  0  0  0   118
PCApply              252 1.0 3.7255e+00 1.4 1.71e+08 1.0 0.0e+00 0.0e+00 5.0e+02  4 36  0  0 16   4 36  0  0 16   183
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

               Matrix     4              4     16900896     0
        Krylov Solver     2              2         2168     0
               Vector    12             12      2604080     0
       Vector Scatter     1              1         1060     0
            Index Set     5              5       167904     0
       Preconditioner     2              2         1800     0
               Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 5.57899e-06
Average time for zero size MPI_Send(): 2.37226e-05
#PETSc Option Table entries:
-log_summary
-mg_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu May 31 10:24:12 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/ 
--with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ 
--with-debugging=1 --download-hypre=1 
--prefix=/home/wtay/Lib/petsc-3.2-dev_shared_debug --known-mpi-shared=1 
--with-shared-libraries
-----------------------------------------
Libraries compiled on Thu May 31 10:24:12 2012 on hpc12
Machine characteristics: 
Linux-2.6.32-220.2.1.el6.x86_64-x86_64-with-centos-6.2-Final
Using PETSc directory: /home/wtay/Codes/petsc-dev
Using PETSc arch: petsc-3.2-dev_shared_debug
-----------------------------------------

Using C compiler: /opt/openmpi-1.5.3/bin/mpicc  -fPIC -wd1572 
-Qoption,cpp,--extended_float_type -g  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/openmpi-1.5.3/bin/mpif90  -fPIC -g   
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: 
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/include 
-I/home/wtay/Codes/petsc-dev/include 
-I/home/wtay/Codes/petsc-dev/include 
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/include 
-I/opt/openmpi-1.5.3/include
-----------------------------------------

Using C linker: /opt/openmpi-1.5.3/bin/mpicc
Using Fortran linker: /opt/openmpi-1.5.3/bin/mpif90
Using libraries: 
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib 
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib -lpetsc 
-lX11 -lpthread 
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib 
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_debug/lib -lHYPRE 
-lmpi_cxx -Wl,-rpath,/opt/openmpi-1.5.3/lib 
-Wl,-rpath,/opt/intelcpro-11.1.059/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lstdc++ 
-Wl,-rpath,/opt/intelcpro-11.1.059/mkl/lib/em64t 
-L/opt/intelcpro-11.1.059/mkl/lib/em64t -lmkl_intel_lp64 
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl 
-L/opt/openmpi-1.5.3/lib -lmpi -lnsl -lutil 
-L/opt/intelcpro-11.1.059/lib/intel64 -limf 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lsvml -lipgo -ldecimal -lgcc_s 
-lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lm -lm -lifport -lifcore 
-lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lnsl 
-lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------

>     call KSPCreate(MPI_COMM_WORLD,ksp,ierr)
>
>     call KSPGetPC(ksp,pc,ierr)
>
>     call PCSetType(pc,PCMG,ierr)
>
>     mg_lvl = 1 (or 2)
>
>     call PCMGSetLevels(pc,mg_lvl,PETSC_NULL_OBJECT,ierr)
>
>     call DMDACreate2d(MPI_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,size_x,size_y,1,num_procs,i1,i1,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da,ierr)
>
>     ...
>
>     Btw, I tried to look at
>     http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex42.c.html
>     but I think there's some error in the page formatting.
>
>>
>>         However, I get the error:
>>
>>         Caught signal number 11 SEGV: Segmentation Violation,
>>         probably memory access out of range
>>
>>         after calling *PCMGSetLevels*
>>
>>         What's the problem? Are there any examples I can follow?
>>
>>
>>     I believe the other examples that use this routine are in C or
>>     just tests (not tutorial-style) examples.
>
>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener