[petsc-users] Enquiry regarding log summary results

TAY wee-beng zonexo at gmail.com
Tue Oct 2 07:35:55 CDT 2012


Hi,

I have combined the momentum linear equations involving x, y and z into one large 
matrix. The Poisson equation is solved using HYPRE's Struct format, so it's not 
included. I ran the code for 50 time steps (hence 50 KSPSolve calls) using 96 
procs. The log_summary output is given below. I have some questions:

1. After combining the matrices, I should have only one PETSc matrix. Why 
does it say there are 4 matrices, 12 vectors, etc.?

2. I'm looking at the events that take the longest time. It seems that 
MatAssemblyBegin, VecNorm, VecAssemblyBegin and VecScatterEnd have very 
high max/min time ratios. The ratios of some of the other events are also not 
too good (~1.6 - 2). Are these events the reason why my code is not scaling 
well? What can I do to improve them?

Btw, I insert the matrix values using:

do ijk = ijk_xyz_sta+1, ijk_xyz_end
    II = ijk - 1    ! Fortran shift to 0-based
    call MatSetValues(A_semi_xyz, 1, II, 7, int_semi_xyz(ijk,1:7), semi_mat_xyz(ijk,1:7), INSERT_VALUES, ierr)
end do

where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices,

int_semi_xyz(ijk,1:7) stores the 7 global column indices, and

semi_mat_xyz(ijk,1:7) has the corresponding values.
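
For completeness, the loop is followed by the standard assembly calls (shown here as a sketch, since they are not part of the snippet above); these are where the MatAssemblyBegin/MatAssemblyEnd events in the log below come from:

! Standard PETSc assembly step after all the insertions (sketch)
call MatAssemblyBegin(A_semi_xyz, MAT_FINAL_ASSEMBLY, ierr)
call MatAssemblyEnd(A_semi_xyz, MAT_FINAL_ASSEMBLY, ierr)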

and I insert the vector values using:

call VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz-ijk_xyz_sta_mz,           &
                  (/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/),                     &
                  q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),        &
                  INSERT_VALUES, ierr)
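
Written out with an explicit index array instead of the inline constructor, the same insertion looks roughly like this (only a sketch; the temporary array ix, its declarations and the assembly step are indicative, not copied from the real code):

! Indicative sketch: build the 0-based global row indices explicitly,
! then insert the local chunk of values and assemble.
PetscInt              :: i, nlocal
PetscInt, allocatable :: ix(:)
PetscErrorCode        :: ierr

nlocal = ijk_xyz_end_mz - ijk_xyz_sta_mz
allocate(ix(nlocal))
ix = (/ (i, i = ijk_xyz_sta_mz, ijk_xyz_end_mz-1) /)   ! 0-based global indices

call VecSetValues(b_rhs_semi_xyz, nlocal, ix,                          &
                  q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz),    &
                  INSERT_VALUES, ierr)

! The VecAssemblyBegin/VecAssemblyEnd events in the log come from the
! assembly step that follows the insertions:
call VecAssemblyBegin(b_rhs_semi_xyz, ierr)
call VecAssemblyEnd(b_rhs_semi_xyz, ierr)

deallocate(ix)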

Thanks!

Yours sincerely,

TAY wee-beng

On 30/9/2012 11:30 PM, Jed Brown wrote:
>
> You can measure the time spent in Hypre via PCApply and PCSetUp, but 
> you can't get finer grained integrated profiling because it was not 
> set up that way.
>
> On Sep 30, 2012 3:26 PM, "TAY wee-beng" <zonexo at gmail.com> wrote:
>
>     On 27/9/2012 1:44 PM, Matthew Knepley wrote:
>>     On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>>         Hi,
>>
>>         I'm doing a log summary for my 3d cfd code. I have some
>>         questions:
>>
>>         1. if I'm solving 3 linear equations using ksp, is the result
>>         given in the log summary the total of the 3 linear eqns'
>>         performance? How can I get the performance for each
>>         individual eqn?
>>
>>
>>     Use logging stages:
>>     http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
>>
>>         2. If I run my code for 10 time steps, does the log summary
>>         give the total or the average performance/ratio?
>>
>>
>>     Total.
>>
>>         3. Besides PETSc, I'm also using HYPRE's native geometric MG
>>         (Struct) to solve my Cartesian-grid CFD Poisson eqn. Is
>>         there any way I can use PETSc's log summary to get HYPRE's
>>         performance? If I use boomerAMG thru PETSc, can I get its
>>         performance?
>>
>>
>>     If you mean flops, only if you count them yourself and tell PETSc
>>     using
>>     http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
>>
>>     This is the disadvantage of using packages that do not properly
>>     monitor things :)
>>
>>         Matt
>     So you mean if I use BoomerAMG through PETSc, there is no proper way of
>     evaluating its performance, besides using PetscLogFlops?
>>
>>
>>         -- 
>>         Yours sincerely,
>>
>>         TAY wee-beng
>>
>>
>>
>>
>>     -- 
>>     What most experimenters take for granted before they begin their
>>     experiments is infinitely more interesting than any results to
>>     which their experiments lead.
>>     -- Norbert Wiener
>
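
Btw, if I understand the logging-stage suggestion above correctly, it would look roughly like this around each solve (only a sketch; the stage, solver and vector names below are placeholders, not the ones in my code):

! Rough sketch of separate logging stages so -log_summary reports the
! solves individually. All names below are placeholders.
PetscLogStage  :: stage_momentum, stage_poisson
PetscErrorCode :: ierr

call PetscLogStageRegister("Momentum solve", stage_momentum, ierr)
call PetscLogStageRegister("Poisson solve",  stage_poisson,  ierr)

call PetscLogStagePush(stage_momentum, ierr)
call KSPSolve(ksp_semi_xyz, b_rhs_semi_xyz, x_semi_xyz, ierr)   ! placeholder names
call PetscLogStagePop(ierr)

call PetscLogStagePush(stage_poisson, ierr)
! ... Poisson solve here; only PETSc calls (e.g. PCApply/PCSetUp if hypre is
!     used through PETSc) will show up under this stage in -log_summary ...
call PetscLogStagePop(ierr)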

-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.3-dev_shared_rel named n12-10 with 96 processors, by wtay Tue Oct  2 14:26:10 2012
Using Petsc Development HG revision: 9883b54053eca13dd473a4711adfd309d1436b6e  HG Date: Sun Sep 30 22:42:36 2012 -0500

                         Max       Max/Min        Avg      Total
Time (sec):           2.035e+03      1.00999   2.026e+03
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                1.096e+10      1.15860   1.093e+10  1.049e+12
Flops/sec:            5.437e+06      1.16981   5.397e+06  5.181e+08
MPI Messages:         2.000e+02      2.00000   1.979e+02  1.900e+04
MPI Message Lengths:  2.598e+08      2.00000   1.299e+06  2.468e+10
MPI Reductions:       7.590e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.0257e+03 100.0%  1.0495e+12 100.0%  1.900e+04 100.0%  1.299e+06      100.0%  7.580e+02  99.9%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               98 1.0 2.1910e+01 1.8 2.51e+09 1.2 1.9e+04 1.3e+06 0.0e+00  1 23 98 99  0   1 23 98 99  0 10990
MatSolve             147 1.0 2.0458e+01 1.6 3.68e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 34  0  0  0   1 34  0  0  0 17205
MatLUFactorNum        49 1.0 2.6876e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 19  0  0  0   1 19  0  0  0  7245
MatILUFactorSym        1 1.0 5.6867e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      49 1.0 5.1938e+0144.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.8e+01  1  0  0  0 13   1  0  0  0 13     0
MatAssemblyEnd        49 1.0 1.3391e+01 2.1 0.00e+00 0.0 3.8e+02 3.3e+05 8.0e+00  0  0  2  1  1   0  0  2  1  1     0
MatGetRowIJ            1 1.0 1.2875e-05 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 8.2746e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecDot                98 1.0 7.3859e+00 6.4 3.91e+08 1.1 0.0e+00 0.0e+00 9.8e+01  0  4  0  0 13   0  4  0  0 13  5067
VecDotNorm2           49 1.0 7.2898e+00 4.8 7.81e+08 1.1 0.0e+00 0.0e+00 1.5e+02  0  7  0  0 19   0  7  0  0 19 10269
VecNorm               98 1.0 1.2824e+0115.4 3.91e+08 1.1 0.0e+00 0.0e+00 9.8e+01  0  4  0  0 13   0  4  0  0 13  2918
VecCopy               98 1.0 1.8141e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               295 1.0 3.0051e+00 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPBYCZ            98 1.0 4.0675e+00 1.7 7.81e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0 18403
VecWAXPY              98 1.0 3.3849e+00 1.5 3.91e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0 11057
VecAssemblyBegin      98 1.0 2.2710e+01167.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.9e+02  1  0  0  0 39   1  0  0  0 39     0
VecAssemblyEnd        98 1.0 5.5361e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       98 1.0 4.5025e-01 3.0 0.00e+00 0.0 1.9e+04 1.3e+06 0.0e+00  0  0 98 99  0   0  0 98 99  0     0
VecScatterEnd         98 1.0 8.9157e+0019.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp              98 1.0 2.0951e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              49 1.0 8.0828e+01 1.0 1.10e+10 1.2 1.9e+04 1.3e+06 3.5e+02  4100 98 99 46   4100 98 99 46 12984
PCSetUp               98 1.0 2.7532e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 5.0e+00  1 19  0  0  1   1 19  0  0  1  7072
PCSetUpOnBlocks       49 1.0 2.7531e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 3.0e+00  1 19  0  0  0   1 19  0  0  0  7072
PCApply              147 1.0 2.1493e+01 1.6 3.68e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1 34  0  0  0   1 34  0  0  0 16376
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     4              4    593048052     0
              Vector    12             12    128863184     0
      Vector Scatter     1              1         1060     0
       Krylov Solver     2              2         2168     0
           Index Set     5              5      8633616     0
      Preconditioner     2              2         1800     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.000372219
Average time for zero size MPI_Send(): 2.28832e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Oct  1 11:36:09 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/ --with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.3-dev_shared_rel --known-mpi-shared=1 --with-shared-libraries

