[petsc-users] Enquiry regarding log summary results
TAY wee-beng
zonexo at gmail.com
Tue Oct 2 07:35:55 CDT 2012
Hi,
I have combined the momentum linear equations involving x, y, z into 1 large
matrix. The Poisson eqn is solved using HYPRE's Struct format, so it's not
included here. I run the code for 50 timesteps (hence 50 KSPSolve calls) using
96 procs. The log_summary is given below. I have some questions:

1. After combining the matrices, I should have only 1 PETSc matrix. Why
does it say there are 4 matrices, 12 vectors, etc.?

2. I'm looking at the events which take the longest time. It seems that
MatAssemblyBegin, VecNorm, VecAssemblyBegin and VecScatterEnd have very
high max/min ratios. The ratios of some others are also not too good (~1.6 -
2). So are these events the reason why my code is not scaling well? What
can I do to improve it?
Btw, I insert the matrix entries using:
do ijk = ijk_xyz_sta+1, ijk_xyz_end

   II = ijk - 1    ! Fortran shift to 0-based

   call MatSetValues(A_semi_xyz, 1, II, 7, int_semi_xyz(ijk,1:7), &
        semi_mat_xyz(ijk,1:7), INSERT_VALUES, ierr)

end do
where ijk_xyz_sta/ijk_xyz_end are the starting/ending indices,
int_semi_xyz(ijk,1:7) stores the 7 global column indices, and
semi_mat_xyz(ijk,1:7) holds the corresponding values.
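
As a side note (not part of the code above): if A_semi_xyz is an MPIAIJ
matrix, preallocating it before this loop keeps MatSetValues cheap. A minimal
sketch, where the 7 diagonal-block / 3 off-diagonal-block nonzeros per row are
only rough assumptions for a 7-point stencil:

   ! hedged sketch: preallocate before the insertion loop (7/3 are assumptions)
   call MatMPIAIJSetPreallocation(A_semi_xyz, 7, PETSC_NULL_INTEGER, &
        3, PETSC_NULL_INTEGER, ierr)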
and I insert vectors using:

call VecSetValues(b_rhs_semi_xyz, ijk_xyz_end_mz-ijk_xyz_sta_mz, &
     (/ijk_xyz_sta_mz:ijk_xyz_end_mz-1/), &
     q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)
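
Note that the (/a:b/) range inside an array constructor is not standard
Fortran (some compilers reject it). A minimal standard-conforming sketch,
reusing the same names and introducing a local index array ix purely for
illustration:

   PetscInt, allocatable :: ix(:)
   PetscInt :: i, nloc

   ! 0-based global indices ijk_xyz_sta_mz .. ijk_xyz_end_mz-1, via an implied-do
   nloc = ijk_xyz_end_mz - ijk_xyz_sta_mz
   allocate(ix(nloc))
   ix = (/ (i, i = ijk_xyz_sta_mz, ijk_xyz_end_mz-1) /)

   call VecSetValues(b_rhs_semi_xyz, nloc, ix, &
        q_semi_vect_xyz(ijk_xyz_sta_mz+1:ijk_xyz_end_mz), INSERT_VALUES, ierr)
   call VecAssemblyBegin(b_rhs_semi_xyz, ierr)
   call VecAssemblyEnd(b_rhs_semi_xyz, ierr)
   deallocate(ix)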
Thanks!
Yours sincerely,
TAY wee-beng
On 30/9/2012 11:30 PM, Jed Brown wrote:
>
> You can measure the time spent in Hypre via PCApply and PCSetUp, but
> you can't get finer grained integrated profiling because it was not
> set up that way.
>
> On Sep 30, 2012 3:26 PM, "TAY wee-beng" <zonexo at gmail.com> wrote:
>
> On 27/9/2012 1:44 PM, Matthew Knepley wrote:
>> On Thu, Sep 27, 2012 at 3:49 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing a log summary for my 3D CFD code. I have some
>> questions:
>>
>> 1. If I'm solving 3 linear equations using KSP, is the result
>> given in the log summary the total of the 3 linear eqns'
>> performance? How can I get the performance for each
>> individual eqn?
>>
>>
>> Use logging stages:
>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogStagePush.html
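>> For instance, a minimal Fortran sketch (the stage name and the
>> solver/vector names are placeholders, not taken from your code):
>>
>>   PetscLogStage stage_mom
>>
>>   ! register the stage once, then wrap each solve you want profiled separately
>>   call PetscLogStageRegister("momentum", stage_mom, ierr)
>>
>>   call PetscLogStagePush(stage_mom, ierr)
>>   call KSPSolve(ksp_mom, b_mom, x_mom, ierr)   ! placeholder names
>>   call PetscLogStagePop(ierr)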
>>
>> 2. If I run my code for 10 time steps, does the log summary
>> give the total or the average performance/ratio?
>>
>>
>> Total.
>>
>> 3. Besides PETSc, I'm also using HYPRE's native geometric MG
>> (Struct) to solve the Poisson eqn on my Cartesian CFD grid. Is
>> there any way I can use PETSc's log summary to get HYPRE's
>> performance? If I use BoomerAMG through PETSc, can I get its
>> performance?
>>
>>
>> If you mean flops, only if you count them yourself and tell PETSc
>> using
>> http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/Profiling/PetscLogFlops.html
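>> A minimal Fortran sketch of counting flops yourself inside a user-defined
>> event (the event name and the flop estimate are illustrative only):
>>
>>   PetscClassId classid
>>   PetscLogEvent HYPRE_SOLVE
>>   PetscLogDouble hypre_flops
>>
>>   call PetscClassIdRegister("Hypre Struct", classid, ierr)
>>   call PetscLogEventRegister("HypreStructSolve", classid, HYPRE_SOLVE, ierr)
>>
>>   call PetscLogEventBegin(HYPRE_SOLVE, ierr)
>>   ! ... call the HYPRE Struct solver here ...
>>   hypre_flops = 1.0d0 * my_flop_estimate   ! placeholder: your own count of the work done
>>   call PetscLogFlops(hypre_flops, ierr)
>>   call PetscLogEventEnd(HYPRE_SOLVE, ierr)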
>>
>> This is the disadvantage of using packages that do not properly
>> monitor things :)
>>
>> Matt
> So you mean that if I use BoomerAMG through PETSc, there is no proper way
> of evaluating its performance, besides using PetscLogFlops?
>>
>>
>> --
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to
>> which their experiments lead.
>> -- Norbert Wiener
>
-------------- next part --------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./a.out on a petsc-3.3-dev_shared_rel named n12-10 with 96 processors, by wtay Tue Oct 2 14:26:10 2012
Using Petsc Development HG revision: 9883b54053eca13dd473a4711adfd309d1436b6e HG Date: Sun Sep 30 22:42:36 2012 -0500
Max Max/Min Avg Total
Time (sec): 2.035e+03 1.00999 2.026e+03
Objects: 2.700e+01 1.00000 2.700e+01
Flops: 1.096e+10 1.15860 1.093e+10 1.049e+12
Flops/sec: 5.437e+06 1.16981 5.397e+06 5.181e+08
MPI Messages: 2.000e+02 2.00000 1.979e+02 1.900e+04
MPI Message Lengths: 2.598e+08 2.00000 1.299e+06 2.468e+10
MPI Reductions: 7.590e+02 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.0257e+03 100.0% 1.0495e+12 100.0% 1.900e+04 100.0% 1.299e+06 100.0% 7.580e+02 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 98 1.0 2.1910e+01 1.8 2.51e+09 1.2 1.9e+04 1.3e+06 0.0e+00 1 23 98 99 0 1 23 98 99 0 10990
MatSolve 147 1.0 2.0458e+01 1.6 3.68e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 17205
MatLUFactorNum 49 1.0 2.6876e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 19 0 0 0 1 19 0 0 0 7245
MatILUFactorSym 1 1.0 5.6867e-01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 49 1.0 5.1938e+0144.9 0.00e+00 0.0 0.0e+00 0.0e+00 9.8e+01 1 0 0 0 13 1 0 0 0 13 0
MatAssemblyEnd 49 1.0 1.3391e+01 2.1 0.00e+00 0.0 3.8e+02 3.3e+05 8.0e+00 0 0 2 1 1 0 0 2 1 1 0
MatGetRowIJ 1 1.0 1.2875e-05 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 8.2746e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecDot 98 1.0 7.3859e+00 6.4 3.91e+08 1.1 0.0e+00 0.0e+00 9.8e+01 0 4 0 0 13 0 4 0 0 13 5067
VecDotNorm2 49 1.0 7.2898e+00 4.8 7.81e+08 1.1 0.0e+00 0.0e+00 1.5e+02 0 7 0 0 19 0 7 0 0 19 10269
VecNorm 98 1.0 1.2824e+0115.4 3.91e+08 1.1 0.0e+00 0.0e+00 9.8e+01 0 4 0 0 13 0 4 0 0 13 2918
VecCopy 98 1.0 1.8141e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 295 1.0 3.0051e+00 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPBYCZ 98 1.0 4.0675e+00 1.7 7.81e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 7 0 0 0 0 7 0 0 0 18403
VecWAXPY 98 1.0 3.3849e+00 1.5 3.91e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 11057
VecAssemblyBegin 98 1.0 2.2710e+01167.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.9e+02 1 0 0 0 39 1 0 0 0 39 0
VecAssemblyEnd 98 1.0 5.5361e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 98 1.0 4.5025e-01 3.0 0.00e+00 0.0 1.9e+04 1.3e+06 0.0e+00 0 0 98 99 0 0 0 98 99 0 0
VecScatterEnd 98 1.0 8.9157e+0019.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 98 1.0 2.0951e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 49 1.0 8.0828e+01 1.0 1.10e+10 1.2 1.9e+04 1.3e+06 3.5e+02 4100 98 99 46 4100 98 99 46 12984
PCSetUp 98 1.0 2.7532e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 5.0e+00 1 19 0 0 1 1 19 0 0 1 7072
PCSetUpOnBlocks 49 1.0 2.7531e+01 1.4 2.03e+09 1.2 0.0e+00 0.0e+00 3.0e+00 1 19 0 0 0 1 19 0 0 0 7072
PCApply 147 1.0 2.1493e+01 1.6 3.68e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 34 0 0 0 1 34 0 0 0 16376
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 593048052 0
Vector 12 12 128863184 0
Vector Scatter 1 1 1060 0
Krylov Solver 2 2 2168 0
Index Set 5 5 8633616 0
Preconditioner 2 2 1800 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.000372219
Average time for zero size MPI_Send(): 2.28832e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Mon Oct 1 11:36:09 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/ --with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.3-dev_shared_rel --known-mpi-shared=1 --with-shared-libraries