understanding the output from -info

Matthew Knepley knepley at gmail.com
Fri Feb 9 19:15:49 CST 2007


1) These MFlop rates are terrible. It seems like your problem is way too
small.

2) The load balance is not good.

   Matt

On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
>
> Ya, that's the mistake. I changed part of the code resulting in
> PetscFinalize not being called.
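>
> (For reference, a minimal sketch of the bracketing that is needed - the
> calls in the middle are just placeholders, not my actual code, and the
> include path may differ by PETSc version/installation. Since -log_summary
> prints its report from inside PetscFinalize, nothing appears if that call
> is never reached.)
>
>       program main
>       implicit none
> #include "include/finclude/petsc.h"
>       PetscErrorCode ierr
>
>       call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>
> !     ... assemble and solve the linear systems here ...
>
> !     the -log_summary report is produced inside PetscFinalize()
>       call PetscFinalize(ierr)
>       end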
>
> Here's the output:
>
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> /home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 08:32:08 2007
> Using Petsc Release Version 2.3.2, Patch 8, Tue Jan  2 14:33:59 PST 2007
> HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           2.826e+02      2.08192   1.725e+02
> Objects:              1.110e+02      1.00000   1.110e+02
> Flops:                6.282e+08       1.00736   6.267e+08  2.507e+09
> Flops/sec:            4.624e+06      2.08008   4.015e+06  1.606e+07
> Memory:               1.411e+07      1.01142              5.610e+07
> MPI Messages:         8.287e+03      1.90156    6.322e+03  2.529e+04
> MPI Message Lengths:  6.707e+07      1.11755   1.005e+04  2.542e+08
> MPI Reductions:       3.112e+03      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.7247e+02 100.0%  2.5069e+09 100.0%  2.529e+04 100.0%  1.005e+04      100.0%  1.245e+04 100.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
>       ##########################################################
>       #                                                        #
>       #                          WARNING!!!                    #
>       #                                                        #
>       #   This code was compiled with a debugging option,      #
>       #   To get timing results run config/configure.py        #
>       #   using --with-debugging=no, the performance will      #
>       #   be generally two or three times faster.              #
>       #                                                        #
>       ##########################################################
>
>
>       ##########################################################
>       #                                                        #
>       #                          WARNING!!!                    #
>       #                                                        #
>       #   This code was run without the PreLoadBegin()         #
>       #   macros. To get timing results we always recommend    #
>       #   preloading. otherwise timing numbers may be          #
>       #   meaningless.                                         #
>       ##########################################################
>
>
> Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult             3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 0.0e+00 12 18 93 12  0  12 18 93 12  0    19
> MatSolve            3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 0.0e+00  1 17  0  0  0   1 17  0  0  0   168
> MatLUFactorNum        40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0    85
> MatILUFactorSym        2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatScale              20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    39
> MatAssemblyBegin      40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 8.0e+01  4  0  3 83  1   4  0  3 83  1     0
> MatAssemblyEnd        40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 6.4e+01  4  0  0  0  1   4  0  0  0  1     0
> MatGetOrdering         2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries        21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecMDot             3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 3.8e+03 24 29  0  0 30  24 29  0  0 30    15
> VecNorm             3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 4.0e+03 21  2  0  0 32  21  2  0  0 32     1
> VecScale            3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   738
> VecCopy              155 1.0 1.0029e-01 25.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   709
> VecMAXPY            3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 0.0e+00  1 31  0  0  0   1 31  0  0  0   498
> VecAssemblyBegin      80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 2.4e+02  2  0  4  5  2   2  0  4  5  2     0
> VecAssemblyEnd        80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin     3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 0.0e+00  0  0 93 12  0   0  0 93 12  0     0
> VecScatterEnd       3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  11  0  0  0  0     0
> VecNormalize        3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 3.9e+03 21  3  0  0 32  21  3  0  0 32     2
> KSPGMRESOrthog      3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 3.8e+03 25 58  0  0 30  25 58  0  0 30    30
> KSPSetup              80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 1.2e+04 62100 93 12 97  62100 93 12 97    23
> PCSetUp               80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 1.4e+01  0  2  0  0  0   0  2  0  0  0    83
> PCSetUpOnBlocks       40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00 1.0e+01  0  2  0  0  0   0  2  0  0  0    84
> PCApply             3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 4.0e+03  2 17  0  0 32   2 17  0  0 32   104
> ------------------------------------------------------------------------------------------------------------------------
>
>
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>               Matrix     8              8      21136     0
>            Index Set    12             12      74952     0
>                  Vec    81             81    1447476     0
>          Vec Scatter     2              2          0     0
>        Krylov Solver     4              4      33760     0
>       Preconditioner     4              4        392     0
> ========================================================================================================================
>
> Average time to get PetscTime(): 1.09673e-06
> Average time for MPI_Barrier(): 3.90053e-05
> Average time for zero size MPI_Send(): 1.65105e-05
> OptionTable: -log_summary
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4
> sizeof(PetscScalar) 8
> Configure run at: Thu Jan 18 12:23:31 2007
> Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared
> --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32
> --with-mpi-dir=/opt/mpich/myrinet/intel/
> -----------------------------------------
> Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg
> Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP
> Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux
> Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8
> Using PETSc arch: linux-mpif90
> -----------------------------------------
> Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
> Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g
> -w90 -w
> -----------------------------------------
> Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8
> -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include
> -I/opt/mpich/myrinet/intel/include
> ------------------------------------------
> Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
> Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g
> -w90 -w
> Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90
> -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32
> -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide
> -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib
> -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm
> -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa
> -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm  -Wl,-rpath,\
> -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
>
>
> This is the result I get for running 20 steps. There are 2 matrices to be
> solved. I've only parallelized the solving of the linear equations and kept
> the rest of the code serial for this test. However, I found that it's much
> slower than the sequential version.
>
> From the ratio column, it seems that the ratios for MatScale and VecSet are
> very high. I've done a scaling of 0.5 for the momentum eqn. Is that the
> reason for the slowness? That is all I can decipher ....
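>
> (As an aside, since everything above is lumped into the single "Main
> Stage": a rough sketch of how the momentum and poisson solves could be
> wrapped in separate log stages so that -log_summary reports them
> separately. This is only an illustration - the stage names are made up,
> and the PetscLogStageRegister argument order should be checked against
> the manual page for this PETSc version.)
>
>       integer stageMom, stagePois
>       PetscErrorCode ierr
>
> !     register two named profiling stages (check argument order for
> !     your PETSc release)
>       call PetscLogStageRegister(stageMom,'Momentum',ierr)
>       call PetscLogStageRegister(stagePois,'Poisson',ierr)
>
>       call PetscLogStagePush(stageMom,ierr)
> !     ... KSPSolve for the momentum system ...
>       call PetscLogStagePop(ierr)
>
>       call PetscLogStagePush(stagePois,ierr)
> !     ... KSPSolve for the poisson system ...
>       call PetscLogStagePop(ierr)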
>
> Thank you.
>
>
>
>
>
> On 2/10/07, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > >
> > > Oops ... it worked for ex2 and ex2f  ;-)
> > >
> > > So what could be wrong? Is there some command or subroutine which I
> > > must call? BTW, I'm programming in Fortran.
> > >
> >
> > Yes, you must call PetscFinalize() in your code.
> >
> >   Matt
> >
> >
> > > Thank you.
> > >
> > >
> > > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > >
> > > > Problems do not go away by ignoring them. Something is wrong here,
> > > > and it may
> > > > affect the rest of your program. Please try to run an example:
> > > >
> > > >   cd src/ksp/ksp/examples/tutorials
> > > >   make ex2
> > > >   ./ex2 -log_summary
> > > >
> > > >      Matt
> > > >
> > > > On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> > > > >
> > > > > Well, I don't know what's wrong. I did the same thing for -info
> > > > > and it worked. Anyway, is there any other way?
> > > > >
> > > > > For example, I can use -mat_view or call MatView(...) to view a
> > > > > matrix. Is there a similar subroutine for me to call?
> > > > >
> > > > > Thank you.
> > > > >
> > > > >
> > > > >  On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > > > >
> > > > > > Impossible, please check the spelling, and make sure your
> > > > > > command line was not truncated.
> > > > > >
> > > > > >   Matt
> > > > > >
> > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > >
> > > > > > > Ya, I did use -log_summary, but there was no output ...
> > > > > > >
> > > > > > > On 2/9/07, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > -log_summary
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I've tried to use -log_summary but nothing came out. Did I
> > > > > > > > > miss something? It worked when I used -info ...
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2/9/07, Lisandro Dalcin <dalcinl at gmail.com > wrote:
> > > > > > > > > >
> > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > > > > > > I'm trying to solve my CFD code using PETSc in
> > > > > > > > > > > parallel. Besides the linear eqns for PETSc, other
> > > > > > > > > > > parts of the code have also been parallelized using
> > > > > > > > > > > MPI.
> > > > > > > > > >
> > > > > > > > > > Finite elements or finite differences, or what?
> > > > > > > > > >
> > > > > > > > > > > However, I find that the parallel version of the code
> > > > > > > > > > > running on 4 processors is even slower than the
> > > > > > > > > > > sequential version.
> > > > > > > > > >
> > > > > > > > > > Can you monitor the convergence and iteration count of
> > > > > > > > > > the momentum and poisson steps?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > In order to find out why, I've used the -info option
> > > > > > > > > > > to print out the details. There are 2 linear equations
> > > > > > > > > > > being solved - momentum and poisson. The momentum one
> > > > > > > > > > > is twice the size of the poisson. It is shown below:
> > > > > > > > > >
> > > > > > > > > > Can you use the -log_summary command line option and
> > > > > > > > > > send the output attached?
> > > > > > > > > >
> > > > > > > > > > > I saw some statements stating "seq". Am I running in
> > > > > > > > > > > sequential or parallel mode? Have I preallocated too
> > > > > > > > > > > much space?
> > > > > > > > > >
> > > > > > > > > > It seems you are running in parallel. The "Seq" names
> > > > > > > > > > refer to local, internal objects. In PETSc, parallel
> > > > > > > > > > matrices have inner sequential matrices.
> > > > > > > > > >
> > > > > > > > > > > Lastly, if Ax=b, A_sta and A_end from
> > > > > > > > > > > MatGetOwnershipRange and b_sta and b_end from
> > > > > > > > > > > VecGetOwnershipRange should always be the same values,
> > > > > > > > > > > right?
> > > > > > > > > >
> > > > > > > > > > They should. If not, you are likely going to get a
> > > > > > > > > > runtime error.
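> > > > > > > > > >
> > > > > > > > > > (A quick way to check, as a minimal sketch - assuming A
> > > > > > > > > > and b are the assembled matrix and right-hand-side
> > > > > > > > > > vector, with made-up variable names:)
> > > > > > > > > >
> > > > > > > > > >       PetscInt Istart, Iend, b_sta, b_end
> > > > > > > > > >       PetscErrorCode ierr
> > > > > > > > > >
> > > > > > > > > >       call MatGetOwnershipRange(A,Istart,Iend,ierr)
> > > > > > > > > >       call VecGetOwnershipRange(b,b_sta,b_end,ierr)
> > > > > > > > > > !     for compatible layouts these ranges coincide on
> > > > > > > > > > !     every process
> > > > > > > > > >       if (Istart .ne. b_sta .or. Iend .ne. b_end) then
> > > > > > > > > >          print *, 'ownership ranges differ here'
> > > > > > > > > >       endif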
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Lisandro Dalcín
> > > > > > > > > > ---------------
> > > > > > > > > > Centro Internacional de Métodos Computacionales en
> > > > > > > > Ingeniería (CIMEC)
> > > > > > > > > > Instituto de Desarrollo Tecnológico para la Industria
> > > > > > > > Química (INTEC)
> > > > > > > > > > Consejo Nacional de Investigaciones Científicas y
> > > > > > > > Técnicas (CONICET)
> > > > > > > > > > PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> > > > > > > > > > Tel/Fax: +54-(0)342-451.1594
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
>
>


-- 
One trouble is that despite this system, anyone who reads journals widely
and critically is forced to realize that there are scarcely any bars to
eventual
publication. There seems to be no study too fragmented, no hypothesis too
trivial, no literature citation too biased or too egotistical, no design too
warped, no methodology too bungled, no presentation of results too
inaccurate, too obscure, and too contradictory, no analysis too
self-serving,
no argument too circular, no conclusions too trifling or too unjustified,
and
no grammar and syntax too offensive for a paper to end up in print. --
Drummond Rennie

