understanding the output from -info
Matthew Knepley
knepley at gmail.com
Fri Feb 9 19:15:49 CST 2007
1) These MFlop rates are terrible. It seems like your problem is way too
small.
2) The load balance is not good.
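
(For scale: the summary below reports an aggregate rate of about 1.6e+07
flop/s over 4 processes, i.e. roughly 4 Mflop/s per process, and the Max/Min
ratios of around 2 on several lines mean some processes are spending about
twice as long, or doing about twice the work, as others.)
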
Matt
On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
>
> Yes, that was the mistake. I changed part of the code, which resulted in
> PetscFinalize not being called.
>
> Here's the output:
>
>
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
>
> /home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 08:32:08 2007
> Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007
> HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80
>
> Max Max/Min Avg Total
> Time (sec): 2.826e+02 2.08192 1.725e+02
> Objects: 1.110e+02 1.00000 1.110e+02
> Flops: 6.282e+08 1.00736 6.267e+08 2.507e+09
> Flops/sec: 4.624e+06 2.08008 4.015e+06 1.606e+07
> Memory: 1.411e+07 1.01142 5.610e+07
> MPI Messages: 8.287e+03 1.90156 6.322e+03 2.529e+04
> MPI Message Lengths: 6.707e+07 1.11755 1.005e+04 2.542e+08
> MPI Reductions: 3.112e+03 1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N
> --> 2N flops
> and VecAXPY() for complex vectors of length N
> --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 1.7247e+02 100.0%  2.5069e+09 100.0%  2.529e+04 100.0%  1.005e+04      100.0%  1.245e+04 100.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flops/sec: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all
> processors
> Mess: number of messages sent
> Avg. len: average message length
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
> %T - percent time in this phase %F - percent flops in this
> phase
> %M - percent messages in this phase %L - percent message lengths
> in this phase
> %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was compiled with a debugging option, #
> # To get timing results run config/configure.py #
> # using --with-debugging=no, the performance will #
> # be generally two or three times faster. #
> # #
> ##########################################################
>
>
>
>
> ##########################################################
>
>
>
> ##########################################################
> # #
> # WARNING!!! #
> # #
> # This code was run without the PreLoadBegin() #
> # macros. To get timing results we always recommend #
> # preloading. otherwise timing numbers may be #
> # meaningless. #
> ##########################################################
>
>
> Event                Count      Time (sec)     Flops/sec                        --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> MatMult 3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03
> 0.0e+00 12 18 93 12 0 12 18 93 12 0 19
> MatSolve 3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00
> 0.0e+00 1 17 0 0 0 1 17 0 0 0 168
> MatLUFactorNum 40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00
> 0.0e+00 0 2 0 0 0 0 2 0 0 0 85
> MatILUFactorSym 2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatScale 20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 39
> MatAssemblyBegin 40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05
> 8.0e+01 4 0 3 83 1 4 0 3 83 1 0
> MatAssemblyEnd 40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02
> 6.4e+01 4 0 0 0 1 4 0 0 0 1 0
> MatGetOrdering 2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatZeroEntries 21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecMDot 3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00
> 3.8e+03 24 29 0 0 30 24 29 0 0 30 15
> VecNorm 3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00
> 4.0e+03 21 2 0 0 32 21 2 0 0 32 1
> VecScale 3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00
> 0.0e+00 0 1 0 0 0 0 1 0 0 0 738
> VecCopy 155 1.0 1.0029e-01 25.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecSet 4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecAXPY 290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 709
> VecMAXPY 3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00
> 0.0e+00 1 31 0 0 0 1 31 0 0 0 498
> VecAssemblyBegin 80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04
> 2.4e+02 2 0 4 5 2 2 0 4 5 2 0
> VecAssemblyEnd 80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> VecScatterBegin 3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03
> 0.0e+00 0 0 93 12 0 0 0 93 12 0 0
> VecScatterEnd 3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 11 0 0 0 0 11 0 0 0 0 0
> VecNormalize 3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00
> 3.9e+03 21 3 0 0 32 21 3 0 0 32 2
> KSPGMRESOrthog 3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00
> 3.8e+03 25 58 0 0 30 25 58 0 0 30 30
> KSPSetup 80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+01 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03
> 1.2e+04 62100 93 12 97 62100 93 12 97 23
> PCSetUp 80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00
> 1.4e+01 0 2 0 0 0 0 2 0 0 0 83
> PCSetUpOnBlocks 40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00
> 1.0e+01 0 2 0 0 0 0 2 0 0 0 84
> PCApply 3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00
> 4.0e+03 2 17 0 0 32 2 17 0 0 32 104
> ------------------------------------------------------------------------------------------------------------------------
>
>
>
> Memory usage is given in bytes:
>
> Object Type Creations Destructions Memory Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
> Matrix 8 8 21136 0
> Index Set 12 12 74952 0
> Vec 81 81 1447476 0
> Vec Scatter 2 2 0 0
> Krylov Solver 4 4 33760 0
> Preconditioner 4 4 392 0
> ========================================================================================================================
>
> Average time to get PetscTime(): 1.09673e-06
> Average time for MPI_Barrier(): 3.90053e-05
> Average time for zero size MPI_Send(): 1.65105e-05
> OptionTable: -log_summary
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4
> sizeof(PetscScalar) 8
> Configure run at: Thu Jan 18 12:23:31 2007
> Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared
> --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32
> --with-mpi-dir=/opt/mpich/myrinet/intel/
> -----------------------------------------
> Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg
> Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP
> Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux
> Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8
> Using PETSc arch: linux-mpif90
> -----------------------------------------
> Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
> Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g
> -w90 -w
> -----------------------------------------
> Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8
> -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90
> -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include
> -I/opt/mpich/myrinet/intel/include
> ------------------------------------------
> Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
> Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g
> -w90 -w
> Using libraries: -Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90
> -L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts
> -lpetscsnes -lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
> -Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32
> -L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide
> -lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib
> -Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm
> -lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa
> -lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\
> -Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread
> -Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
> -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
>
>
> This is the result I get for running 20 steps. There are 2 matrices to be
> solved. I've only parallelized the solving of the linear equations and kept
> the rest of the code serial for this test. However, I found that it's much
> slower than the sequential version.
>
> From the output, the ratios for MatScale and VecSet are very high. I've
> done a scaling of 0.5 for the momentum eqn. Is that the reason for the
> slowness? That is all I can decipher...
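>
> (The scaling shows up as the 20 MatScale calls in the log above. As a
> C-style sketch of what I mean, using the current MatScale(Mat,PetscScalar)
> calling sequence, so check the manual page for the exact 2.3.2 arguments;
> the name A_mom is just illustrative and my actual code is Fortran:
>
>   PetscScalar alpha = 0.5;                      /* momentum scaling factor */
>   ierr = MatScale(A_mom, alpha);CHKERRQ(ierr);  /* A_mom: assembled matrix */
>
> )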
>
> Thank you.
>
>
>
>
>
> On 2/10/07, Matthew Knepley <knepley at gmail.com> wrote:
> >
> > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > >
> > > Oops... it worked for ex2 and ex2f ;-)
> > >
> > > So what could be wrong? Is there some command or subroutine which I
> > > must call? BTW, I'm programming in Fortran.
> > >
> >
> > Yes, you must call PetscFinalize() in your code.
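> >
> > Every PETSc program has to be bracketed by PetscInitialize() and
> > PetscFinalize(); the -log_summary output is printed from inside
> > PetscFinalize(). A minimal C skeleton (the Fortran interface uses the
> > same routines, with an ierr argument added):
> >
> >   #include "petscksp.h"
> >
> >   int main(int argc, char **argv)
> >   {
> >     PetscErrorCode ierr;
> >     ierr = PetscInitialize(&argc, &argv, (char *)0, 0);CHKERRQ(ierr);
> >     /* ... build and solve the linear systems ... */
> >     ierr = PetscFinalize();CHKERRQ(ierr);  /* -log_summary is printed here */
> >     return 0;
> >   }
> >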
> >
> > Matt
> >
> >
> > thank you.
> > >
> > >
> > > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > >
> > > > Problems do not go away by ignoring them. Something is wrong here,
> > > > and it may
> > > > affect the rest of your program. Please try to run an example:
> > > >
> > > > cd src/ksp/ksp/examples/tutorials
> > > > make ex2
> > > > ./ex2 -log_summary
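> > > >
> > > > (To see the parallel behaviour, also run it on several processes,
> > > > e.g. with MPICH something like "mpirun -np 4 ./ex2 -log_summary";
> > > > the exact launcher depends on your MPI installation.)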
> > > >
> > > > Matt
> > > >
> > > > On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> > > > >
> > > > > Well, I don't know what's wrong. I did the same thing for -info
> > > > > and it worked. Anyway, is there any other way?
> > > > >
> > > > > For example, I can use -mat_view or call MatView(...) to view a
> > > > > matrix. Is there a similar subroutine for me to call?
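> > > > >
> > > > > (What I mean, as a rough C-style sketch of the MatView case, A
> > > > > being whatever matrix I have assembled:
> > > > >
> > > > >   ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
> > > > >
> > > > > and I am hoping there is an analogous routine that dumps the
> > > > > -log_summary profile.)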
> > > > >
> > > > > Thank you.
> > > > >
> > > > >
> > > > > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > > > >
> > > > > > Impossible. Please check the spelling and make sure your
> > > > > > command line was not truncated.
> > > > > >
> > > > > > Matt
> > > > > >
> > > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > >
> > > > > > > Yes, I did use -log_summary, but there was no output...
> > > > > > >
> > > > > > > On 2/9/07, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > -log_summary
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I've tried to use -log_summary but nothing came out. Did I
> > > > > > > > > miss something? It worked when I used -info...
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 2/9/07, Lisandro Dalcin <dalcinl at gmail.com > wrote:
> > > > > > > > > >
> > > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > > > > > > I'm trying to solve my CFD code using PETSc in
> > > > > > > > > > > parallel. Besides the linear eqns handled by PETSc,
> > > > > > > > > > > other parts of the code have also been parallelized
> > > > > > > > > > > using MPI.
> > > > > > > > > >
> > > > > > > > > > Finite elements or finite differences, or what?
> > > > > > > > > >
> > > > > > > > > > > However, I find that the parallel version of the code
> > > > > > > > > > > running on 4 processors is even slower than the
> > > > > > > > > > > sequential version.
> > > > > > > > > >
> > > > > > > > > > Can you monitor the convergence and iteration counts of
> > > > > > > > > > the momentum and Poisson steps?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > In order to find out why, I've used the -info option
> > > > > > > > > > > to print out the details. There are 2 linear equations
> > > > > > > > > > > being solved - momentum and Poisson. The momentum one
> > > > > > > > > > > is twice the size of the Poisson. It is shown below:
> > > > > > > > > >
> > > > > > > > > > Can you use the -log_summary command line option and
> > > > > > > > > > send the output attached?
> > > > > > > > > >
> > > > > > > > > > > I saw some statements stating "Seq". Am I running in
> > > > > > > > > > > sequential or parallel mode? Have I preallocated too
> > > > > > > > > > > much space?
> > > > > > > > > >
> > > > > > > > > > It seems you are running in parallel. The "Seq" names
> > > > > > > > > > refer to local, internal objects: in PETSc, parallel
> > > > > > > > > > matrices are built from inner sequential matrices.
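> > > > > > > > > >
> > > > > > > > > > For example, a parallel AIJ matrix is created with
> > > > > > > > > > separate preallocation for its two inner Seq blocks.
> > > > > > > > > > A C sketch (m is your local row count, the nonzero
> > > > > > > > > > counts 5 and 2 are made-up numbers, and newer PETSc
> > > > > > > > > > releases rename this routine MatCreateAIJ):
> > > > > > > > > >
> > > > > > > > > >   ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,
> > > > > > > > > >            m, m, PETSC_DETERMINE, PETSC_DETERMINE,
> > > > > > > > > >            5, PETSC_NULL,  /* "diagonal" Seq block    */
> > > > > > > > > >            2, PETSC_NULL,  /* off-process Seq block   */
> > > > > > > > > >            &A);CHKERRQ(ierr);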
> > > > > > > > > >
> > > > > > > > > > > Lastly, if Ax=b, A_sta and A_end from
> > > > > > > > > > > MatGetOwnershipRange and b_sta and b_end from
> > > > > > > > > > > VecGetOwnershipRange should always be the same values,
> > > > > > > > > > > right?
> > > > > > > > > >
> > > > > > > > > > They should. If not, you are likely going to get a
> > > > > > > > > > runtime error.
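> > > > > > > > > >
> > > > > > > > > > A quick sanity check, as a C sketch (A and b being your
> > > > > > > > > > assembled Mat and Vec):
> > > > > > > > > >
> > > > > > > > > >   PetscInt A_sta, A_end, b_sta, b_end;
> > > > > > > > > >   ierr = MatGetOwnershipRange(A, &A_sta, &A_end);CHKERRQ(ierr);
> > > > > > > > > >   ierr = VecGetOwnershipRange(b, &b_sta, &b_end);CHKERRQ(ierr);
> > > > > > > > > >   if (A_sta != b_sta || A_end != b_end) {
> > > > > > > > > >     ierr = PetscPrintf(PETSC_COMM_SELF,
> > > > > > > > > >              "ownership ranges do not match on this process\n");CHKERRQ(ierr);
> > > > > > > > > >   }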
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Lisandro Dalcín
> > > > > > > > > > ---------------
> > > > > > > > > > Centro Internacional de Métodos Computacionales en
> > > > > > > > Ingeniería (CIMEC)
> > > > > > > > > > Instituto de Desarrollo Tecnológico para la Industria
> > > > > > > > Química (INTEC)
> > > > > > > > > > Consejo Nacional de Investigaciones Científicas y
> > > > > > > > Técnicas (CONICET)
> > > > > > > > > > PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> > > > > > > > > > Tel/Fax: +54-(0)342-451.1594
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
>
>
--
One trouble is that despite this system, anyone who reads journals widely
and critically is forced to realize that there are scarcely any bars to
eventual publication. There seems to be no study too fragmented, no
hypothesis too trivial, no literature citation too biased or too
egotistical, no design too warped, no methodology too bungled, no
presentation of results too inaccurate, too obscure, and too contradictory,
no analysis too self-serving, no argument too circular, no conclusions too
trifling or too unjustified, and no grammar and syntax too offensive for a
paper to end up in print. -- Drummond Rennie