understanding the output from -info
Ben Tay
zonexo at gmail.com
Fri Feb 9 18:51:43 CST 2007
Ya, that's the mistake. I changed part of the code, which resulted in
PetscFinalize not being called.
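For completeness, the fix amounts to making sure PetscFinalize() is reached on
every process, since that is where the -log_summary report gets printed. A
minimal sketch (in C for brevity, written against a recent PETSc header
layout; my actual code is Fortran, which uses the same calls):

/* -log_summary collects data during the run and prints its report
 * from inside PetscFinalize(), so that call has to be reached. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* ... create matrices and vectors, call KSPSolve(), etc. ... */
  ierr = PetscFinalize();   /* without this, no -log_summary output appears */
  return ierr;
}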
Here's the output:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 08:32:08 2007
Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80
                         Max       Max/Min        Avg      Total
Time (sec):           2.826e+02      2.08192   1.725e+02
Objects:              1.110e+02      1.00000   1.110e+02
Flops:                6.282e+08      1.00736   6.267e+08  2.507e+09
Flops/sec:            4.624e+06      2.08008   4.015e+06  1.606e+07
Memory:               1.411e+07      1.01142              5.610e+07
MPI Messages:         8.287e+03      1.90156   6.322e+03  2.529e+04
MPI Message Lengths:  6.707e+07      1.11755   1.005e+04  2.542e+08
MPI Reductions:       3.112e+03      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
                       Avg    %Total       Avg    %Total     counts   %Total      Avg        %Total     counts   %Total
 0: Main Stage:     1.7247e+02 100.0%  2.5069e+09 100.0%  2.529e+04  100.0%   1.005e+04      100.0%  1.245e+04  100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %F - percent flops in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run config/configure.py #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
##########################################################
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event                Count      Time (sec)      Flops/sec                          --- Global ---   --- Stage ---    Total
                   Max Ratio   Max     Ratio    Max   Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R  Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult            3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 0.0e+00 12 18 93 12 0 12 18 93 12 0 19
MatSolve           3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 168
MatLUFactorNum       40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 85
MatILUFactorSym       2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale             20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39
MatAssemblyBegin     40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 8.0e+01 4 0 3 83 1 4 0 3 83 1 0
MatAssemblyEnd       40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 6.4e+01 4 0 0 0 1 4 0 0 0 1 0
MatGetOrdering        2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries       21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot            3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 3.8e+03 24 29 0 0 30 24 29 0 0 30 15
VecNorm            3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 4.0e+03 21 2 0 0 32 21 2 0 0 32 1
VecScale           3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 738
VecCopy             155 1.0 1.0029e-01 25.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet             4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY             290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 709
VecMAXPY           3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 0.0e+00 1 31 0 0 0 1 31 0 0 0 498
VecAssemblyBegin     80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 2.4e+02 2 0 4 5 2 2 0 4 5 2 0
VecAssemblyEnd       80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin    3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 0.0e+00 0 0 93 12 0 0 0 93 12 0 0
VecScatterEnd      3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0
VecNormalize       3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 3.9e+03 21 3 0 0 32 21 3 0 0 32 2
KSPGMRESOrthog     3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 3.8e+03 25 58 0 0 30 25 58 0 0 30 30
KSPSetup             80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve             40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 1.2e+04 62 100 93 12 97 62 100 93 12 97 23
PCSetUp              80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 1.4e+01 0 2 0 0 0 0 2 0 0 0 83
PCSetUpOnBlocks      40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00 1.0e+01 0 2 0 0 0 0 2 0 0 0 84
PCApply            3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 4.0e+03 2 17 0 0 32 2 17 0 0 32 104
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type        Creations   Destructions    Memory   Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix                     8              8     21136   0
Index Set                 12             12     74952   0
Vec                       81             81   1447476   0
Vec Scatter                2              2         0   0
Krylov Solver              4              4     33760   0
Preconditioner             4              4       392   0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 3.90053e-05
Average time for zero size MPI_Send(): 1.65105e-05
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Thu Jan 18 12:23:31 2007
Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/
-----------------------------------------
Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg
Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux
Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8
Using PETSc arch: linux-mpif90
-----------------------------------------
Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
-----------------------------------------
Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include
------------------------------------------
Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
Using libraries:
-Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90
-L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes
-lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32
-L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide
-lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib
-Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm
-lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa
-lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\
-Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
This is the result I get for running 20 steps. There are 2 matrices to be
solved. I've only parallelized the solving of the linear equations and kept
the rest of the code serial for this test. However, I found that it's much
slower than the sequential version.
From the time ratios, it seems that MatScale and VecSet have very high
max/min ratios (8.7 and 6.6). I've done a scaling of 0.5 for the momentum
eqn. Is that the reason for the slowness? That is all I can decipher ....
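One way to get a clearer picture next time (a sketch only, not what my code
currently does) is to put the momentum and poisson solves into separate
logging stages so -log_summary reports them separately instead of lumping
everything into "Main Stage". Sketched in C (my code is Fortran, where the
same routines exist), using the current C signature of
PetscLogStageRegister(); some older releases took its arguments in the
opposite order, and the names and variables below are made up:

/* Give each solve its own stage in the -log_summary table. */
#include <petscksp.h>

PetscErrorCode solve_both(KSP ksp_mom, Vec b_mom, Vec x_mom,
                          KSP ksp_poi, Vec b_poi, Vec x_poi)
{
  PetscLogStage  stage_mom, stage_poi;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("Momentum solve", &stage_mom); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Poisson solve",  &stage_poi); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_mom);     CHKERRQ(ierr);
  ierr = KSPSolve(ksp_mom, b_mom, x_mom);  CHKERRQ(ierr);  /* momentum system */
  ierr = PetscLogStagePop();               CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_poi);     CHKERRQ(ierr);
  ierr = KSPSolve(ksp_poi, b_poi, x_poi);  CHKERRQ(ierr);  /* poisson system */
  ierr = PetscLogStagePop();               CHKERRQ(ierr);
  return 0;
}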
Thank you.
On 2/10/07, Matthew Knepley <knepley at gmail.com> wrote:
>
> On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> >
> > ops.... it worked for ex2 and ex2f ;-)
> >
> > so what could be wrong? is there some commands or subroutine which i
> > must call? btw, i'm programming in fortran.
> >
>
> Yes, you must call PetscFinalize() in your code.
>
> Matt
>
>
> thank you.
> >
> >
> > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > >
> > > Problems do not go away by ignoring them. Something is wrong here, and
> > > it may
> > > affect the rest of your program. Please try to run an example:
> > >
> > > cd src/ksp/ksp/examples/tutorials
> > > make ex2
> > > ./ex2 -log_summary
> > >
> > > Matt
> > >
> > > On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> > > >
> > > > Well, I don't know what's wrong. I did the same thing for -info and
> > > > it worked. Anyway, is there any other way?
> > > >
> > > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is
> > > > there a similar subroutine for me to call?
> > > >
> > > > Thank you.
> > > >
> > > >
> > > > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > > >
> > > > > Impossible, please check the spelling, and make sure your
> > > > > command line was not truncated.
> > > > >
> > > > > Matt
> > > > >
> > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > >
> > > > > > ya, i did use -log_summary. but no output.....
> > > > > >
> > > > > > On 2/9/07, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > > >
> > > > > > >
> > > > > > > -log_summary
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I've tried to use log_summary but nothing came out? Did I
> > > > > > > miss out
> > > > > > > > something? It worked when I used -info...
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2/9/07, Lisandro Dalcin <dalcinl at gmail.com > wrote:
> > > > > > > > >
> > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > > > > > i'm trying to solve my cfd code using PETSc in parallel.
> > > > > > > Besides the
> > > > > > > > > linear
> > > > > > > > > > eqns for PETSc, other parts of the code has also been
> > > > > > > parallelized using
> > > > > > > > > > MPI.
> > > > > > > > >
> > > > > > > > > Finite elements or finite differences, or what?
> > > > > > > > >
> > > > > > > > > > however i find that the parallel version of the code
> > > > > > > running on 4
> > > > > > > > > processors
> > > > > > > > > > is even slower than the sequential version.
> > > > > > > > >
> > > > > > > > > Can you monitor the convergence and iteration count of
> > > > > > > momentum and
> > > > > > > > > poisson steps?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > in order to find out why, i've used the -info option to
> > > > > > > print out the
> > > > > > > > > > details. there are 2 linear equations being solved -
> > > > > > > momentum and
> > > > > > > > > poisson.
> > > > > > > > > > the momentum one is twice the size of the poisson. it is
> > > > > > > shown below:
> > > > > > > > >
> > > > > > > > > Can you use -log_summary command line option and send the
> > > > > > > output attached?
> > > > > > > > >
> > > > > > > > > > i saw some statements stating "seq". am i running in
> > > > > > > sequential or
> > > > > > > > > parallel
> > > > > > > > > > mode? have i preallocated too much space?
> > > > > > > > >
> > > > > > > > > It seems you are running in parallel. The "Seq" are
> > > > > > > related to local,
> > > > > > > > > internal objects. In PETSc, parallel matrices have inner
> > > > > > > sequential
> > > > > > > > > matrices.
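To picture what those inner "Seq" objects are: each process's piece of a
parallel MPIAIJ matrix is stored as two sequential AIJ blocks, a "diagonal"
block (locally owned columns) and an "off-diagonal" block, and the
preallocation arguments size exactly those blocks. A rough C sketch against
the 2.3.x-era API (MatCreateMPIAIJ was renamed MatCreateAIJ in later
releases); the per-row counts here are only placeholders:

/* The -info lines mentioning "Seq" refer to these two inner per-process
 * blocks, not to serial execution; -info also reports how much of the
 * preallocated space ends up unused. */
#include <petscmat.h>

PetscErrorCode create_mpiaij(MPI_Comm comm, PetscInt n_local, PetscInt N, Mat *A)
{
  PetscErrorCode ierr;
  ierr = MatCreateMPIAIJ(comm, n_local, n_local, N, N,
                         5, PETSC_NULL,   /* diagonal (Seq) block: nonzeros per row */
                         2, PETSC_NULL,   /* off-diagonal (Seq) block: nonzeros per row */
                         A); CHKERRQ(ierr);
  return 0;
}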
> > > > > > > > >
> > > > > > > > > > lastly, if Ax=b, A_sta and A_end
> > > > > > > from MatGetOwnershipRange and b_sta
> > > > > > > > > and
> > > > > > > > > > b_end from VecGetOwnershipRange should always be the
> > > > > > > same value, right?
> > > > > > > > >
> > > > > > > > > It should. If not, you are likely going to get a runtime error.
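A quick way to check that point in code (a C sketch; the function and
variable names here are illustrative):

/* For Ax = b with matching layouts, the row ownership range of A and the
 * ownership range of b should coincide on every process. */
#include <petscmat.h>

PetscErrorCode check_ownership(Mat A, Vec b)
{
  PetscInt       A_sta, A_end, b_sta, b_end;
  PetscErrorCode ierr;

  ierr = MatGetOwnershipRange(A, &A_sta, &A_end); CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(b, &b_sta, &b_end); CHKERRQ(ierr);
  if (A_sta != b_sta || A_end != b_end) {
    ierr = PetscPrintf(PETSC_COMM_SELF, "layout mismatch: A [%d,%d) vs b [%d,%d)\n",
                       (int)A_sta, (int)A_end, (int)b_sta, (int)b_end); CHKERRQ(ierr);
  }
  return 0;
}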
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Lisandro Dalcín
> > > > > > > > > ---------------
> > > > > > > > > Centro Internacional de Métodos Computacionales en
> > > > > > > Ingeniería (CIMEC)
> > > > > > > > > Instituto de Desarrollo Tecnológico para la Industria
> > > > > > > Química (INTEC)
> > > > > > > > > Consejo Nacional de Investigaciones Científicas y Técnicas
> > > > > > > (CONICET)
> > > > > > > > > PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> > > > > > > > > Tel/Fax: +54-(0)342-451.1594
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
>
>
> --
> One trouble is that despite this system, anyone who reads journals widely
> and critically is forced to realize that there are scarcely any bars to
> eventual
> publication. There seems to be no study too fragmented, no hypothesis too
> trivial, no literature citation too biased or too egotistical, no design
> too
> warped, no methodology too bungled, no presentation of results too
> inaccurate, too obscure, and too contradictory, no analysis too
> self-serving,
> no argument too circular, no conclusions too trifling or too unjustified,
> and
> no grammar and syntax too offensive for a paper to end up in print. --
> Drummond Rennie
>