understanding the output from -info
Ben Tay
zonexo at gmail.com
Fri Feb 9 18:51:43 CST 2007
Ya, that's the mistake. I changed part of the code, which resulted in
PetscFinalize not being called.
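For completeness, the fix amounts to making sure PetscFinalize() is reached on
every process, since that is where the -log_summary report gets printed. A
minimal sketch (in C for brevity, written against a recent PETSc header
layout; my actual code is Fortran, which uses the same calls):

/* -log_summary collects data during the run and prints its report
 * from inside PetscFinalize(), so that call has to be reached. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* ... create matrices and vectors, call KSPSolve(), etc. ... */
  ierr = PetscFinalize();   /* without this, no -log_summary output appears */
  return ierr;
}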
Here's the output:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/enduser/g0306332/ns2d/a.out on a linux-mpi named atlas00.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 08:32:08 2007
Using Petsc Release Version 2.3.2, Patch 8, Tue Jan 2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80
                         Max       Max/Min        Avg      Total
Time (sec):           2.826e+02      2.08192   1.725e+02
Objects:              1.110e+02      1.00000   1.110e+02
Flops:                6.282e+08      1.00736   6.267e+08  2.507e+09
Flops/sec:            4.624e+06      2.08008   4.015e+06  1.606e+07
Memory:               1.411e+07      1.01142              5.610e+07
MPI Messages:         8.287e+03      1.90156   6.322e+03  2.529e+04
MPI Message Lengths:  6.707e+07      1.11755   1.005e+04  2.542e+08
MPI Reductions:       3.112e+03      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:  ----- Time ------   ----- Flops -----   --- Messages ---   -- Message Lengths --   -- Reductions --
                       Avg    %Total       Avg    %Total     counts   %Total      Avg        %Total     counts   %Total
 0: Main Stage:     1.7247e+02 100.0%  2.5069e+09 100.0%  2.529e+04  100.0%   1.005e+04      100.0%  1.245e+04  100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase          %F - percent flops in this phase
      %M - percent messages in this phase      %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run config/configure.py #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
##########################################################
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event                Count      Time (sec)      Flops/sec                          --- Global ---   --- Stage ---    Total
                   Max Ratio   Max     Ratio    Max   Ratio  Mess   Avg len Reduct  %T %F %M %L %R   %T %F %M %L %R  Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult            3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04 1.3e+03 0.0e+00 12 18 93 12 0 12 18 93 12 0 19
MatSolve           3967 1.0 2.5914e+00 1.9 7.99e+07 1.9 0.0e+00 0.0e+00 0.0e+00 1 17 0 0 0 1 17 0 0 0 168
MatLUFactorNum       40 1.0 4.4779e-01 1.5 3.14e+07 1.5 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 85
MatILUFactorSym       2 1.0 3.1099e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale             20 1.0 1.1487e-01 8.7 8.73e+07 8.9 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 39
MatAssemblyBegin     40 1.0 7.8844e+00 1.3 0.00e+00 0.0 7.6e+02 2.8e+05 8.0e+01 4 0 3 83 1 4 0 3 83 1 0
MatAssemblyEnd       40 1.0 6.9408e+00 1.2 0.00e+00 0.0 1.2e+01 9.6e+02 6.4e+01 4 0 0 0 1 4 0 0 0 1 0
MatGetOrdering        2 1.0 8.0509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries       21 1.0 1.4379e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMDot            3792 1.0 4.7372e+01 1.4 5.20e+06 1.4 0.0e+00 0.0e+00 3.8e+03 24 29 0 0 30 24 29 0 0 30 15
VecNorm            3967 1.0 3.9513e+01 1.2 4.11e+05 1.2 0.0e+00 0.0e+00 4.0e+03 21 2 0 0 32 21 2 0 0 32 1
VecScale           3947 1.0 3.4941e-02 1.2 2.18e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 738
VecCopy             155 1.0 1.0029e-01 25.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet             4142 1.0 3.4638e-01 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY             290 1.0 5.9618e-03 1.2 2.14e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 709
VecMAXPY           3947 1.0 1.5566e+00 1.3 1.64e+08 1.3 0.0e+00 0.0e+00 0.0e+00 1 31 0 0 0 1 31 0 0 0 498
VecAssemblyBegin     80 1.0 4.1793e+00 1.1 0.00e+00 0.0 9.6e+02 1.4e+04 2.4e+02 2 0 4 5 2 2 0 4 5 2 0
VecAssemblyEnd       80 1.0 2.0682e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin    3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04 1.3e+03 0.0e+00 0 0 93 12 0 0 0 93 12 0 0
VecScatterEnd      3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 11 0 0 0 0 0
VecNormalize       3947 1.0 3.9593e+01 1.2 6.11e+05 1.2 0.0e+00 0.0e+00 3.9e+03 21 3 0 0 32 21 3 0 0 32 2
KSPGMRESOrthog     3792 1.0 4.8670e+01 1.3 9.92e+06 1.3 0.0e+00 0.0e+00 3.8e+03 25 58 0 0 30 25 58 0 0 30 30
KSPSetup             80 1.0 2.0014e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve             40 1.0 1.0660e+02 1.0 5.90e+06 1.0 2.4e+04 1.3e+03 1.2e+04 62 100 93 12 97 62 100 93 12 97 23
PCSetUp              80 1.0 4.5669e-01 1.5 3.05e+07 1.5 0.0e+00 0.0e+00 1.4e+01 0 2 0 0 0 0 2 0 0 0 83
PCSetUpOnBlocks      40 1.0 4.5418e-01 1.5 3.07e+07 1.5 0.0e+00 0.0e+00 1.0e+01 0 2 0 0 0 0 2 0 0 0 84
PCApply            3967 1.0 4.1737e+00 2.0 5.30e+07 2.0 0.0e+00 0.0e+00 4.0e+03 2 17 0 0 32 2 17 0 0 32 104
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type        Creations   Destructions    Memory   Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix                     8              8     21136   0
Index Set                 12             12     74952   0
Vec                       81             81   1447476   0
Vec Scatter                2              2         0   0
Krylov Solver              4              4     33760   0
Preconditioner             4              4       392   0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 3.90053e-05
Average time for zero size MPI_Send(): 1.65105e-05
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Thu Jan 18 12:23:31 2007
Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/
-----------------------------------------
Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg
Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux
Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8
Using PETSc arch: linux-mpif90
-----------------------------------------
Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
-----------------------------------------
Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include
------------------------------------------
Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
Using libraries:
-Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90
-L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes
-lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32
-L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide
-lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib
-Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm
-lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa
-lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm -Wl,-rpath,\
-Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
This is the result I get for running 20 steps. There are 2 matrices to be
solved. I've only parallelized the solving of the linear equations and kept
the rest of the code serial for this test. However, I found that it's much
slower than the sequential version.
From the time ratios, it seems that MatScale and VecSet have very high
max/min ratios (8.7 and 6.6). I've done a scaling of 0.5 for the momentum
eqn. Is that the reason for the slowness? That is all I can decipher ....
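One way to get a clearer picture next time (a sketch only, not what my code
currently does) is to put the momentum and poisson solves into separate
logging stages so -log_summary reports them separately instead of lumping
everything into "Main Stage". Sketched in C (my code is Fortran, where the
same routines exist), using the current C signature of
PetscLogStageRegister(); some older releases took its arguments in the
opposite order, and the names and variables below are made up:

/* Give each solve its own stage in the -log_summary table. */
#include <petscksp.h>

PetscErrorCode solve_both(KSP ksp_mom, Vec b_mom, Vec x_mom,
                          KSP ksp_poi, Vec b_poi, Vec x_poi)
{
  PetscLogStage  stage_mom, stage_poi;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("Momentum solve", &stage_mom); CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Poisson solve",  &stage_poi); CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_mom);     CHKERRQ(ierr);
  ierr = KSPSolve(ksp_mom, b_mom, x_mom);  CHKERRQ(ierr);  /* momentum system */
  ierr = PetscLogStagePop();               CHKERRQ(ierr);

  ierr = PetscLogStagePush(stage_poi);     CHKERRQ(ierr);
  ierr = KSPSolve(ksp_poi, b_poi, x_poi);  CHKERRQ(ierr);  /* poisson system */
  ierr = PetscLogStagePop();               CHKERRQ(ierr);
  return 0;
}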
Thank you.
On 2/10/07, Matthew Knepley <knepley at gmail.com> wrote:
>
> On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> >
> > ops.... it worked for ex2 and ex2f ;-)
> >
> > so what could be wrong? is there some commands or subroutine which i
> > must call? btw, i'm programming in fortran.
> >
>
> Yes, you must call PetscFinalize() in your code.
>
> Matt
>
>
> thank you.
> >
> >
> > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > >
> > > Problems do not go away by ignoring them. Something is wrong here, and
> > > it may
> > > affect the rest of your program. Please try to run an example:
> > >
> > > cd src/ksp/ksp/examples/tutorials
> > > make ex2
> > > ./ex2 -log_summary
> > >
> > > Matt
> > >
> > > On 2/9/07, Ben Tay <zonexo at gmail.com> wrote:
> > > >
> > > > Well, I don't know what's wrong. I did the same thing for -info and
> > > > it worked. Anyway, is there any other way?
> > > >
> > > > Like I can use -mat_view or call matview( ... ) to view a matrix. Is
> > > > there a similar subroutine for me to call?
> > > >
> > > > Thank you.
> > > >
> > > >
> > > > On 2/9/07, Matthew Knepley <knepley at gmail.com > wrote:
> > > > >
> > > > > Impossible, please check the spelling, and make sure your
> > > > > command line was not truncated.
> > > > >
> > > > > Matt
> > > > >
> > > > > On 2/9/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > >
> > > > > > ya, i did use -log_summary. but no output.....
> > > > > >
> > > > > > On 2/9/07, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > > > > > >
> > > > > > >
> > > > > > > -log_summary
> > > > > > >
> > > > > > >
> > > > > > > On Fri, 9 Feb 2007, Ben Tay wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I've tried to use log_summary but nothing came out? Did I
> > > > > > > miss out
> > > > > > > > something? It worked when I used -info...
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2/9/07, Lisandro Dalcin <dalcinl at gmail.com > wrote:
> > > > > > > > >
> > > > > > > > > On 2/8/07, Ben Tay < zonexo at gmail.com> wrote:
> > > > > > > > > > i'm trying to solve my cfd code using PETSc in parallel.
> > > > > > > Besides the
> > > > > > > > > linear
> > > > > > > > > > eqns for PETSc, other parts of the code has also been
> > > > > > > parallelized using
> > > > > > > > > > MPI.
> > > > > > > > >
> > > > > > > > > Finite elements or finite differences, or what?
> > > > > > > > >
> > > > > > > > > > however i find that the parallel version of the code
> > > > > > > running on 4
> > > > > > > > > processors
> > > > > > > > > > is even slower than the sequential version.
> > > > > > > > >
> > > > > > > > > Can you monitor the convergence and iteration count of
> > > > > > > momentum and
> > > > > > > > > poisson steps?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > in order to find out why, i've used the -info option to
> > > > > > > print out the
> > > > > > > > > > details. there are 2 linear equations being solved -
> > > > > > > momentum and
> > > > > > > > > poisson.
> > > > > > > > > > the momentum one is twice the size of the poisson. it is
> > > > > > > shown below:
> > > > > > > > >
> > > > > > > > > Can you use -log_summary command line option and send the
> > > > > > > output attached?
> > > > > > > > >
> > > > > > > > > > i saw some statements stating "seq". am i running in
> > > > > > > sequential or
> > > > > > > > > parallel
> > > > > > > > > > mode? have i preallocated too much space?
> > > > > > > > >
> > > > > > > > > It seems you are running in parallel. The "Seq" are
> > > > > > > related to local,
> > > > > > > > > internal objects. In PETSc, parallel matrices have inner
> > > > > > > sequential
> > > > > > > > > matrices.
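To picture what those inner "Seq" objects are: each process's piece of a
parallel MPIAIJ matrix is stored as two sequential AIJ blocks, a "diagonal"
block (locally owned columns) and an "off-diagonal" block, and the
preallocation arguments size exactly those blocks. A rough C sketch against
the 2.3.x-era API (MatCreateMPIAIJ was renamed MatCreateAIJ in later
releases); the per-row counts here are only placeholders:

/* The -info lines mentioning "Seq" refer to these two inner per-process
 * blocks, not to serial execution; -info also reports how much of the
 * preallocated space ends up unused. */
#include <petscmat.h>

PetscErrorCode create_mpiaij(MPI_Comm comm, PetscInt n_local, PetscInt N, Mat *A)
{
  PetscErrorCode ierr;
  ierr = MatCreateMPIAIJ(comm, n_local, n_local, N, N,
                         5, PETSC_NULL,   /* diagonal (Seq) block: nonzeros per row */
                         2, PETSC_NULL,   /* off-diagonal (Seq) block: nonzeros per row */
                         A); CHKERRQ(ierr);
  return 0;
}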
> > > > > > > > >
> > > > > > > > > > lastly, if Ax=b, A_sta and A_end
> > > > > > > from MatGetOwnershipRange and b_sta
> > > > > > > > > and
> > > > > > > > > > b_end from VecGetOwnershipRange should always be the
> > > > > > > same value, right?
> > > > > > > > >
> > > > > > > > > It should. If not, you are likely going to get a runtime error.
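A quick way to check that point in code (a C sketch; the function and
variable names here are illustrative):

/* For Ax = b with matching layouts, the row ownership range of A and the
 * ownership range of b should coincide on every process. */
#include <petscmat.h>

PetscErrorCode check_ownership(Mat A, Vec b)
{
  PetscInt       A_sta, A_end, b_sta, b_end;
  PetscErrorCode ierr;

  ierr = MatGetOwnershipRange(A, &A_sta, &A_end); CHKERRQ(ierr);
  ierr = VecGetOwnershipRange(b, &b_sta, &b_end); CHKERRQ(ierr);
  if (A_sta != b_sta || A_end != b_end) {
    ierr = PetscPrintf(PETSC_COMM_SELF, "layout mismatch: A [%d,%d) vs b [%d,%d)\n",
                       (int)A_sta, (int)A_end, (int)b_sta, (int)b_end); CHKERRQ(ierr);
  }
  return 0;
}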
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Lisandro Dalcín
> > > > > > > > > ---------------
> > > > > > > > > Centro Internacional de Métodos Computacionales en
> > > > > > > Ingeniería (CIMEC)
> > > > > > > > > Instituto de Desarrollo Tecnológico para la Industria
> > > > > > > Química (INTEC)
> > > > > > > > > Consejo Nacional de Investigaciones Científicas y Técnicas
> > > > > > > (CONICET)
> > > > > > > > > PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> > > > > > > > > Tel/Fax: +54-(0)342-451.1594
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
>
>
> --
> One trouble is that despite this system, anyone who reads journals widely
> and critically is forced to realize that there are scarcely any bars to
> eventual
> publication. There seems to be no study too fragmented, no hypothesis too
> trivial, no literature citation too biased or too egotistical, no design
> too
> warped, no methodology too bungled, no presentation of results too
> inaccurate, too obscure, and too contradictory, no analysis too
> self-serving,
> no argument too circular, no conclusions too trifling or too unjustified,
> and
> no grammar and syntax too offensive for a paper to end up in print. --
> Drummond Rennie
>