understanding the output from -info

Ben Tay zonexo at gmail.com
Sat Feb 10 02:28:51 CST 2007


Hi,

I used ex2f.F as a test code and changed m and n from 3 to 500 each. I ran the
code with 1 processor and then with 4 processors, and then repeated both runs
with the following modification:


      do i = 1, 10
         call KSPSolve(ksp,b,x,ierr)
      end do

I added this do loop so that the solve is repeated 10 times.
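
(To separate the setup cost from the repeated solves in the -log_summary
output, I believe the loop can be wrapped in its own logging stage. The sketch
below is only an illustration: the variable and stage name are made up, and the
argument order of PetscLogStageRegister has changed between PETSc releases, so
check the manual page for the version in use.)

      integer solvestage

!     register a named stage and push it so that the solve loop is
!     reported as its own stage in -log_summary (i and ierr declared
!     as in ex2f.F)
      call PetscLogStageRegister(solvestage,'Repeated solves',ierr)
      call PetscLogStagePush(solvestage,ierr)
      do i = 1, 10
         call KSPSolve(ksp,b,x,ierr)
      end do
      call PetscLogStagePop(ierr)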

In both cases the serial code is faster: the serial run takes 2.4 min while the
4-processor run takes 3.3 min.

Here's the log_summary:


---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ex2f on a linux-mpi named atlas12.nus.edu.sg with 4 processors, by g0306332 Sat Feb 10 16:21:36 2007
Using Petsc Release Version 2.3.2, Patch 8, Tue Jan  2 14:33:59 PST 2007 HG revision: ebeddcedcc065e32fc252af32cf1d01ed4fc7a80

                         Max       Max/Min        Avg      Total
Time (sec):           2.213e+02      1.00051   2.212e+02
Objects:              5.500e+01      1.00000   5.500e+01
Flops:                4.718e+09      1.00019   4.718e+09  1.887e+10
Flops/sec:            2.134e+07      1.00070   2.133e+07  8.531e+07

Memory:               3.186e+07      1.00069              1.274e+08
MPI Messages:         1.832e+03      2.00000   1.374e+03  5.496e+03
MPI Message Lengths:  7.324e+06      2.00000   3.998e+03  2.197e+07
MPI Reductions:       7.112e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.2120e+02 100.0%  1.8871e+10 100.0%  5.496e+03 100.0%  3.998e+03      100.0%  2.845e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------

      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run config/configure.py        #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################




      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################
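
(Per the first warning, these timings come from a debug build. A rebuild without
debugging, reusing the configure options reported near the end of this log and
adding only the flag the warning names, would look roughly like:

  ./config/configure.py --with-vendor-compilers=intel --with-x=0 --with-shared \
      --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 \
      --with-mpi-dir=/opt/mpich/myrinet/intel/ --with-debugging=no

followed by rebuilding the PETSc libraries and relinking ex2f.)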


Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              915 1.0 4.4291e+01 1.3 1.50e+07 1.3 5.5e+03 4.0e+03 0.0e+00 18 11100100  0  18 11100100  0    46
MatSolve             915 1.0 1.5684e+01 1.1 3.56e+07 1.1 0.0e+00 0.0e+00 0.0e+00  7 11  0  0  0   7 11  0  0  0   131
MatLUFactorNum         1 1.0 5.1654e-02 1.4 1.48e+07 1.4 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43
MatILUFactorSym        1 1.0 1.6838e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 3.2428e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 1.3120e+00 1.1 0.00e+00 0.0 6.0e+00 2.0e+03 1.3e+01  1  0  0  0  0   1  0  0  0  0     0
MatGetOrdering         1 1.0 4.1590e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMDot              885 1.0 8.5091e+01 1.1 2.27e+07 1.1 0.0e+00 0.0e+00 8.8e+02 36 36  0  0 31  36 36  0  0 31    80
VecNorm              916 1.0 6.6747e+01 1.1 1.81e+06 1.1 0.0e+00 0.0e+00 9.2e+02 29  2  0  0 32  29  2  0  0 32     7
VecScale             915 1.0 1.1430e+00 2.2 1.12e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   200
VecCopy               30 1.0 1.2816e-01 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               947 1.0 7.8979e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               60 1.0 5.5332e-02 1.1 1.51e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   542
VecMAXPY             915 1.0 1.5004e+01 1.3 1.54e+08 1.3 0.0e+00 0.0e+00 0.0e+00  6 38  0  0  0   6 38  0  0  0   483
VecScatterBegin      915 1.0 9.0358e-02 1.4 0.00e+00 0.0 5.5e+03 4.0e+03 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd        915 1.0 3.5136e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0     0
VecNormalize         915 1.0 6.7272e+01 1.0 2.68e+06 1.0 0.0e+00 0.0e+00 9.2e+02 30  4  0  0 32  30  4  0  0 32    10
KSPGMRESOrthog       885 1.0 9.8478e+01 1.1 3.87e+07 1.1 0.0e+00 0.0e+00 8.8e+02 42 72  0  0 31  42 72  0  0 31   138
KSPSetup               2 1.0 6.1918e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.1892e+02 1.0 2.15e+07 1.0 5.5e+03 4.0e+03 2.8e+03 99100100100 99  99100100100 99    86
PCSetUp                2 1.0 7.3292e-02 1.3 9.84e+06 1.3 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0    30
PCSetUpOnBlocks        1 1.0 7.2706e-02 1.3 9.97e+06 1.3 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0    31
PCApply              915 1.0 1.6508e+01 1.1 3.27e+07 1.1 0.0e+00 0.0e+00 9.2e+02  7 11  0  0 32   7 11  0  0 32   124
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     4              4     252008     0
           Index Set     5              5     753096     0
                 Vec    41             41   18519984     0
         Vec Scatter     1              1          0     0
       Krylov Solver     2              2      16880     0
      Preconditioner     2              2        196     0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 4.18186e-05
Average time for zero size MPI_Send(): 2.62856e-05
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Thu Jan 18 12:23:31 2007
Configure options: --with-vendor-compilers=intel --with-x=0 --with-shared --with-blas-lapack-dir=/lsftmp/g0306332/inter/mkl/lib/32 --with-mpi-dir=/opt/mpich/myrinet/intel/
-----------------------------------------
Libraries compiled on Thu Jan 18 12:24:41 SGT 2007 on atlas1.nus.edu.sg
Machine characteristics: Linux atlas1.nus.edu.sg 2.4.21-20.ELsmp #1 SMP Wed Sep 8 17:29:34 GMT 2004 i686 i686 i386 GNU/Linux
Using PETSc directory: /nas/lsftmp/g0306332/petsc-2.3.2-p8
Using PETSc arch: linux-mpif90
-----------------------------------------
Using C compiler: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran compiler: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
-----------------------------------------
Using include paths: -I/nas/lsftmp/g0306332/petsc-2.3.2-p8 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/bmake/linux-mpif90 -I/nas/lsftmp/g0306332/petsc-2.3.2-p8/include -I/opt/mpich/myrinet/intel/include
------------------------------------------
Using C linker: /opt/mpich/myrinet/intel/bin/mpicc -fPIC -g
Using Fortran linker: /opt/mpich/myrinet/intel/bin/mpif90 -I. -fPIC -g -w90 -w
Using libraries:
-Wl,-rpath,/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90
-L/nas/lsftmp/g0306332/petsc-2.3.2-p8/lib/linux-mpif90 -lpetscts -lpetscsnes
-lpetscksp -lpetscdm -lpetscmat -lpetscvec -lpetsc
-Wl,-rpath,/lsftmp/g0306332/inter/mkl/lib/32
-L/lsftmp/g0306332/inter/mkl/lib/32 -lmkl_lapack -lmkl_ia32 -lguide
-lPEPCF90 -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/mpich/myrinet/intel/lib -L/opt/mpich/myrinet/intel/lib
-Wl,-rpath,-rpath -Wl,-rpath,-ldl -L-ldl -lmpich -Wl,-rpath,-L -lgm
-lpthread -Wl,-rpath,/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa
-lunwind -ldl -lmpichf90 -Wl,-rpath,/opt/gm/lib -L/opt/gm/lib -lPEPCF90
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -lintrins -lIEPCF90 -lF90 -lm  -Wl,-rpath,\
-Wl,-rpath,\ -L\ -ldl -lmpich -Wl,-rpath,\ -L\ -lgm -lpthread
-Wl,-rpath,/opt/intel/compiler70/ia32/lib -L/opt/intel/compiler70/ia32/lib
-Wl,-rpath,/usr/lib -L/usr/lib -limf -lirc -lcprts -lcxa -lunwind -ldl
------------------------------------------

So is there something wrong with the server's MPI implementation?
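
(To check whether the interconnect itself is slow, independent of PETSc, a
minimal two-process ping-pong can be run on the same nodes. This is only a
sketch using the standard MPI Fortran interface, not part of ex2f.F; the
repetition count is arbitrary.)

      program pingpong
      implicit none
      include 'mpif.h'
      integer ierr, rank, i, status(MPI_STATUS_SIZE)
      integer, parameter :: nreps = 1000
      double precision t0, t1, buf(1)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      buf(1) = 0.d0
      call MPI_Barrier(MPI_COMM_WORLD, ierr)
      t0 = MPI_Wtime()
!     bounce one double between ranks 0 and 1 and time the round trips
      do i = 1, nreps
         if (rank .eq. 0) then
            call MPI_Send(buf, 1, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, ierr)
            call MPI_Recv(buf, 1, MPI_DOUBLE_PRECISION, 1, 0,
     &                    MPI_COMM_WORLD, status, ierr)
         else if (rank .eq. 1) then
            call MPI_Recv(buf, 1, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, status, ierr)
            call MPI_Send(buf, 1, MPI_DOUBLE_PRECISION, 0, 0,
     &                    MPI_COMM_WORLD, ierr)
         end if
      end do
      t1 = MPI_Wtime()
      if (rank .eq. 0) print *, 'avg round trip (s):', (t1-t0)/nreps
      call MPI_Finalize(ierr)
      end

(If the average round-trip time reported here is far larger than the zero-size
MPI_Send time in the log above, that would point at the MPI or interconnect
setup rather than at PETSc.)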

Thank you.



On 2/10/07, Satish Balay <balay at mcs.anl.gov> wrote:
>
> Looks like MatMult = 24 sec. Out of this the scatter time is 22 sec.
> Either something is wrong with your run - or MPI is really broken..
>
> Satish
>
> > > > MatMult             3927 1.0 2.4071e+01 1.3 6.14e+06 1.4 2.4e+04
> 1.3e+03
> > > > VecScatterBegin     3927 1.0 2.8672e-01 3.9 0.00e+00 0.0 2.4e+04
> 1.3e+03
> > > > VecScatterEnd       3927 1.0 2.2135e+01 1.5 0.00e+00 0.0 0.0e+00
> 0.0e+00
>
>