Slow speed after changing from serial to parallel

Ben Tay zonexo at gmail.com
Tue Apr 15 19:52:19 CDT 2008


Hi,

I was initially using LU and Hypre in my serial code. When I converted to 
the parallel code, I switched to the default GMRES. I have now redone the 
tests using KSPBCGS and also Hypre BoomerAMG. It seems that 
MatAssemblyBegin, VecAYPX and VecScatterEnd (marked with asterisks below) 
are the problems. What should I be checking? Here are the results for 1 
and 2 processors for each solver. Thank you so much!
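
Btw, in case it helps, this is roughly how the solver is set up (just a 
minimal sketch, not my actual code; the function name SolveWithBCGSBoomerAMG 
and the arguments A, b, x are placeholders, but the calls should be the 
standard PETSc 2.3.3 ones):

#include "petscksp.h"

PetscErrorCode SolveWithBCGSBoomerAMG(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);          /* BiCGStab */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr);   /* Hypre BoomerAMG */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);           /* allows switching back to GMRES etc. at run time */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  return 0;
}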

*1 processor KSPBCGS*

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance 
Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332 
Wed Apr 16 08:32:21 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b

                         Max       Max/Min        Avg      Total
Time (sec):           8.176e+01      1.00000   8.176e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                1.893e+10      1.00000   1.893e+10  1.893e+10
Flops/sec:            2.315e+08      1.00000   2.315e+08  2.315e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       3.743e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N 
--> 2N flops
                            and VecAXPY() for complex vectors of length 
N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages 
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 8.1756e+01 100.0%  1.8925e+10 100.0%  0.000e+00   
0.0%  0.000e+00        0.0%  3.743e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on 
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all 
processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() 
and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this 
phase
      %M - percent messages in this phase     %L - percent message 
lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
over all processors)
------------------------------------------------------------------------------------------------------------------------

     
      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################
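
Btw, regarding the warning above: if I understand the manual correctly, the 
idea is to wrap the timed part in the PreLoad macros so that the section is 
executed twice and the timings of the second pass are not distorted by 
one-time costs such as loading the executable and shared libraries. A 
minimal sketch (my names TimedSolve, ksp, b, x are placeholders):

#include "petscksp.h"

PetscErrorCode TimedSolve(KSP ksp, Vec b, Vec x)
{
  PetscErrorCode ierr;

  PreLoadBegin(PETSC_TRUE,"Solve");            /* run the enclosed section twice */
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  PreLoadEnd();
  return 0;
}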




Event                Count      Time (sec)     
Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len 
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1498 1.0 1.6548e+01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00 
0.0e+00 20 31  0  0  0  20 31  0  0  0   355
MatSolve            1500 1.0 3.2228e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 
0.0e+00 39 31  0  0  0  39 31  0  0  0   183
MatLUFactorNum         2 1.0 2.0642e-01 1.0 1.02e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0   102
MatILUFactorSym        2 1.0 2.0250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.7963e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 3.8147e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 2.6301e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         2 1.0 1.0190e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               2 1.0 2.8230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 6.7238e+01 1.0 2.81e+08 1.0 0.0e+00 0.0e+00 
3.7e+03 82100  0  0100  82100  0  0100   281
PCSetUp                2 1.0 4.3527e-01 1.0 4.85e+07 1.0 0.0e+00 0.0e+00 
6.0e+00  1  0  0  0  0   1  0  0  0  0    48
PCApply             1500 1.0 3.2232e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 
0.0e+00 39 31  0  0  0  39 31  0  0  0   183
VecDot              2984 1.0 5.3279e+00 1.0 4.84e+08 1.0 0.0e+00 0.0e+00 
3.0e+03  7 14  0  0 80   7 14  0  0 80   484
VecNorm              754 1.0 1.1453e+00 1.0 5.74e+08 1.0 0.0e+00 0.0e+00 
7.5e+02  1  3  0  0 20   1  3  0  0 20   574
VecCopy                2 1.0 3.2830e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 3 1.0 3.9389e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             2244 1.0 4.8304e+00 1.0 4.02e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  6 10  0  0  0   6 10  0  0  0   402
VecAYPX              752 1.0 1.5623e+00 1.0 4.19e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  2  3  0  0  0   2  3  0  0  0   419
VecWAXPY            1492 1.0 5.0827e+00 1.0 2.54e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  6  7  0  0  0   6  7  0  0  0   254
VecAssemblyBegin       2 1.0 2.6703e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
6.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 5.2452e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     4              4  300369852     0
       Krylov Solver     2              2          8     0
      Preconditioner     2              2        336     0
           Index Set     6              6   15554064     0
                 Vec    13             13   44937496     0
========================================================================================================================
Average time to get PetscTime(): 3.09944e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8
Configure run at: Tue Jan  8 22:22:08 2008
Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8 
--sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8 
--sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 --sizeof_MPI_Comm=4 
--sizeof_MPI_Fint=4 --with-vendor-compilers=intel --with-x=0 
--with-hypre-dir=/home/enduser/g0306332/lib/hypre --with-debugging=0 
--with-batch=1 --with-mpi-shared=0 
--with-mpi-include=/usr/local/topspin/mpi/mpich/include 
--with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a 
--with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun 
--with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
-----------------------------------------
Libraries compiled on Tue Jan  8 22:34:13 SGT 2008 on atlas3-c01

*2 processors KSPBCGS*

---------------------------------------------- PETSc Performance 
Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c25 with 2 processors, by g0306332 
Wed Apr 16 08:37:25 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b

                         Max       Max/Min        Avg      Total
Time (sec):           3.795e+02      1.00000   3.795e+02
Objects:              3.800e+01      1.00000   3.800e+01
Flops:                8.592e+09      1.00000   8.592e+09  1.718e+10
Flops/sec:            2.264e+07      1.00000   2.264e+07  4.528e+07
MPI Messages:         1.335e+03      1.00000   1.335e+03  2.670e+03
MPI Message Lengths:  6.406e+06      1.00000   4.798e+03  1.281e+07
MPI Reductions:       1.678e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N 
--> 2N flops
                            and VecAXPY() for complex vectors of length 
N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages 
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.7950e+02 100.0%  1.7185e+10 100.0%  2.670e+03 
100.0%  4.798e+03      100.0%  3.357e+03 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on 
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all 
processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() 
and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this 
phase
      %M - percent messages in this phase     %L - percent message 
lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################

 
Event                Count      Time (sec)     
Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len 
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult             1340 1.0 7.4356e+01 1.6 5.87e+07 1.6 2.7e+03 4.8e+03 
0.0e+00 16 31100100  0  16 31100100  0    72
MatSolve            1342 1.0 4.3794e+01 1.2 7.08e+07 1.2 0.0e+00 0.0e+00 
0.0e+00 11 31  0  0  0  11 31  0  0  0   123
MatLUFactorNum         2 1.0 2.5116e-01 1.0 7.68e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0   153
MatILUFactorSym        2 1.0 2.3831e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
2.0e+00  0  0  0  0  0   0  0  0  0  0     0
*MatAssemblyBegin       2 1.0 7.9380e-0116482.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0*
MatAssemblyEnd         2 1.0 2.4782e-01 1.0 0.00e+00 0.0 2.0e+00 2.4e+03 
7.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 5.0068e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 1.8508e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         2 1.0 8.6530e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetup               3 1.0 1.9901e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 3.3575e+02 1.0 2.56e+07 1.0 2.7e+03 4.8e+03 
3.3e+03 88100100100100  88100100100100    51
PCSetUp                3 1.0 5.0751e-01 1.0 3.79e+07 1.0 0.0e+00 0.0e+00 
6.0e+00  0  0  0  0  0   0  0  0  0  0    76
PCSetUpOnBlocks        1 1.0 4.4248e-02 1.0 4.39e+07 1.0 0.0e+00 0.0e+00 
3.0e+00  0  0  0  0  0   0  0  0  0  0    88
PCApply             1342 1.0 4.9832e+01 1.2 6.56e+07 1.2 0.0e+00 0.0e+00 
0.0e+00 12 31  0  0  0  12 31  0  0  0   108
VecDot              2668 1.0 2.0710e+02 1.2 6.70e+06 1.2 0.0e+00 0.0e+00 
2.7e+03 50 13  0  0 79  50 13  0  0 79    11
VecNorm              675 1.0 2.9565e+01 3.3 3.33e+07 3.3 0.0e+00 0.0e+00 
6.7e+02  5  3  0  0 20   5  3  0  0 20    20
VecCopy                2 1.0 2.4400e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              1338 1.0 5.9052e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY             2007 1.0 2.2173e+01 2.6 1.03e+08 2.6 0.0e+00 0.0e+00 
0.0e+00  4 10  0  0  0   4 10  0  0  0    79
*VecAYPX              673 1.0 2.8062e+00 4.0 4.29e+08 4.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0   213*
VecWAXPY            1334 1.0 4.8052e+00 2.4 2.84e+08 2.4 0.0e+00 0.0e+00 
0.0e+00  1  7  0  0  0   1  7  0  0  0   240
VecAssemblyBegin       2 1.0 1.4091e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 
6.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
*VecScatterBegin     1334 1.0 1.1666e-01 5.9 0.00e+00 0.0 2.7e+03 4.8e+03 0.0e+00  0  0100100  0   0  0100100  0     0*
VecScatterEnd       1334 1.0 5.2569e+01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 10  0  0  0  0  10  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
                      
Memory usage is given in bytes:
  
Object Type          Creations   Destructions   Memory  Descendants' Mem.
  
--- Event Stage 0: Main Stage
     
              Matrix     6              6  283964900     0
       Krylov Solver     3              3          8     0
      Preconditioner     3              3        424     0
           Index Set     8              8   12965152     0
                 Vec    17             17   34577080     0
         Vec Scatter     1              1          0     0
========================================================================================================================
Average time to get PetscTime(): 8.10623e-07                  
Average time for MPI_Barrier(): 5.72205e-07                   
Average time for zero size MPI_Send(): 1.90735e-06            
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8
Configure run at: Tue Jan  8 22:22:08 2008

*1 processor Hypre*

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance 
Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332 
Wed Apr 16 08:45:38 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b

                         Max       Max/Min        Avg      Total
Time (sec):           2.059e+01      1.00000   2.059e+01
Objects:              3.400e+01      1.00000   3.400e+01
Flops:                3.151e+08      1.00000   3.151e+08  3.151e+08
Flops/sec:            1.530e+07      1.00000   1.530e+07  1.530e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       2.400e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N 
--> 2N flops
                            and VecAXPY() for complex vectors of length 
N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages 
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.0590e+01 100.0%  3.1512e+08 100.0%  0.000e+00   
0.0%  0.000e+00        0.0%  2.400e+01 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on 
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all 
processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() 
and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this 
phase
      %M - percent messages in this phase     %L - percent message 
lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
over all processors)
------------------------------------------------------------------------------------------------------------------------

     
      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     
Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len 
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               12 1.0 2.6237e-01 1.0 4.24e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  1 35  0  0  0   1 35  0  0  0   424
MatSolve               7 1.0 4.5932e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  2 33  0  0  0   2 33  0  0  0   223
MatLUFactorNum         1 1.0 1.2635e-01 1.0 1.36e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  1  5  0  0  0   1  5  0  0  0   136
MatILUFactorSym        1 1.0 1.3007e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
1.0e+00  1  0  0  0  4   1  0  0  0  4     0
MatConvert             1 1.0 4.1277e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatAssemblyBegin       2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.3946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatGetRow         432000 1.0 8.4685e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.6376e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
2.0e+00  0  0  0  0  8   0  0  0  0  8     0
MatZeroEntries         2 1.0 8.2422e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPGMRESOrthog         6 1.0 1.0955e-01 1.0 3.31e+08 1.0 0.0e+00 0.0e+00 
6.0e+00  1 12  0  0 25   1 12  0  0 25   331
KSPSetup               2 1.0 2.5418e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 5.9363e+00 1.0 5.31e+07 1.0 0.0e+00 0.0e+00 
1.8e+01 29100  0  0 75  29100  0  0 75    53
PCSetUp                2 1.0 1.5691e+00 1.0 1.10e+07 1.0 0.0e+00 0.0e+00 
5.0e+00  8  5  0  0 21   8  5  0  0 21    11
PCApply               14 1.0 3.7548e+00 1.0 2.73e+07 1.0 0.0e+00 0.0e+00 
0.0e+00 18 33  0  0  0  18 33  0  0  0    27
VecMDot                6 1.0 7.7139e-02 1.0 2.35e+08 1.0 0.0e+00 0.0e+00 
6.0e+00  0  6  0  0 25   0  6  0  0 25   235
VecNorm               14 1.0 9.9192e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00 
7.0e+00  0  6  0  0 29   0  6  0  0 29   183
VecScale               7 1.0 5.4052e-03 1.0 5.59e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0   559
VecCopy                1 1.0 2.0301e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 9 1.0 1.1883e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                7 1.0 2.8702e-02 1.0 3.91e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0   391
VecAYPX                6 1.0 2.8528e-02 1.0 3.63e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  3  0  0  0   0  3  0  0  0   363
VecMAXPY               7 1.0 4.1699e-02 1.0 5.59e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  7  0  0  0   0  7  0  0  0   559
VecAssemblyBegin       2 1.0 2.3842e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
6.0e+00  0  0  0  0 25   0  0  0  0 25     0
VecAssemblyEnd         2 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           7 1.0 1.3958e-02 1.0 6.50e+08 1.0 0.0e+00 0.0e+00 
7.0e+00  0  3  0  0 29   0  3  0  0 29   650
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     3              3  267569524     0
       Krylov Solver     2              2      17224     0
      Preconditioner     2              2        440     0
           Index Set     3              3   10369032     0
                 Vec    24             24   82961752     0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels
 

*2 processors Hypre*
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance 
Summary: ----------------------------------------------

./a.out on a atlas3-mp named atlas3-c48 with 2 processors, by g0306332 
Wed Apr 16 08:46:56 2008  
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007 
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b

                         Max       Max/Min        Avg      Total
Time (sec):           9.614e+01      1.02903   9.478e+01
Objects:              4.100e+01      1.00000   4.100e+01
Flops:                2.778e+08      1.00000   2.778e+08  5.555e+08
Flops/sec:            2.973e+06      1.02903   2.931e+06  5.862e+06
MPI Messages:         7.000e+00      1.00000   7.000e+00  1.400e+01
MPI Message Lengths:  3.120e+04      1.00000   4.457e+03  6.240e+04
MPI Reductions:       1.650e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N 
--> 2N flops
                            and VecAXPY() for complex vectors of length 
N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages 
---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   
%Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.4784e+01 100.0%  5.5553e+08 100.0%  1.400e+01 
100.0%  4.457e+03      100.0%  3.300e+01 100.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on 
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all 
processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() 
and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this 
phase
      %M - percent messages in this phase     %L - percent message 
lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time 
over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     
Flops/sec                         --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len 
Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

MatMult               12 1.0 4.5412e-01 2.0 4.34e+08 2.0 1.2e+01 4.8e+03 
0.0e+00  0 36 86 92  0   0 36 86 92  0   438
MatSolve               7 1.0 5.0386e-01 1.1 2.28e+08 1.1 0.0e+00 0.0e+00 
0.0e+00  1 37  0  0  0   1 37  0  0  0   407
MatLUFactorNum         1 1.0 9.5120e-01 1.6 2.98e+07 1.6 0.0e+00 0.0e+00 
0.0e+00  1  6  0  0  0   1  6  0  0  0    36
MatILUFactorSym        1 1.0 1.1285e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
1.0e+00  9  0  0  0  3   9  0  0  0  3     0
MatConvert             1 1.0 6.2023e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
*MatAssemblyBegin       2 1.0 3.1003e+01246.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 16  0  0  0  6  16  0  0  0  6     0*
MatAssemblyEnd         2 1.0 2.2413e+00 1.9 0.00e+00 0.0 2.0e+00 2.4e+03 
7.0e+00  2  0 14  8 21   2  0 14  8 21     0
MatGetRow         216000 1.0 9.2643e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            3 1.0 5.9605e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.4464e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
2.0e+00  0  0  0  0  6   0  0  0  0  6     0
MatZeroEntries         2 1.0 6.1072e+00 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  5  0  0  0  0   5  0  0  0  0     0
KSPGMRESOrthog         6 1.0 4.4529e-02 1.3 5.26e+08 1.3 0.0e+00 0.0e+00 
6.0e+00  0  7  0  0 18   0  7  0  0 18   815
KSPSetup               2 1.0 1.8315e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  2  0  0  0  0   2  0  0  0  0     0
KSPSolve               2 1.0 3.0572e+01 1.1 9.64e+06 1.1 1.2e+01 4.8e+03 
1.8e+01 31100 86 92 55  31100 86 92 55    18
PCSetUp                2 1.0 2.0424e+01 1.3 1.07e+06 1.3 0.0e+00 0.0e+00 
5.0e+00 19  6  0  0 15  19  6  0  0 15     2
PCApply               14 1.0 2.9443e+00 1.0 3.56e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  3 37  0  0  0   3 37  0  0  0    70
VecMDot                6 1.0 2.7561e-02 1.6 5.15e+08 1.6 0.0e+00 0.0e+00 
6.0e+00  0  3  0  0 18   0  3  0  0 18   658
*VecNorm               14 1.0 1.4223e+00 5.1 5.45e+07 5.1 0.0e+00 0.0e+00 7.0e+00  1  5  0  0 21   1  5  0  0 21    21*
VecScale               7 1.0 1.8604e-02 1.0 8.25e+07 1.0 0.0e+00 0.0e+00 
0.0e+00  0  1  0  0  0   0  1  0  0  0   163
VecCopy                1 1.0 3.0069e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 9 1.0 3.2693e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                7 1.0 3.0581e-02 1.1 3.98e+08 1.1 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0   706
*VecAYPX                6 1.0 4.4344e+00147.6 3.45e+08147.6 0.0e+00 0.0e+00 0.0e+00  2  4  0  0  0   2  4  0  0  0     5*
VecMAXPY               7 1.0 2.1892e-02 1.0 5.34e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  4  0  0  0   0  4  0  0  0  1066
VecAssemblyBegin       2 1.0 9.2602e-0412.5 0.00e+00 0.0 0.0e+00 0.0e+00 
6.0e+00  0  0  0  0 18   0  0  0  0 18     0
VecAssemblyEnd         2 1.0 7.8678e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin        6 1.0 9.3222e-05 1.1 0.00e+00 0.0 1.2e+01 4.8e+03 
0.0e+00  0  0 86 92  0   0  0 86 92  0     0
*VecScatterEnd          6 1.0 1.9959e-011404.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0*
VecNormalize           7 1.0 2.3088e-02 1.0 1.98e+08 1.0 0.0e+00 0.0e+00 
7.0e+00  0  2  0  0 21   0  2  0  0 21   393
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix     5              5  267571932     0
       Krylov Solver     2              2      17224     0
      Preconditioner     2              2        440     0
           Index Set     5              5   10372120     0
                 Vec    26             26   53592184     0
         Vec Scatter     1              1          0     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 8.10623e-07
Average time for zero size MPI_Send(): 1.43051e-06
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8
Configure run at: Tue Jan  8 22:22:08 2008
 

Matthew Knepley wrote:
> The convergence here is just horrendous. Have you tried using LU to check
> your implementation? All the time is in the solve right now. I would first
> try a direct method (at least on a small problem) and then try to understand
> the convergence behavior. MUMPS can actually scale very well for big problems.
>
>   Matt
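
For the record, my understanding of the direct-solve check Matt suggests is 
roughly the snippet below (a sketch only; "ksp" is assumed to be my existing 
solver object, and I have left the MUMPS selection to the command line since 
the exact options depend on how PETSc was configured):

#include "petscksp.h"

PetscErrorCode UseDirectSolve(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);  /* apply the preconditioner once, no Krylov iterations */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);          /* sparse direct LU; a package such as MUMPS is needed in parallel */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);      /* equivalently: -ksp_type preonly -pc_type lu on the command line */
  return 0;
}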




More information about the petsc-users mailing list