assembly

Thomas Geenen geenen at gmail.com
Mon Feb 4 11:34:29 CST 2008


Hi Matt,

This is indeed much clearer. I put the stage push and pop around
MatAssemblyBegin/End; the stage summary now shows (a minimal sketch of
the wrapping is included further below):

1:  First_assembly: 1.4724e+02  90.8%  0.0000e+00   0.0%  7.000e+01   0.9%  2.266e+03       15.5%  9.000e+00   0.2%
2: Second_assembly: 2.3823e-01   0.1%  0.0000e+00   0.0%  7.000e+01   0.9%  1.276e+02        0.9%  9.000e+00   0.2%
3:  Third_assembly: 5.0168e-01   0.3%  0.0000e+00   0.0%  4.200e+01   0.5%  2.237e+03       15.4%  3.000e+00   0.1%

The second assembly is for another system of equations (the pressure
correction in SIMPLER), so stages 1 and 3 here are the first and second
assembly we have been discussing.
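
For reference, the wrapping looks roughly like this. This is only a minimal
sketch following Matt's snippet and the PETSc 2.3.3-era
PetscLogStageRegister(&stage, name) calling sequence; the matrix A and the
elided MatSetValues loops stand in for the actual application code.

  Mat            A;        /* the application matrix, created elsewhere with MatCreateMPIAIJ */
  PetscLogStage  stage1, stage2, stage3;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister(&stage1, "First_assembly");CHKERRQ(ierr);
  ierr = PetscLogStageRegister(&stage2, "Second_assembly");CHKERRQ(ierr);
  ierr = PetscLogStageRegister(&stage3, "Third_assembly");CHKERRQ(ierr);

  /* ... MatSetValues() calls for the first system ... */

  /* only the assembly itself is timed inside the stage */
  ierr = PetscLogStagePush(stage1);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  /* the pressure-correction system and the repeated assembly of the first
     system are wrapped the same way with stage2 and stage3 */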

cheers
Thomas



---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Unknown Name on a linux-gnu named etna.geo.uu.nl with 6 processors, by geenen Mon Feb  4 18:27:48 2008
Using Petsc Release Version 2.3.3, Patch 3, Fri Jun 15 16:51:25 CDT 2007 HG revision: f051789beadcd36f77fb6111d20225e26ed7cc0d

                         Max       Max/Min        Avg      Total
Time (sec):           1.621e+02      1.00000   1.621e+02
Objects:              2.600e+02      1.00000   2.600e+02
Flops:                1.265e+09      1.17394   1.161e+09  6.969e+09
Flops/sec:            7.806e+06      1.17393   7.166e+06  4.300e+07
MPI Messages:         1.436e+03      1.20816   1.326e+03  7.956e+03
MPI Message Lengths:  2.120e+07      1.23141   1.457e+04  1.159e+08
MPI Reductions:       9.862e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.3160e+01   8.1%  6.9689e+09 100.0%  7.762e+03  97.6%  9.941e+03       68.2%  2.020e+02   3.4%
 1:  First_assembly: 1.4724e+02  90.8%  0.0000e+00   0.0%  7.000e+01   0.9%  2.266e+03       15.5%  9.000e+00   0.2%
 2: Second_assembly: 2.3823e-01   0.1%  0.0000e+00   0.0%  7.000e+01   0.9%  1.276e+02        0.9%  9.000e+00   0.2%
 3:  Third_assembly: 5.0168e-01   0.3%  0.0000e+00   0.0%  4.200e+01   0.5%  2.237e+03       15.4%  3.000e+00   0.1%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops/sec: Max - maximum over all processors
                       Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was run without the PreLoadBegin()         #
      #   macros. To get timing results we always recommend    #
      #   preloading. otherwise timing numbers may be          #
      #   meaningless.                                         #
      ##########################################################


Event                Count      Time (sec)     Flops/sec          --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              135 1.0 1.5541e+00 1.3 3.03e+08 1.8 2.4e+03 1.3e+04 0.0e+00  1 26 30 26  0  11 26 30 38  0  1155
MatMultAdd            40 1.0 3.2611e-01 8.3 3.64e+07 7.7 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0    31
MatSolve              44 1.0 6.7682e-01 1.7 1.94e+08 1.7 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   4  7  0  0  0   673
MatRelax              80 1.0 3.4453e+00 1.4 9.46e+07 1.1 2.2e+03 1.3e+04 0.0e+00  2 23 28 26  0  22 23 29 38  0   466
MatLUFactorSym         1 1.0 6.7567e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         2 1.0 2.8804e+00 1.4 2.53e+08 1.4 0.0e+00 0.0e+00 0.0e+00  1 44  0  0  0  17 44  0  0  0  1058
MatILUFactorSym        1 1.0 6.7676e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   5  0  0  0  0     0
MatAssemblyBegin       4 1.0 2.7711e-0237.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  1     0
MatAssemblyEnd         4 1.0 2.4401e-02 1.2 0.00e+00 0.0 2.8e+01 4.7e+02 7.0e+00  0  0  0  0  0   0  0  0  0  3     0
MatGetRowIJ            2 1.0 1.2948e-02 7.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       1 1.0 2.8603e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  1     0
MatGetOrdering         2 1.0 2.3054e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  2     0
MatIncreaseOvrlp       1 1.0 8.1528e-02 1.0 0.00e+00 0.0 1.1e+03 2.4e+03 2.0e+01  0  0 13  2  0   1  0 14  3 10     0
MatZeroEntries         3 1.0 3.4422e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MAT_GetRedundantMatrix       1 1.0 3.5774e-02 1.6 0.00e+00 0.0 9.0e+01 8.0e+04 2.0e+00  0  0  1  6  0   0  0  1  9  1     0
VecDot                39 1.0 5.8092e-0131.0 1.02e+0842.3 0.0e+00 0.0e+00 3.9e+01  0  0  0  0  1   2  0  0  0 19    17
VecMDot                8 1.0 3.4735e-03 2.9 4.52e+07 5.5 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  4    69
VecNorm               31 1.0 3.8690e-02 4.1 1.11e+08 5.6 0.0e+00 0.0e+00 3.1e+01  0  0  0  0  1   0  0  0  0 15   139
VecScale              85 1.0 1.7631e-03 1.2 5.59e+08 1.9 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2150
VecCopy                4 1.0 5.5027e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               139 1.0 3.0956e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               98 1.0 5.0848e-03 1.3 9.35e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4765
VecAYPX               40 1.0 1.0264e-02 1.4 2.01e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   973
VecWAXPY              75 1.0 2.6191e-02 1.4 1.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   615
VecMAXPY               9 1.0 2.1935e-04 1.7 2.93e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1317
VecAssemblyBegin       4 1.0 1.9331e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  6     0
VecAssemblyEnd         4 1.0 2.3842e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      267 1.0 1.0370e-01 1.5 0.00e+00 0.0 6.3e+03 1.1e+04 0.0e+00  0  0 79 59  0   1  0 81 86  0     0
VecScatterEnd        267 1.0 4.1189e-01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
KSPGMRESOrthog         4 1.0 4.5178e-03 2.0 5.22e+07 3.8 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  6   117
KSPSetup               6 1.0 7.9882e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  2     0
KSPSolve               2 1.0 6.6590e+00 1.0 1.36e+08 1.2 5.9e+03 1.1e+04 8.1e+01  4 70 75 54  1  51 70 76 80 40   731
PCSetUp                3 1.0 5.0877e+00 1.2 1.29e+08 1.2 1.6e+03 9.1e+03 7.3e+01  3 47 20 13  1  34 47 21 19 36   642
PCSetUpOnBlocks        1 1.0 1.3292e+00 1.0 1.46e+08 1.0 0.0e+00 0.0e+00 3.0e+00  1 17  0  0  0  10 17  0  0  1   873
PCApply               44 1.0 4.7444e+00 1.1 1.16e+08 1.2 4.7e+03 1.0e+04 0.0e+00  3 41 59 42  0  34 41 60 61  0   607

--- Event Stage 1: First_assembly

MatAssemblyBegin       1 1.0 3.0375e-01 3.5 0.00e+00 0.0 4.2e+01 4.2e+05 2.0e+00  0  0  1 15  0   0  0 60 99 22     0
MatAssemblyEnd         1 1.0 1.4709e+02 1.0 0.00e+00 0.0 2.8e+01 8.2e+03 7.0e+00 91  0  0  0  0 100  0 40  1 78     0

--- Event Stage 2: Second_assembly

MatAssemblyBegin       1 1.0 1.7451e-02 5.0 0.00e+00 0.0 4.2e+01 2.4e+04 2.0e+00  0  0  1  1  0   5  0 60 98 22     0
MatAssemblyEnd         1 1.0 2.3056e-01 1.0 0.00e+00 0.0 2.8e+01 8.4e+02 7.0e+00  0  0  0  0  0  95  0 40  2 78     0

--- Event Stage 3: Third_assembly

MatAssemblyBegin       1 1.0 3.3676e-01 3.8 0.00e+00 0.0 4.2e+01 4.2e+05 2.0e+00  0  0  1 15  0  45  0100100 67     0
MatAssemblyEnd         1 1.0 3.3125e-01 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0  55  0  0  0 33     0

--- Event Stage 4: Unknown


--- Event Stage 5: Unknown


--- Event Stage 6: Unknown


--- Event Stage 7: Unknown


--- Event Stage 8: Unknown


--- Event Stage 9: Unknown

------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.

--- Event Stage 0: Main Stage

              Matrix    16              0          0     0
           Index Set    32             25     207888     0
                 Vec   172             91   16374224     0
         Vec Scatter     8              0          0     0
   IS L to G Mapping     4              0          0     0
       Krylov Solver     6              0          0     0
      Preconditioner     6              0          0     0
              Viewer     4              2          0     0
           Container     2              0          0     0

--- Event Stage 1: First_assembly

           Index Set     2              2      44140     0
                 Vec     2              1     196776     0
         Vec Scatter     1              0          0     0

--- Event Stage 2: Second_assembly

           Index Set     2              2       4732     0
                 Vec     2              1      11464     0
         Vec Scatter     1              0          0     0

--- Event Stage 3: Third_assembly


--- Event Stage 4: Unknown


--- Event Stage 5: Unknown


--- Event Stage 6: Unknown


--- Event Stage 7: Unknown


--- Event Stage 8: Unknown


--- Event Stage 9: Unknown

========================================================================================================================
Average time to get PetscTime(): 8.82149e-07
Average time for MPI_Barrier(): 0.000153208
Average time for zero size MPI_Send(): 1.86761e-05
OptionTable: -mg_levels_ksp_type richardson
OptionTable: -mg_levels_pc_sor_omega 1.05
OptionTable: -mg_levels_pc_type sor
OptionTable: -pc_ml_PrintLevel 4
OptionTable: -pc_ml_maxNlevels 2
OptionTable: -pc_type ml
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Fri Sep 28 23:34:20 2007

On Feb 4, 2008 5:47 PM, Matthew Knepley <knepley at gmail.com> wrote:
> On Feb 4, 2008 10:41 AM, Thomas Geenen <geenen at gmail.com> wrote:
> > On Feb 3, 2008 8:51 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> > >
> > >     Hmmm, are you saying the first round of setting values still
> > > takes much longer than the second round?
> >
> > yes
> >
> > > Or is the time
> > > in MatAssemblyBegin() much longer the first time?
> > >
> > >    The MatAssembly process has one piece of code whose
> > > work is order n*size, where n is the stash size and size is the
> > > number of processes; all other work is only order n.
> > >
> > >     Could you send the -log_summary output?
> >
> > The timing is cumulative, I guess?
> > In between these two solves I solve a smaller system for which I do
> > not include the timing.
>
> I am having a little trouble reading this. I think the easiest thing to do
> is to wrap the two sections of code in their own logging stages:
>
> PetscLogStageRegister(&stage1, "First assembly");
> PetscLogStageRegister(&stage2, "Second assembly");
>
> PetscLogStagePush(stage1);
> <code for first assembly>
> PetscLogStagePop();
>
> PetscLogStagePush(stage2);
> <code for second assembly>
> PetscLogStagePop();
>
> Then we can also get a look at how many messages are sent
> and how big they are.
>
>   Thanks,
>
>      Matt
>
>
> > run 1
> >                          Max       Max/Min        Avg      Total
> > Time (sec):           2.154e+02      1.00001   2.154e+02
> > Objects:              2.200e+01      1.00000   2.200e+01
> > Flops:                0.000e+00      0.00000   0.000e+00  0.000e+00
> > Flops/sec:            0.000e+00      0.00000   0.000e+00  0.000e+00
> > MPI Messages:         1.750e+01      1.25000   1.633e+01  9.800e+01
> > MPI Message Lengths:  3.460e+06      1.29903   1.855e+05  1.818e+07
> > MPI Reductions:       4.167e+00      1.00000
> >
> > Flop counting convention: 1 flop = 1 real number operation of type
> > (multiply/divide/add/subtract)
> >                             e.g., VecAXPY() for real vectors of length
> > N --> 2N flops
> >                             and VecAXPY() for complex vectors of
> > length N --> 8N flops
> >
> > Summary of Stages:   ----- Time ------  ----- Flops -----  ---
> > Messages ---  -- Message Lengths --  -- Reductions --
> >                         Avg     %Total     Avg     %Total   counts
> > %Total     Avg         %Total   counts   %Total
> >  0:      Main Stage: 2.1537e+02 100.0%  0.0000e+00   0.0%  9.800e+01
> > 100.0%  1.855e+05      100.0%  2.500e+01 100.0%
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > See the 'Profiling' chapter of the users' manual for details on
> > interpreting output.
> > Phase summary info:
> >    Count: number of times phase was executed
> >    Time and Flops/sec: Max - maximum over all processors
> >                        Ratio - ratio of maximum to minimum over all processors
> >    Mess: number of messages sent
> >    Avg. len: average message length
> >    Reduct: number of global reductions
> >    Global: entire computation
> >    Stage: stages of a computation. Set stages with PetscLogStagePush()
> > and PetscLogStagePop().
> >       %T - percent time in this phase         %F - percent flops in this phase
> >       %M - percent messages in this phase     %L - percent message
> > lengths in this phase
> >       %R - percent reductions in this phase
> >    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> > over all processors)
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >
> >       ##########################################################
> >       #                                                        #
> >       #                          WARNING!!!                    #
> >       #                                                        #
> >       #   This code was run without the PreLoadBegin()         #
> >       #   macros. To get timing results we always recommend    #
> >       #   preloading. otherwise timing numbers may be          #
> >       #   meaningless.                                         #
> >       ##########################################################
> >
> >
> > Event                Count      Time (sec)     Flops/sec
> >           --- Global ---  --- Stage ---   Total
> >                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg
> > len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > --- Event Stage 0: Main Stage
> >
> > MatAssemblyBegin       1 1.0 2.9536e-01 4.0 0.00e+00 0.0 4.2e+01
> > 4.2e+05 2.0e+00  0  0 43 98  8   0  0 43 98  8     0
> > MatAssemblyEnd         1 1.0 2.1410e+02 1.0 0.00e+00 0.0 2.8e+01
> > 8.2e+03 7.0e+00 99  0 29  1 28  99  0 29  1 28     0
> > MatZeroEntries         1 1.0 9.3739e-02 5.9 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecSet                 4 1.0 3.9721e-04 2.2 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > Memory usage is given in bytes:
> >
> > Object Type          Creations   Destructions   Memory  Descendants' Mem.
> >
> > --- Event Stage 0: Main Stage
> >
> >               Matrix     3              0          0     0
> >            Index Set     6              6      45500     0
> >                  Vec     6              1     196776     0
> >          Vec Scatter     3              0          0     0
> >    IS L to G Mapping     2              0          0     0
> >        Krylov Solver     1              0          0     0
> >       Preconditioner     1              0          0     0
> > ========================================================================================================================
> > Average time to get PetscTime(): 1.71661e-06
> > Average time for MPI_Barrier(): 0.000159979
> > Average time for zero size MPI_Send(): 1.29938e-05
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> > sizeof(PetscScalar) 8
> > Configure run at: Fri Sep 28 23:34:20 2007
> >
> > run2
> >                          Max       Max/Min        Avg      Total
> > Time (sec):           2.298e+02      1.00000   2.298e+02
> > Objects:              2.600e+02      1.00000   2.600e+02
> > Flops:                1.265e+09      1.17394   1.161e+09  6.969e+09
> > Flops/sec:            5.505e+06      1.17394   5.054e+06  3.032e+07
> > MPI Messages:         1.436e+03      1.20816   1.326e+03  7.956e+03
> > MPI Message Lengths:  2.120e+07      1.23141   1.457e+04  1.159e+08
> > MPI Reductions:       4.192e+02      1.00000
> >
> > Flop counting convention: 1 flop = 1 real number operation of type
> > (multiply/divide/add/subtract)
> >                             e.g., VecAXPY() for real vectors of length
> > N --> 2N flops
> >                             and VecAXPY() for complex vectors of
> > length N --> 8N flops
> >
> > Summary of Stages:   ----- Time ------  ----- Flops -----  ---
> > Messages ---  -- Message Lengths --  -- Reductions --
> >                         Avg     %Total     Avg     %Total   counts
> > %Total     Avg         %Total   counts   %Total
> >  0:      Main Stage: 2.2943e+02  99.8%  6.9689e+09 100.0%  7.944e+03
> > 99.8%  1.457e+04      100.0%  2.230e+02   8.9%
> >
> > ------------------------------------------------------------------------------------------------------------------------
> > See the 'Profiling' chapter of the users' manual for details on
> > interpreting output.
> > Phase summary info:
> >    Count: number of times phase was executed
> >    Time and Flops/sec: Max - maximum over all processors
> >                        Ratio - ratio of maximum to minimum over all processors
> >    Mess: number of messages sent
> >    Avg. len: average message length
> >    Reduct: number of global reductions
> >    Global: entire computation
> >    Stage: stages of a computation. Set stages with PetscLogStagePush()
> > and PetscLogStagePop().
> >       %T - percent time in this phase         %F - percent flops in this phase
> >       %M - percent messages in this phase     %L - percent message
> > lengths in this phase
> >       %R - percent reductions in this phase
> >    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> > over all processors)
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >
> >       ##########################################################
> >       #                                                        #
> >       #                          WARNING!!!                    #
> >       #                                                        #
> >       #   This code was run without the PreLoadBegin()         #
> >       #   macros. To get timing results we always recommend    #
> >       #   preloading. otherwise timing numbers may be          #
> >       #   meaningless.                                         #
> >       ##########################################################
> >
> >
> > Event                Count      Time (sec)     Flops/sec
> >           --- Global ---  --- Stage ---   Total
> >                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg
> > len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > --- Event Stage 0: Main Stage
> >
> > MatMult              135 1.0 2.0830e+00 1.4 2.30e+08 1.9 2.4e+03
> > 1.3e+04 0.0e+00  1 26 30 26  0   1 26 30 26  0   862
> > MatMultAdd            40 1.0 3.2598e-01 4.5 2.68e+07 5.7 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    31
> > MatSolve              44 1.0 6.7841e-01 1.7 1.93e+08 1.7 0.0e+00
> > 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0   672
> > MatRelax              80 1.0 4.2949e+00 1.6 8.77e+07 1.2 2.2e+03
> > 1.3e+04 0.0e+00  1 23 28 26  0   1 23 28 26  0   374
> > MatLUFactorSym         1 1.0 7.6739e-02 1.1 0.00e+00 0.0 0.0e+00
> > 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatLUFactorNum         2 1.0 2.9370e+00 1.5 2.53e+08 1.5 0.0e+00
> > 0.0e+00 0.0e+00  1 44  0  0  0   1 44  0  0  0  1037
> > MatILUFactorSym        1 1.0 6.7334e-01 1.0 0.00e+00 0.0 0.0e+00
> > 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatAssemblyBegin       7 1.0 7.1652e-01 4.4 0.00e+00 0.0 1.3e+02
> > 2.9e+05 8.0e+00  0  0  2 32  0   0  0  2 32  4     0
> > MatAssemblyEnd         7 1.0 2.1473e+02 1.0 0.00e+00 0.0 8.4e+01
> > 3.2e+03 2.2e+01 93  0  1  0  1  94  0  1  0 10     0
> > MatGetRowIJ            2 1.0 1.8899e-03 1.1 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MatGetSubMatrice       1 1.0 3.1915e-02 1.0 0.00e+00 0.0 0.0e+00
> > 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  1     0
> > MatGetOrdering         2 1.0 1.2184e-02 1.6 0.00e+00 0.0 0.0e+00
> > 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  2     0
> > MatIncreaseOvrlp       1 1.0 7.6865e-02 1.0 0.00e+00 0.0 1.1e+03
> > 2.4e+03 2.0e+01  0  0 13  2  1   0  0 13  2  9     0
> > MatZeroEntries         3 1.0 1.0429e-01 4.7 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > MAT_GetRedundantMatrix       1 1.0 3.0144e-02 1.2 0.00e+00 0.0 9.0e+01
> > 8.0e+04 2.0e+00  0  0  1  6  0   0  0  1  6  1     0
> > VecDot                39 1.0 9.0680e-01140.0 2.95e+08191.3 0.0e+00
> > 0.0e+00 3.9e+01  0  0  0  0  2   0  0  0  0 17    11
> > VecMDot                8 1.0 2.8777e-03 2.8 4.16e+07 3.3 0.0e+00
> > 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  4    84
> > VecNorm               31 1.0 6.2301e-02 4.2 7.17e+07 5.8 0.0e+00
> > 0.0e+00 3.1e+01  0  0  0  0  1   0  0  0  0 14    86
> > VecScale              85 1.0 1.7729e-03 1.4 4.94e+08 1.4 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2138
> > VecCopy                4 1.0 6.2108e-04 1.5 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecSet               139 1.0 3.5934e-03 1.4 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecAXPY               98 1.0 5.1496e-03 1.3 9.24e+08 1.1 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4705
> > VecAYPX               40 1.0 1.0311e-02 1.4 1.90e+08 1.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   969
> > VecWAXPY              75 1.0 2.6060e-02 1.4 1.22e+08 1.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   618
> > VecMAXPY               9 1.0 2.1315e-04 1.6 3.01e+08 1.2 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1355
> > VecAssemblyBegin       4 1.0 1.9898e-03 1.2 0.00e+00 0.0 0.0e+00
> > 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  5     0
> > VecAssemblyEnd         4 1.0 2.1219e-05 1.2 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > VecScatterBegin      267 1.0 1.0224e-01 1.6 0.00e+00 0.0 6.3e+03
> > 1.1e+04 0.0e+00  0  0 79 59  0   0  0 79 59  0     0
> > VecScatterEnd        267 1.0 7.8653e-0111.3 0.00e+00 0.0 0.0e+00
> > 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > KSPGMRESOrthog         4 1.0 3.7677e-03 1.9 4.78e+07 2.4 0.0e+00
> > 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  5   141
> > KSPSetup               6 1.0 1.0260e-02 1.0 0.00e+00 0.0 0.0e+00
> > 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  2     0
> > KSPSolve               2 1.0 7.8238e+00 1.0 1.16e+08 1.2 5.9e+03
> > 1.1e+04 8.1e+01  3 70 75 54  3   3 70 75 54 36   622
> > PCSetUp                3 1.0 5.1323e+00 1.2 1.29e+08 1.2 1.6e+03
> > 9.1e+03 7.3e+01  2 47 20 13  3   2 47 20 13 33   636
> > PCSetUpOnBlocks        1 1.0 1.3325e+00 1.0 1.47e+08 1.0 0.0e+00
> > 0.0e+00 3.0e+00  1 17  0  0  0   1 17  0  0  1   871
> > PCApply               44 1.0 5.7917e+00 1.1 9.60e+07 1.2 4.7e+03
> > 1.0e+04 0.0e+00  2 41 59 42  0   2 41 59 42  0   497
> > ------------------------------------------------------------------------------------------------------------------------
> >
> > Memory usage is given in bytes:
> >
> > Object Type          Creations   Destructions   Memory  Descendants' Mem.
> >
> > --- Event Stage 0: Main Stage
> >
> >               Matrix    16              0          0     0
> >            Index Set    36             29     256760     0
> >                  Vec   176             93   16582464     0
> >          Vec Scatter    10              0          0     0
> >    IS L to G Mapping     4              0          0     0
> >        Krylov Solver     6              0          0     0
> >       Preconditioner     6              0          0     0
> >               Viewer     4              2          0     0
> >            Container     2              0          0     0
> > ========================================================================================================================
> > Average time to get PetscTime(): 8.10623e-07
> > Average time for MPI_Barrier(): 0.000178194
> > Average time for zero size MPI_Send(): 1.33117e-05
> > OptionTable: -mg_levels_ksp_type richardson
> > OptionTable: -mg_levels_pc_sor_omega 1.05
> > OptionTable: -mg_levels_pc_type sor
> > OptionTable: -pc_ml_PrintLevel 4
> > OptionTable: -pc_ml_maxNlevels 2
> > OptionTable: -pc_type ml
> > Compiled without FORTRAN kernels
> > Compiled with full precision matrices (default)
> > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> > sizeof(PetscScalar) 8
> > Configure run at: Fri Sep 28 23:34:20 2007
> >
> >
> > >
> > >     Barry
> > >
> > >
> > >     The a
> > >
> > > On Feb 3, 2008, at 6:44 AM, Thomas Geenen wrote:
> > >
> > > > I call
> > > > ierr = MatStashSetInitialSize(A[*seqsolve],stash_size,
> > > > stash_size);CHKERRQ(ierr);
> > > > with 100 000 000 for the stash size to make sure that's not the
> > > > bottleneck
> > > >
> > > > The assembly time remains unchanged, however.
> > > >
> > > > nstash in MatAssemblyBegin_MPIAIJ (CPU=0) = 109485
> > > > reallocs in MatAssemblyBegin_MPIAIJ =  0
> > > >
> > > > cheers
> > > > Thomas
> > > >
> > > > On Saturday 02 February 2008 23:19, Barry Smith wrote:
> > > >>    The matstash has a concept of preallocation also. During the first
> > > >> setvalues
> > > >> it is allocating more and more memory for the stash. In the second
> > > >> setvalues the stash is large enough, so it does not require any
> > > >> additional allocation.
> > > >>
> > > >>    You can use the option -matstash_initial_size <size> to allocate
> > > >> enough space
> > > >> initially so that the first setvalues is also fast. It does not look
> > > >> like there is a way
> > > >> coded to get the <size> that you should use. It should be set to the
> > > >> maximum nonzeros
> > > >> any process has that belongs to other processes. The stash handling
> > > >> code is in src/mat/utils/matstash.c; perhaps you can figure out how to
> > > >> print out the sizes needed with PetscInfo()?
> > > >>
> > > >>
> > > >>    Barry
> > > >>
> > > >> On Feb 2, 2008, at 12:30 PM, Thomas Geenen wrote:
> > > >>> On Saturday 02 February 2008 18:33, Hong Zhang wrote:
> > > >>>> On Sat, 2 Feb 2008, Thomas Geenen wrote:
> > > >>>>> Dear Petsc users,
> > > >>>>>
> > > >>>>> I would like to understand what is slowing down the assembly phase
> > > >>>>> of my matrix. I create a matrix with MatCreateMPIAIJ; I make a rough
> > > >>>>> guess of the number of off-diagonal entries and then use a
> > > >>>>> conservative value to make sure I do not need extra mallocs (the
> > > >>>>> number of diagonal entries is exact).
> > > >>>>> Next I call MatSetValues and MatAssemblyBegin, MatAssemblyEnd.
> > > >>>>> The first time I call MatSetValues and MatAssemblyBegin,
> > > >>>>> MatAssemblyEnd it takes about 170 seconds;
> > > >>>>> the second time, 0.3 seconds.
> > > >>>>> I run it on 6 CPUs and I do fill quite a number of row entries on
> > > >>>>> the "wrong" CPU. However, that is also the case on the second run.
> > > >>>>> I checked that there are no additional mallocs:
> > > >>>>> MatGetInfo reports info.mallocs=0 both after MatSetValues and after
> > > >>>>> MatAssemblyBegin, MatAssemblyEnd.
> > > >>>>
> > > >>>> Run your code with the option '-log_summary' and check which
> > > >>>> function
> > > >>>> call dominates the execution time.
> > > >>>
> > > >>> The time is spent in MatStashScatterGetMesg_Private.
> > > >>>
> > > >>>>> I run it on 6 cpu's and I do fill quit a number of row-entries on
> > > >>>>> the
> > > >>>>> "wrong" cpu.
> > > >>>>
> > > >>>> Likely, the communication that sends the entries to the
> > > >>>> correct CPU consumes the time. Can you fill the entries on the
> > > >>>> correct CPU?
> > > >>>
> > > >>> The second time, the entries are filled on the wrong CPU as well.
> > > >>> I am curious about the difference in time between runs 1 and 2.
> > > >>>
> > > >>>> Hong
> > > >>>>
> > > >>>>> cheers
> > > >>>>> Thomas
> > > >
> > >
> > >
> >
> >
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
>



