petsc-users Digest, Vol 2, Issue 33

Matthew Knepley knepley at gmail.com
Tue Mar 3 12:37:40 CST 2009


On Tue, Mar 3, 2009 at 9:17 AM, STEPHANE TCHOUANMO <tchouanm at msn.com> wrote:

>  Ok Matt you're right. The SNES Solve is definitely at fault.
> But still there's something i dont understand in the log summary i get.
> Take for example the one for the unsteady heat equation right after:
>
> Event                Count      Time (sec)
> Flops/sec                         --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot               22 1.0 5.0474e-04 1.0 5.48e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  6  0  0  0   0  6  0  0  0   548
> VecNorm               62 1.0 8.8694e-03 1.0 4.72e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  9  0  0  0   0  9  0  0  0    47
> VecScale              32 1.0 3.8212e-04 1.0 2.83e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0   283
> VecCopy               81 1.0 1.1948e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                88 1.0 8.4816e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY               10 1.0 1.8910e-04 1.0 3.57e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   357
> VecWAXPY              10 1.0 2.6472e-04 1.0 1.27e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   127
> VecMAXPY              32 1.0 1.0271e-03 1.0 4.14e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  9  0  0  0   0  9  0  0  0   414
> VecAssemblyBegin      40 1.0 8.7160e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd        40 1.0 7.5617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin       39 1.0 1.5163e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize          32 1.0 3.8553e-03 1.0 6.65e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  5  0  0  0   0  5  0  0  0    67
> MatMult               22 1.0 1.5831e-02 1.0 2.16e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 72  0  0  0   0 72  0  0  0   216
> MatAssemblyBegin      30 1.0 6.5176e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd        30 1.0 1.2829e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries         9 1.0 1.8313e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SNESSolve             10 1.0 1.7674e+01 1.0 2.69e+05 1.0 0.0e+00 0.0e+00
> 3.0e+01 93100  0  0  0  94100  0  0 75     0
> SNESLineSearch        10 1.0 3.7443e+00 1.0 4.51e+04 1.0 0.0e+00 0.0e+00
> 1.0e+01 20  4  0  0  0  20  4  0  0 25     0
> SNESFunctionEval      20 1.0 7.2693e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+01 38  0  0  0  0  39  0  0  0 50     0
> SNESJacobianEval      10 1.0 1.0367e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+01 55  0  0  0  0  55  0  0  0 25     0
> KSPGMRESOrthog        22 1.0 1.4277e-03 1.0 3.88e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 12  0  0  0   0 12  0  0  0   388
> KSPSetup              10 1.0 1.3128e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve              10 1.0 2.8431e-02 1.0 1.57e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 94  0  0  0   0 94  0  0  0   157
> PCSetUp               10 1.0 2.5831e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply               32 1.0 5.7973e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>            Index Set    40             40     548800     0
>                  Vec   167            153    4198932     0
>          Vec Scatter    40             40          0     0
>               Matrix     1              0          0     0
>                 SNES    10              9       1116     0
>        Krylov Solver    10              9     151920     0
>       Preconditioner    10              9          0     0
>               Viewer     1              0          0     0
>
> ========================================================================================================================
>
> Now it says SNESSolve takes 93% of the main stage, right?
> In that case what does is mean the 20% for SNESLinesearch, 38% for
> SNESFunctionEval and 55% for SNESJacobianEval? It cant be percentages of the
> main stage or of the SNESSolve. Do you have an idea?
>

1) There is not strict separation or nesting for events. For instance, line
search, function eval, and Jacobian eval all happen inside SNESolve. In
addition,
    function evaluation happens inside line search.

2) However, clearly jac eval + func eval = solve roughly. Thus, nothing else
is taking any time. This is confirmed by looking at KSPSolve, which takes no
time.

3) If you say that the first step (of 10) takes most of the time, the
conclusion for me is inescapable. LibMesh is making a whole bunch of
allocation calls
     during the first time step, which is very very slow. After that, it has
the memory and everything runs fine.

I suggest talking to the LibMesh developers.

  Matt


>
> Actually to answer your question, what is long is the first Newton
> iteration in the first time step and a debugging in DDD shows it too. So
> with the log summary i get, its obviously due to SNESSolve with Residual and
> Jacobian evaluations.
> Here is a part of my Jacobian and Residual computation routine in LibMesh
> for the basic Laplacian (- \Delta u = f ). Its called 'compute_jacobian'
> and 'compute_residual' respectively. Could you please look at it quickly and
> tell me if you see at first look something strange?
>
>
> *  void compute_jacobian (const NumericVector<Number>& soln,
>                  SparseMatrix<Number>&  jacobian)
>   {
>     EquationSystems &es = *_equation_system;
>
>     const MeshBase& mesh = es.get_mesh();
>
>     NonlinearImplicitSystem& system =
>       es.get_system<NonlinearImplicitSystem>("dc");
>
>     const DofMap& dof_map = system.get_dof_map();
>
>     // Define the finite volume
>     FV fv;
>
>     MeshBase::const_element_iterator       el     =
> mesh.active_local_elements_begin();
>     const MeshBase::const_element_iterator end_el =
> mesh.active_local_elements_end();
>
>     // The loop on every simplex
>     for ( ; el != end_el; ++el)
>     {
>      const Elem* elem = *el;
>
>      dof_map.dof_indices (elem, dof_indices);
>
>      fv.reinit(elem);
>
>      n_dofs = dof_indices.size(); // = 4 for a hex
>
>      Ke.resize (n_dofs, n_dofs);
>
>      // Assemble the elementary matrix for the Laplacian problem (size
> 4*4)
>      Ke=fv.elmmat(perm);
>
>      dof_map.constrain_element_matrix (Ke, dof_indices);
>
>      // Adds the small matrix Ke to the Jacobian
>      jacobian.add_matrix (Ke, dof_indices);
>     }
>   }
>
>   void compute_residual (const NumericVector<Number>& soln,
>                  NumericVector<Number>& residual)
>   {
>     EquationSystems &es = *_equation_system;
>
>     const MeshBase& mesh = es.get_mesh();
>
>     NonlinearImplicitSystem& system =
>       es.get_system<NonlinearImplicitSystem>("dc");
>
>     const DofMap& dof_map = system.get_dof_map();
>
>     // Define the finite volume
>     FV fv;
>
>     residual.zero();
>
>     MeshBase::const_element_iterator       el     =
> mesh.active_local_elements_begin();
>     const MeshBase::const_element_iterator end_el =
> mesh.active_local_elements_end();
>
>   ** // The loop on every simplex**
>    for ( ; el != end_el; ++el)
>    {
>       const Elem* elem = *el;
>       dof_map.dof_indices (elem, dof_indices);
>
>       fv.reinit(elem);
>       n_dofs = dof_indices.size(); **// = 4 for a hex**
>       Se.resize (n_dofs);
>
>       // Compute the solution from the previous Newton iterate
>       for (unsigned int l=0; l<n_dofs; l++)
>         Se(l) = soln(dof_indices[l]);
>
>      Re.resize (n_dofs);
>
>      elmMat=fv.elmmat(perm);
>
>     for (unsigned int i=0; i<n_dofs; i++)
>     {
>       vol=fv.elmvolume(i);
>       xyz=elem->point(i);
>
>       Re(i) = vol*(xyz(0)-0.5) ;
>
>       for (unsigned int j=0; j<n_dofs; j++)
>       Re(i) += elmMat(i,j)*Se(j) ;
>     }
>     dof_map.constrain_element_vector (Re, dof_indices);
>     residual.add_vector (Re, dof_indices);
>    }
>   }
>
> *
> and thats it!
>
> What amazes me is that i always get the right solution after resolution.
>
> Thanks a lot.
>
> Stephane
>
>
>
>
>
>
> ------------------------------
> Date: Tue, 3 Mar 2009 08:15:15 -0600
> Subject: Re: petsc-users Digest, Vol 2, Issue 33
> From: knepley at gmail.com
> To: petsc-users at mcs.anl.gov
> CC: tchouanm at msn.com
>
>
> On Tue, Mar 3, 2009 at 7:54 AM, STEPHANE TCHOUANMO <tchouanm at msn.com>wrote:
>
>  Hi all,
>
> thank you Barry for the indication you gave me.
>
> As a matter of fact, i verified my jacobian and function evaluation again
> and again but i really dont see anything wrong in it.
> So i came back to the basic Laplacian problem (- \Delta u = f ) in the unit
> cube discretized in regular hexes. The numerical scheme i use is a
> vertex-centred finite volume scheme.
> The solution i get is correct compared to the exact solution (of second
> order) and i know my jacobian and residual evalutions are correct. But here
> is the log out i get.
>
>
> Event                Count      Time (sec)
> Flops/sec                         --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecMDot               71 1.0 2.9587e-02 1.0 6.23e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 25  0  0  0   0 25  0  0  0   623
> VecNorm               77 1.0 3.3638e-02 1.0 4.24e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0    42
> VecScale              74 1.0 2.1052e-03 1.0 3.26e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  1  0  0  0   0  1  0  0  0   326
> VecCopy               80 1.0 3.4863e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                 9 1.0 2.0776e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY                5 1.0 2.3208e-04 1.0 3.99e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   399
> VecWAXPY               1 1.0 6.6995e-05 1.0 1.38e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0   138
> VecMAXPY              74 1.0 3.8138e-02 1.0 5.18e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 27  0  0  0   0 27  0  0  0   518
> VecAssemblyBegin       4 1.0 9.8636e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd         4 1.0 6.9494e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin        3 1.0 3.0706e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize          74 1.0 3.4648e-02 1.0 5.88e+07 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  3  0  0  0   0  3  0  0  0    59
> MatMult               73 1.0 1.4618e-01 1.0 2.22e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 45  0  0  0   0 45  0  0  0   222
> MatAssemblyBegin       2 1.0 6.9899e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd         2 1.0 6.1999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SNESSolve              1 1.0 6.7333e+01 1.0 1.08e+06 1.0 0.0e+00 0.0e+00
> 3.0e+00 99100  0  0100  99100  0  0100     1
> SNESLineSearch         1 1.0 5.1989e-01 1.0 8.91e+04 1.0 0.0e+00 0.0e+00
> 1.0e+00  1  0  0  0 33   1  0  0  0 33     0
> SNESFunctionEval       2 1.0 1.0441e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  2  0  0  0 67   2  0  0  0 67     0
> SNESJacobianEval       1 1.0 6.6026e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 97  0  0  0 33  97  0  0  0 33     0
> KSPGMRESOrthog        71 1.0 6.5884e-02 1.0 5.60e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0 51  0  0  0   0 51  0  0  0   560
> KSPSetup               1 1.0 2.2203e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.6036e-01 1.0 2.80e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0100  0  0  0   0100  0  0  0   280
> PCSetUp                1 1.0 7.9495e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> PCApply               74 1.0 3.6445e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>            Index Set     3              3     111792     0
>                  Vec    44              3     223596     0
>          Vec Scatter     3              3          0     0
>               Matrix     1              0          0     0
>                 SNES     1              0          0     0
>        Krylov Solver     1              0          0     0
>       Preconditioner     1              0          0     0
>               Viewer     2              0          0     0
>                 Draw     1              0          0     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 1.60268e-06
>
>
> This shows that the Jacobian evaluation takes 97% of time and the residual
> just 2% in the SNESSolve. But if you look at the total MFlops, you can see
> that its null(i guess very low) for these phases. What seems to be long is
> the part in red concerning Vector manips. You can even see at the end that
> the most memory use is in Index set and Vec.
>
>
> This analysis does not make sense. If you add all the time spent in the Vec
> operations (in red), it is less than 1/100 of the time in the
> SNES Solve. There is obviously a problem in that routine, if there is
> indeed a problem. Do you have a model of the computation that
> says that this time is too long?
>
>   Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> ------------------------------
> Get news, entertainment and everything you care about at Live.com. Check
> it out! <http://www.live.com/getstarted.aspx>
>



-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20090303/7437dbba/attachment-0001.htm>


More information about the petsc-users mailing list