petsc-users Digest, Vol 2, Issue 33
STEPHANE TCHOUANMO
tchouanm at msn.com
Tue Mar 3 09:17:30 CST 2009
Ok Matt you're right. The SNES Solve is definitely at fault.
But still there's something i dont understand in the log summary i get. Take for example the one for the unsteady heat equation right after:
Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 22 1.0 5.0474e-04 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 6 0 0 0 0 6 0 0 0 548
VecNorm 62 1.0 8.8694e-03 1.0 4.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 47
VecScale 32 1.0 3.8212e-04 1.0 2.83e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 283
VecCopy 81 1.0 1.1948e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 88 1.0 8.4816e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 10 1.0 1.8910e-04 1.0 3.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 357
VecWAXPY 10 1.0 2.6472e-04 1.0 1.27e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 127
VecMAXPY 32 1.0 1.0271e-03 1.0 4.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 9 0 0 0 0 9 0 0 0 414
VecAssemblyBegin 40 1.0 8.7160e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 40 1.0 7.5617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 39 1.0 1.5163e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 32 1.0 3.8553e-03 1.0 6.65e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 67
MatMult 22 1.0 1.5831e-02 1.0 2.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 72 0 0 0 0 72 0 0 0 216
MatAssemblyBegin 30 1.0 6.5176e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 30 1.0 1.2829e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 9 1.0 1.8313e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 10 1.0 1.7674e+01 1.0 2.69e+05 1.0 0.0e+00 0.0e+00 3.0e+01 93100 0 0 0 94100 0 0 75 0
SNESLineSearch 10 1.0 3.7443e+00 1.0 4.51e+04 1.0 0.0e+00 0.0e+00 1.0e+01 20 4 0 0 0 20 4 0 0 25 0
SNESFunctionEval 20 1.0 7.2693e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 38 0 0 0 0 39 0 0 0 50 0
SNESJacobianEval 10 1.0 1.0367e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 55 0 0 0 0 55 0 0 0 25 0
KSPGMRESOrthog 22 1.0 1.4277e-03 1.0 3.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 12 0 0 0 0 12 0 0 0 388
KSPSetup 10 1.0 1.3128e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 10 1.0 2.8431e-02 1.0 1.57e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 94 0 0 0 0 94 0 0 0 157
PCSetUp 10 1.0 2.5831e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 32 1.0 5.7973e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Index Set 40 40 548800 0
Vec 167 153 4198932 0
Vec Scatter 40 40 0 0
Matrix 1 0 0 0
SNES 10 9 1116 0
Krylov Solver 10 9 151920 0
Preconditioner 10 9 0 0
Viewer 1 0 0 0
========================================================================================================================
Now it says SNESSolve takes 93% of the main stage, right?
In that case what does is mean the 20% for SNESLinesearch, 38% for SNESFunctionEval and 55% for SNESJacobianEval? It cant be percentages of the main stage or of the SNESSolve. Do you have an idea?
Actually to answer your question, what is long is the first Newton iteration in the first time step and a debugging in DDD shows it too. So with the log summary i get, its obviously due to SNESSolve with Residual and Jacobian evaluations.
Here is a part of my Jacobian and Residual computation routine in LibMesh for the basic Laplacian (- \Delta u = f ). Its called 'compute_jacobian' and 'compute_residual' respectively. Could you please look at it quickly and tell me if you see at first look something strange?
void compute_jacobian (const NumericVector<Number>& soln,
SparseMatrix<Number>& jacobian)
{
EquationSystems &es = *_equation_system;
const MeshBase& mesh = es.get_mesh();
NonlinearImplicitSystem& system =
es.get_system<NonlinearImplicitSystem>("dc");
const DofMap& dof_map = system.get_dof_map();
// Define the finite volume
FV fv;
MeshBase::const_element_iterator el = mesh.active_local_elements_begin();
const MeshBase::const_element_iterator end_el = mesh.active_local_elements_end();
// The loop on every simplex
for ( ; el != end_el; ++el)
{
const Elem* elem = *el;
dof_map.dof_indices (elem, dof_indices);
fv.reinit(elem);
n_dofs = dof_indices.size(); // = 4 for a hex
Ke.resize (n_dofs, n_dofs);
// Assemble the elementary matrix for the Laplacian problem (size 4*4)
Ke=fv.elmmat(perm);
dof_map.constrain_element_matrix (Ke, dof_indices);
// Adds the small matrix Ke to the Jacobian
jacobian.add_matrix (Ke, dof_indices);
}
}
void compute_residual (const NumericVector<Number>& soln,
NumericVector<Number>& residual)
{
EquationSystems &es = *_equation_system;
const MeshBase& mesh = es.get_mesh();
NonlinearImplicitSystem& system =
es.get_system<NonlinearImplicitSystem>("dc");
const DofMap& dof_map = system.get_dof_map();
// Define the finite volume
FV fv;
residual.zero();
MeshBase::const_element_iterator el = mesh.active_local_elements_begin();
const MeshBase::const_element_iterator end_el = mesh.active_local_elements_end();
// The loop on every simplex
for ( ; el != end_el; ++el)
{
const Elem* elem = *el;
dof_map.dof_indices (elem, dof_indices);
fv.reinit(elem);
n_dofs = dof_indices.size(); // = 4 for a hex
Se.resize (n_dofs);
// Compute the solution from the previous Newton iterate
for (unsigned int l=0; l<n_dofs; l++)
Se(l) = soln(dof_indices[l]);
Re.resize (n_dofs);
elmMat=fv.elmmat(perm);
for (unsigned int i=0; i<n_dofs; i++)
{
vol=fv.elmvolume(i);
xyz=elem->point(i);
Re(i) = vol*(xyz(0)-0.5) ;
for (unsigned int j=0; j<n_dofs; j++)
Re(i) += elmMat(i,j)*Se(j) ;
}
dof_map.constrain_element_vector (Re, dof_indices);
residual.add_vector (Re, dof_indices);
}
}
and thats it!
What amazes me is that i always get the right solution after resolution.
Thanks a lot.
Stephane
Date: Tue, 3 Mar 2009 08:15:15 -0600
Subject: Re: petsc-users Digest, Vol 2, Issue 33
From: knepley at gmail.com
To: petsc-users at mcs.anl.gov
CC: tchouanm at msn.com
On Tue, Mar 3, 2009 at 7:54 AM, STEPHANE TCHOUANMO <tchouanm at msn.com> wrote:
Hi all,
thank you Barry for the indication you gave me.
As a matter of fact, i verified my jacobian and function evaluation again and again but i really dont see anything wrong in it.
So i came back to the basic Laplacian problem (- \Delta u = f ) in the unit cube discretized in regular hexes. The numerical scheme i use is a vertex-centred finite volume scheme.
The solution i get is correct compared to the exact solution (of second order) and i know my jacobian and residual evalutions are correct. But here is the log out i get.
Event Count Time (sec) Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 71 1.0 2.9587e-02 1.0 6.23e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 25 0 0 0 0 25 0 0 0 623
VecNorm 77 1.0 3.3638e-02 1.0 4.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 42
VecScale 74 1.0 2.1052e-03 1.0 3.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 326
VecCopy 80 1.0 3.4863e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9 1.0 2.0776e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 5 1.0 2.3208e-04 1.0 3.99e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 399
VecWAXPY 1 1.0 6.6995e-05 1.0 1.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 138
VecMAXPY 74 1.0 3.8138e-02 1.0 5.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 27 0 0 0 0 27 0 0 0 518
VecAssemblyBegin 4 1.0 9.8636e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 4 1.0 6.9494e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 3 1.0 3.0706e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 74 1.0 3.4648e-02 1.0 5.88e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 59
MatMult 73 1.0 1.4618e-01 1.0 2.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 45 0 0 0 0 45 0 0 0 222
MatAssemblyBegin 2 1.0 6.9899e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 6.1999e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 1 1.0 6.7333e+01 1.0 1.08e+06 1.0 0.0e+00 0.0e+00 3.0e+00 99100 0 0100 99100 0 0100 1
SNESLineSearch 1 1.0 5.1989e-01 1.0 8.91e+04 1.0 0.0e+00 0.0e+00 1.0e+00 1 0 0 0 33 1 0 0 0 33 0
SNESFunctionEval 2 1.0 1.0441e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 2 0 0 0 67 2 0 0 0 67 0
SNESJacobianEval 1 1.0 6.6026e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 97 0 0 0 33 97 0 0 0 33 0
KSPGMRESOrthog 71 1.0 6.5884e-02 1.0 5.60e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 51 0 0 0 0 51 0 0 0 560
KSPSetup 1 1.0 2.2203e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.6036e-01 1.0 2.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0100 0 0 0 0100 0 0 0 280
PCSetUp 1 1.0 7.9495e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 74 1.0 3.6445e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Index Set 3 3 111792 0
Vec 44 3 223596 0
Vec Scatter 3 3 0 0
Matrix 1 0 0 0
SNES 1 0 0 0
Krylov Solver 1 0 0 0
Preconditioner 1 0 0 0
Viewer 2 0 0 0
Draw 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.60268e-06
This shows that the Jacobian evaluation takes 97% of time and the residual just 2% in the SNESSolve. But if you look at the total MFlops, you can see that its null(i guess very low) for these phases. What seems to be long is the part in red concerning Vector manips. You can even see at the end that the most memory use is in Index set and Vec.
This analysis does not make sense. If you add all the time spent in the Vec operations (in red), it is less than 1/100 of the time in the
SNES Solve. There is obviously a problem in that routine, if there is indeed a problem. Do you have a model of the computation that
says that this time is too long?
Matt
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
_________________________________________________________________
News, entertainment and everything you care about at Live.com. Get it now!
http://www.live.com/getstarted.aspx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20090303/61da7778/attachment-0001.htm>
More information about the petsc-users
mailing list