petsc-users Digest, Vol 2, Issue 32

Barry Smith bsmith at mcs.anl.gov
Tue Feb 24 12:25:44 CST 2009


SNESSolve            100 1.0 2.9339e+03 1.0 9.17e+05 1.0 0.0e+00 0.0e+00 1.0e+03100100  0  0  1 100
SNESLineSearch       202 1.0 7.9707e+02 1.0 4.35e+05 1.0 0.0e+00 0.0e+00 4.0e+02 27 13  0  0  0  27
SNESFunctionEval     302 1.0 1.1836e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02 40  0  0  0  0  40
SNESJacobianEval     202 1.0 1.7238e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 59  0  0  0  0  59

The final column above (I have removed the later ones for clarity) is
the problem: your function evaluation is taking 40% of the run time and
your Jacobian evaluation 59% (1.1836e+03 s and 1.7238e+03 s of the
2.9339e+03 s total).

The PETSc linear solve is taking only about 1% (KSPSolve is 2.4e+01 s),
so something is seriously wrong in your function evaluation and
Jacobian evaluation code.
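
A minimal sketch of how the time inside those callbacks could be broken
down further with PETSc's event logging; FormFunction(), AppCtx, and the
assembly placeholders are hypothetical, and the signatures below are those
of recent PETSc releases (they differ slightly from the 2.3.3 API used in
this thread):

/* Sketch only: user-registered log events inside the SNES residual callback.
   AppCtx and the "assemble ..." sections are placeholders. */
#include <petscsnes.h>

typedef struct {
  PetscLogEvent evalConvection;   /* registered once with PetscLogEventRegister() */
  PetscLogEvent evalDiffusion;
} AppCtx;

PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
{
  AppCtx *user = (AppCtx *)ctx;

  PetscFunctionBeginUser;
  PetscCall(PetscLogEventBegin(user->evalConvection, 0, 0, 0, 0));
  /* ... assemble the convection part of the residual into f ... */
  PetscCall(PetscLogEventEnd(user->evalConvection, 0, 0, 0, 0));

  PetscCall(PetscLogEventBegin(user->evalDiffusion, 0, 0, 0, 0));
  /* ... assemble the diffusion part of the residual into f ... */
  PetscCall(PetscLogEventEnd(user->evalDiffusion, 0, 0, 0, 0));
  PetscFunctionReturn(PETSC_SUCCESS);
}

/* Registration, once before SNESSolve():
     PetscCall(PetscLogEventRegister("UserConvection", SNES_CLASSID, &user.evalConvection));
     PetscCall(PetscLogEventRegister("UserDiffusion",  SNES_CLASSID, &user.evalDiffusion));
   The two events then appear as extra rows in the profile, splitting the
   SNESFunctionEval time. */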

    Barry



On Feb 24, 2009, at 12:17 PM, STEPHANE TCHOUANMO wrote:

> Here is my -log_summary:
> Something looks a bit strange to me; it's the MPI Reductions below.
> Other than that, I don't see anything relevant.
> What do you think?
> Thanks
>
>
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> ./diff-conv-opt on a linux-gnu named linux-stchouan with 1 processor, by stephane Tue Feb 24 13:54:35 2009
> Using Petsc Release Version 2.3.3, Patch 13, Thu May 15 17:29:26 CDT 2008 HG revision: 4466c6289a0922df26e20626fd4a0b4dd03c8124
>
>                          Max       Max/Min        Avg      Total
> Time (sec):           2.937e+03      1.00000   2.937e+03
> Objects:              3.420e+03      1.00000   3.420e+03
> Flops:                2.690e+09      1.00000   2.690e+09  2.690e+09
> Flops/sec:            9.161e+05      1.00000   9.161e+05  9.161e+05
> MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
> MPI Reductions:       1.189e+05      1.00000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 2.9367e+03 100.0%  2.6905e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  1.106e+03   0.9%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops/sec: Max - maximum over all processors
>                        Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flops in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
> ------------------------------------------------------------------------------------------------------------------------
>
> Event                Count      Time (sec)     Flops/sec                         --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> VecDot               202 1.0 3.0360e-02 1.0 3.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   396
> VecMDot              202 1.0 3.0552e-02 1.0 3.94e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   394
> VecNorm             1110 1.0 1.2257e+00 1.0 5.40e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0    54
> VecScale             404 1.0 3.5342e-02 1.0 3.41e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
> VecCopy              507 1.0 8.4626e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet              1408 1.0 1.1664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY              202 1.0 2.6221e-02 1.0 4.59e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   459
> VecWAXPY             202 1.0 4.4239e-02 1.0 1.36e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   136
> VecMAXPY             404 1.0 7.3515e-02 1.0 3.27e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   327
> VecAssemblyBegin     302 1.0 9.2960e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAssemblyEnd       302 1.0 5.5790e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecScatterBegin      603 1.0 1.9933e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize         404 1.0 5.5408e-01 1.0 6.52e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0    65
> MatMult              404 1.0 2.6457e+00 1.0 2.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 22  0  0  0   0 22  0  0  0   226
> MatSolve             404 1.0 4.6454e+00 1.0 1.28e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 22  0  0  0   0 22  0  0  0   128
> MatLUFactorNum       202 1.0 1.5211e+01 1.0 8.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1 50  0  0  0   1 50  0  0  0    89
> MatILUFactorSym      100 1.0 1.9993e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  0  0  0  9     0
> MatAssemblyBegin     404 1.0 9.6217e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatAssemblyEnd       404 1.0 1.4601e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetRowIJ          100 1.0 2.4641e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatGetOrdering       100 1.0 7.6755e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  0   0  0  0  0 18     0
> MatZeroEntries        99 1.0 3.6160e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> SNESSolve            100 1.0 2.9339e+03 1.0 9.17e+05 1.0 0.0e+00 0.0e+00 1.0e+03100100  0  0  1 100100  0  0 91     1
> SNESLineSearch       202 1.0 7.9707e+02 1.0 4.35e+05 1.0 0.0e+00 0.0e+00 4.0e+02 27 13  0  0  0  27 13  0  0 37     0
> SNESFunctionEval     302 1.0 1.1836e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02 40  0  0  0  0  40  0  0  0 27     0
> SNESJacobianEval     202 1.0 1.7238e+03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+02 59  0  0  0  0  59  0  0  0 18     0
> KSPGMRESOrthog       202 1.0 7.0303e-02 1.0 3.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   342
> KSPSetup             202 1.0 4.6391e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve             202 1.0 2.4101e+01 1.0 9.65e+07 1.0 0.0e+00 0.0e+00 3.0e+02  1 86  0  0  0   1 86  0  0 27    97
> PCSetUp              202 1.0 1.7296e+01 1.0 7.78e+07 1.0 0.0e+00 0.0e+00 3.0e+02  1 50  0  0  0   1 50  0  0 27    78
> PCApply              404 1.0 4.6487e+00 1.0 1.28e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 22  0  0  0   0 22  0  0  0   128
> ------------------------------------------------------------------------------------------------------------------------
>
> Memory usage is given in bytes:
>
> Object Type          Creations   Destructions   Memory  Descendants' Mem.
>
> --- Event Stage 0: Main Stage
>
>            Index Set   904            901    107564984     0
>                  Vec  1511           1497    357441684     0
>          Vec Scatter   604            604            0     0
>               Matrix   101             99    942432084     0
>                 SNES   100             99        12276     0
>        Krylov Solver   100             99      1671120     0
>       Preconditioner   100             99        14256     0
> ========================================================================================================================
> Average time to get PetscTime(): 1.49164e-06
> OptionTable: -snes_converged_reason
> OptionTable: -snes_max_it 20
> OptionTable: -snes_rtol 0.0000001
> OptionTable: -snes_stol 0.001
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
> Configure run at: Mon Feb 23 23:01:43 2009
> Configure options: --with-debugging=no -with-shared --download-mpich=1
> -----------------------------------------
>
>
>
>
> >
> > Message: 7
> > Date: Tue, 24 Feb 2009 18:05:38 +0100
> > From: STEPHANE TCHOUANMO <tchouanm at msn.com>
> > Subject: Petsc slowness on a single processor machine?
> > To: <petsc-users at mcs.anl.gov>
> > Message-ID: <BAY107-W5D076FE1F3A7037240F88C2AF0 at phx.gbl>
> > Content-Type: text/plain; charset="windows-1256"
> >
> >
> > Dear all,
> >
> > I use PETSc to solve a nonlinear convection-diffusion type equation in porous media.
> > The linear system that gets built has size 30000, and its solution takes about an hour on a single-processor machine (my laptop...).
> > I run a hundred time steps for that and have on average 4 Newton iterations per time step, so an hour seems huge to me with the optimized version of PETSc.
> > I tried to profile my code with the options "-log_summary" and "-info". What I get out of it is that the first time step takes about 15 minutes, representing 25% of the total time, and I think that's not normal.
> > The thing is, these 15 minutes happen at the second call of the PETSc routine "VecScatterCreate()", right after the first Newton residual.
> >
> > Any idea?
> > Thanks.
> >
> > Stephane
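
A minimal sketch of how that suspicious first time step could be isolated
in the profile with PETSc stage logging, assuming a hypothetical
time-stepping loop around SNESSolve(); the signatures are those of recent
PETSc releases rather than the 2.3.3 API used here:

/* Sketch only: put the first time step in its own logging stage so its cost
   (including the VecScatterCreate() calls) appears as a separate stage in
   the -log_summary / -log_view output.  RunTimeSteps() is a placeholder. */
#include <petscsnes.h>

PetscErrorCode RunTimeSteps(SNES snes, Vec u, PetscInt nsteps)
{
  PetscLogStage firstStep;

  PetscFunctionBeginUser;
  PetscCall(PetscLogStageRegister("FirstTimeStep", &firstStep));
  for (PetscInt n = 0; n < nsteps; n++) {
    if (n == 0) PetscCall(PetscLogStagePush(firstStep));
    PetscCall(SNESSolve(snes, NULL, u));  /* one implicit time step */
    if (n == 0) PetscCall(PetscLogStagePop());
  }
  PetscFunctionReturn(PETSC_SUCCESS);
}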
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 8
> > Date: Tue, 24 Feb 2009 11:22:18 -0600
> > From: Matthew Knepley <knepley at gmail.com>
> > Subject: Re: Petsc slowness on a single processor machine?
> > To: PETSc users list <petsc-users at mcs.anl.gov>
> > Message-ID:
> > <a9f269830902240922y6751b40aqf8ea84257c139fd9 at mail.gmail.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > 2009/2/24 STEPHANE TCHOUANMO <tchouanm at msn.com>
> >
> > > Dear all,
> > >
> > > I use PETSc to solve a nonlinear convection-diffusion type equation in porous media.
> > > The linear system that gets built has size 30000, and its solution takes about an hour on a single-processor machine (my laptop...).
> > > I run a hundred time steps for that and have on average 4 Newton iterations per time step, so an hour seems huge to me with the optimized version of PETSc.
> > > I tried to profile my code with the options "-log_summary" and "-info". What I get out of it is that the first time step takes about 15 minutes, representing 25% of the total time, and I think that's not normal.
> > > The thing is, these 15 minutes happen at the second call of the PETSc routine "VecScatterCreate()", right after the first Newton residual.
> > >
> >
> > We can't say anything without seeing the entire output of -log_summary.
> >
> > Matt
> >
> >
> > >
> > > Any idea?
> > > Thanks.
> > >
> > > Stephane
> > >
> > >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their  
> experiments
> > is infinitely more interesting than any results to which their  
> experiments
> > lead.
> > -- Norbert Wiener
> >
> > ------------------------------
> >
> > _______________________________________________
> > petsc-users mailing list
> > petsc-users at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/petsc-users
> >
> >
> > End of petsc-users Digest, Vol 2, Issue 32
> > ******************************************
>


