[petsc-users] Function evaluation slowness ?
Timothée Nicolas
timothee.nicolas at gmail.com
Tue Aug 25 01:21:24 CDT 2015
Here is the log summary (attached). At the beginning are personal prints,
you can skip. I seem to have a memory crash in the present state after
typically 45 iterations (that's why I used 40 here), the log summary
indicates some creations without destruction of Petsc objects (I will fix
this immediately), that may cause the memory crash, but I don't think it's
the cause of the slow function evaluations.
The log_summary is consistent with 0.7s per function evaluation
(4.8990e+02/666 = 0.736). In addition, SNESSolve itself takes approximately
the same amount of time (is it normal ?). And the other long operation is
VecScatterEnd. I assume it is the time used in process communications ? In
which case I suppose it is normal that it takes a significant amount of
time.
So this ~10 times increase does not look normal right ?
Best
Timothee NICOLAS
2015-08-25 14:56 GMT+09:00 Barry Smith <bsmith at mcs.anl.gov>:
>
> > On Aug 25, 2015, at 12:45 AM, Timothée Nicolas <
> timothee.nicolas at gmail.com> wrote:
> >
> > Hi,
> >
> > I am testing PETSc on the supercomputer where I used to run my explicit
> MHD code. For my tests I use 256 processes on a problem of size 128*128*640
> = 10485760, that is, 40960 grid points per process, and 8 degrees of
> freedom (or physical fields). The explicit code was using Runge-Kutta 4 for
> the time scheme, which means 4 function evaluation per time step (plus one
> operation to put everything together, but let's forget this one).
> >
> > I could thus easily determine that the typical time required for a
> function evaluation was of the order of 50 ms.
> >
> > Now with the implicit Newton-Krylov solver written in PETSc, in the
> present state where for now I have not implemented any Jacobian or
> preconditioner whatsoever (so I run with -snes_mf), I measure a typical
> time between two time steps of between 5 and 20 seconds, and the number of
> function evaluations for each time step obtained with
> SNESGetNumberFunctionEvals is 17 (I am speaking of a particular case of
> course)
> >
> > This means a time per function evaluation of about 0.5 to 1 second, that
> is, 10 to 20 times slower.
> >
> > So I have some questions about this.
> >
> > 1. First does SNESGetNumberFunctionEvals take into account the function
> evaluations required to evaluate the Jacobian when -snes_mf is used, as
> well as the operations required by the GMRES (Krylov) method ? If it were
> the case, I would somehow intuitively expect a number larger than 17, which
> could explain the increase in time.
>
> PetscErrorCode SNESGetNumberFunctionEvals(SNES snes, PetscInt *nfuncs)
> {
> *nfuncs = snes->nfuncs;
> }
>
> PetscErrorCode SNESComputeFunction(SNES snes,Vec x,Vec y)
> {
> ...
> snes->nfuncs++;
> }
>
> PetscErrorCode MatCreateSNESMF(SNES snes,Mat *J)
> {
> .....
> if (snes->pc && snes->pcside == PC_LEFT) {
> ierr = MatMFFDSetFunction(*J,(PetscErrorCode
> (*)(void*,Vec,Vec))SNESComputeFunctionDefaultNPC,snes);CHKERRQ(ierr);
> } else {
> ierr = MatMFFDSetFunction(*J,(PetscErrorCode
> (*)(void*,Vec,Vec))SNESComputeFunction,snes);CHKERRQ(ierr);
> }
> }
>
> So, yes I would expect all the function evaluations needed for the
> matrix-free Jacobian matrix vector product to be counted. You can also look
> at the number of GMRES Krylov iterations it took (which should have one
> multiply per iteration) to double check that the numbers make sense.
>
> What does your -log_summary output look like? One thing that GMRES does
> is it introduces a global reduction with each multiple (hence a barrier
> across all your processes) on some systems this can be deadly.
>
> Barry
>
>
> >
> > 2. In any case, I thought that all things considered, the function
> evaluation would be the most time consuming part of a Newton-Krylov solver,
> am I completely wrong about that ? Is the 10-20 factor legit ?
> >
> > I realize of course that preconditioning should make all this smoother,
> in particular allowing larger time steps, but here I am just concerned
> about the sheer Function evaluation time.
> >
> > Best regards
> >
> > Timothee NICOLAS
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20150825/e0d8ea11/attachment-0001.html>
-------------- next part --------------
This is an implicit MHD code based on the MIPS code
Setting all options
Start: Reading HINT2 equilibrium file
DATA: lr,lz,lphi= 128 128 640
lsymmetry= 1
r_min,r_max= 2.70000000000000 4.60000000000000
z_min,z_max= -0.950000000000000 0.950000000000000
phi_min,phi_max= 0.000000000000000E+000 6.28318530717959
dr,dz,dphi= 1.496062992125984E-002 1.496062992125984E-002
9.817477042468103E-003
pmax= 1.135504809500237E-002
bmax= 2.98086676166910
End: Reading HINT2 equilibrium file
Creating nonlinear solver, getting geometrical info, and setting vectors
Allocating arrays for which it is required
Masks definition
Major radius and vacuum definition
Initializing PETSc Vecs with equilibrium values
Set the initial force local vectors (used to enforce the equilibrium)
Add a random perturbation to the velocity
Entering the main MHD Loop
Iteration number = 1
Time (tau_A) = 1.0000E-02
0 SNES Function norm 5.382589763410e-05
1 SNES Function norm 4.917642384947e-11
2 SNES Function norm 4.187109891461e-16
Kinetic Energy = 9.0028E-16
Magnetic Energy = 1.3544E-16
Total CPU time since PetscInitialize: 7.7581E+00
CPU time used for SNESSolve: 5.0903E-01
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 2
Time (tau_A) = 2.0000E-02
0 SNES Function norm 8.317717709972e-16
1 SNES Function norm 1.373112118906e-16
Kinetic Energy = 8.9656E-16
Magnetic Energy = 5.4177E-16
Total CPU time since PetscInitialize: 5.0395E+01
CPU time used for SNESSolve: 3.0864E+01
Number of linear iterations : 6
Number of function evaluations : 9
Too few linear iterations; Time step increased to dt = 1.3333E-02
Iteration number = 3
Time (tau_A) = 3.3333E-02
0 SNES Function norm 1.732338962433e-05
1 SNES Function norm 3.798031074240e-11
2 SNES Function norm 5.941618837381e-16
Kinetic Energy = 8.8707E-16
Magnetic Energy = 1.4783E-15
Total CPU time since PetscInitialize: 6.2962E+01
CPU time used for SNESSolve: 8.4275E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 4
Time (tau_A) = 4.6667E-02
0 SNES Function norm 8.650290226831e-07
1 SNES Function norm 3.159589761232e-12
2 SNES Function norm 5.559310443305e-16
Kinetic Energy = 8.7490E-16
Magnetic Energy = 2.8512E-15
Total CPU time since PetscInitialize: 7.5298E+01
CPU time used for SNESSolve: 7.2959E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 5
Time (tau_A) = 6.0000E-02
0 SNES Function norm 1.814549199007e-06
1 SNES Function norm 6.550542684026e-12
2 SNES Function norm 5.561417252951e-16
Kinetic Energy = 8.5968E-16
Magnetic Energy = 4.6147E-15
Total CPU time since PetscInitialize: 8.6579E+01
CPU time used for SNESSolve: 8.3290E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 6
Time (tau_A) = 7.3333E-02
0 SNES Function norm 5.143158573116e-07
1 SNES Function norm 1.854573359134e-12
2 SNES Function norm 5.555619916593e-16
Kinetic Energy = 8.4209E-16
Magnetic Energy = 6.7217E-15
Total CPU time since PetscInitialize: 9.8827E+01
CPU time used for SNESSolve: 9.6296E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 7
Time (tau_A) = 8.6667E-02
0 SNES Function norm 1.241769129665e-07
1 SNES Function norm 2.229921948896e-13
2 SNES Function norm 5.547572310422e-16
Kinetic Energy = 8.2295E-16
Magnetic Energy = 9.1167E-15
Total CPU time since PetscInitialize: 1.1428E+02
CPU time used for SNESSolve: 1.1251E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 8
Time (tau_A) = 1.0000E-01
0 SNES Function norm 4.127596671841e-08
1 SNES Function norm 9.722821486344e-14
2 SNES Function norm 5.547721956164e-16
Kinetic Energy = 8.0310E-16
Magnetic Energy = 1.1740E-14
Total CPU time since PetscInitialize: 1.2603E+02
CPU time used for SNESSolve: 3.1856E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 9
Time (tau_A) = 1.1333E-01
0 SNES Function norm 6.526056598290e-08
1 SNES Function norm 1.551059106556e-13
2 SNES Function norm 5.554664894406e-16
Kinetic Energy = 7.8340E-16
Magnetic Energy = 1.4531E-14
Total CPU time since PetscInitialize: 1.3916E+02
CPU time used for SNESSolve: 1.0360E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 10
Time (tau_A) = 1.2667E-01
0 SNES Function norm 7.216736548862e-08
1 SNES Function norm 1.709747035544e-13
2 SNES Function norm 5.551116565705e-16
Kinetic Energy = 7.6463E-16
Magnetic Energy = 1.7432E-14
Total CPU time since PetscInitialize: 1.5486E+02
CPU time used for SNESSolve: 1.4236E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 11
Time (tau_A) = 1.4000E-01
0 SNES Function norm 7.957568936931e-08
1 SNES Function norm 7.164583362638e-13
2 SNES Function norm 5.547681147827e-16
Kinetic Energy = 7.4749E-16
Magnetic Energy = 2.0389E-14
Total CPU time since PetscInitialize: 1.6355E+02
CPU time used for SNESSolve: 5.0159E+00
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 12
Time (tau_A) = 1.5333E-01
0 SNES Function norm 8.580881227555e-08
1 SNES Function norm 7.177441407594e-13
2 SNES Function norm 5.549122251796e-16
Kinetic Energy = 7.3253E-16
Magnetic Energy = 2.3356E-14
Total CPU time since PetscInitialize: 1.7703E+02
CPU time used for SNESSolve: 8.7550E+00
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 13
Time (tau_A) = 1.6667E-01
0 SNES Function norm 9.036148523559e-08
1 SNES Function norm 7.596864452710e-13
2 SNES Function norm 5.552150394126e-16
Kinetic Energy = 7.2017E-16
Magnetic Energy = 2.6296E-14
Total CPU time since PetscInitialize: 1.9497E+02
CPU time used for SNESSolve: 1.0766E+01
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 14
Time (tau_A) = 1.8000E-01
0 SNES Function norm 9.318522525551e-08
1 SNES Function norm 8.310283593293e-13
2 SNES Function norm 5.549000838284e-16
Kinetic Energy = 7.1065E-16
Magnetic Energy = 2.9180E-14
Total CPU time since PetscInitialize: 2.0960E+02
CPU time used for SNESSolve: 8.9322E+00
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 15
Time (tau_A) = 1.9333E-01
0 SNES Function norm 9.426477938676e-08
1 SNES Function norm 9.230910004645e-13
2 SNES Function norm 5.545358678033e-16
Kinetic Energy = 7.0402E-16
Magnetic Energy = 3.1992E-14
Total CPU time since PetscInitialize: 2.2222E+02
CPU time used for SNESSolve: 3.2721E+00
Number of linear iterations : 11
Number of function evaluations : 16
Iteration number = 16
Time (tau_A) = 2.0667E-01
0 SNES Function norm 9.362326845768e-08
1 SNES Function norm 1.296102793360e-13
2 SNES Function norm 5.547628931478e-16
Kinetic Energy = 7.0019E-16
Magnetic Energy = 3.4725E-14
Total CPU time since PetscInitialize: 2.4133E+02
CPU time used for SNESSolve: 4.5367E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 17
Time (tau_A) = 2.2000E-01
0 SNES Function norm 9.133250113813e-08
1 SNES Function norm 1.149831225960e-13
2 SNES Function norm 5.548205596917e-16
Kinetic Energy = 6.9892E-16
Magnetic Energy = 3.7381E-14
Total CPU time since PetscInitialize: 2.5612E+02
CPU time used for SNESSolve: 7.0432E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 18
Time (tau_A) = 2.3333E-01
0 SNES Function norm 8.751516705490e-08
1 SNES Function norm 1.044612716823e-13
2 SNES Function norm 5.546955425427e-16
Kinetic Energy = 6.9986E-16
Magnetic Energy = 3.9970E-14
Total CPU time since PetscInitialize: 2.7746E+02
CPU time used for SNESSolve: 1.7779E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 19
Time (tau_A) = 2.4667E-01
0 SNES Function norm 8.234829805103e-08
1 SNES Function norm 9.821327679226e-14
2 SNES Function norm 5.548194234735e-16
Kinetic Energy = 7.0258E-16
Magnetic Energy = 4.2510E-14
Total CPU time since PetscInitialize: 2.8796E+02
CPU time used for SNESSolve: 7.1363E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 20
Time (tau_A) = 2.6000E-01
0 SNES Function norm 7.607020340067e-08
1 SNES Function norm 9.607454720992e-14
2 SNES Function norm 5.546700693492e-16
Kinetic Energy = 7.0658E-16
Magnetic Energy = 4.5021E-14
Total CPU time since PetscInitialize: 3.0141E+02
CPU time used for SNESSolve: 9.5897E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 21
Time (tau_A) = 2.7333E-01
0 SNES Function norm 6.899369214087e-08
1 SNES Function norm 9.762527001937e-14
2 SNES Function norm 5.546534365057e-16
Kinetic Energy = 7.1138E-16
Magnetic Energy = 4.7526E-14
Total CPU time since PetscInitialize: 3.1839E+02
CPU time used for SNESSolve: 1.2120E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 22
Time (tau_A) = 2.8667E-01
0 SNES Function norm 6.153013990112e-08
1 SNES Function norm 1.022809264903e-13
2 SNES Function norm 5.559191241468e-16
Kinetic Energy = 7.1650E-16
Magnetic Energy = 5.0049E-14
Total CPU time since PetscInitialize: 3.3729E+02
CPU time used for SNESSolve: 1.4010E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 23
Time (tau_A) = 3.0000E-01
0 SNES Function norm 5.422883432610e-08
1 SNES Function norm 1.093821278111e-13
2 SNES Function norm 5.545153520497e-16
Kinetic Energy = 7.2152E-16
Magnetic Energy = 5.2614E-14
Total CPU time since PetscInitialize: 3.4506E+02
CPU time used for SNESSolve: 3.1828E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 24
Time (tau_A) = 3.1333E-01
0 SNES Function norm 4.782401973928e-08
1 SNES Function norm 1.182852644926e-13
2 SNES Function norm 5.545048290555e-16
Kinetic Energy = 7.2610E-16
Magnetic Energy = 5.5240E-14
Total CPU time since PetscInitialize: 3.5751E+02
CPU time used for SNESSolve: 3.4274E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 25
Time (tau_A) = 3.2667E-01
0 SNES Function norm 4.322955973001e-08
1 SNES Function norm 1.283072300260e-13
2 SNES Function norm 5.545681547284e-16
Kinetic Energy = 7.2996E-16
Magnetic Energy = 5.7944E-14
Total CPU time since PetscInitialize: 3.7469E+02
CPU time used for SNESSolve: 2.5263E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 26
Time (tau_A) = 3.4000E-01
0 SNES Function norm 4.132314115916e-08
1 SNES Function norm 1.386969354039e-13
2 SNES Function norm 5.540520147662e-16
Kinetic Energy = 7.3297E-16
Magnetic Energy = 6.0737E-14
Total CPU time since PetscInitialize: 3.8264E+02
CPU time used for SNESSolve: 3.1395E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 27
Time (tau_A) = 3.5333E-01
0 SNES Function norm 4.245684761661e-08
1 SNES Function norm 1.485504132442e-13
2 SNES Function norm 5.545487475751e-16
Kinetic Energy = 7.3503E-16
Magnetic Energy = 6.3627E-14
Total CPU time since PetscInitialize: 3.9247E+02
CPU time used for SNESSolve: 5.9592E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 28
Time (tau_A) = 3.6667E-01
0 SNES Function norm 4.616289783185e-08
1 SNES Function norm 1.567803392197e-13
2 SNES Function norm 5.548329671559e-16
Kinetic Energy = 7.3618E-16
Magnetic Energy = 6.6616E-14
Total CPU time since PetscInitialize: 4.0173E+02
CPU time used for SNESSolve: 5.8294E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 29
Time (tau_A) = 3.8000E-01
0 SNES Function norm 5.149150954830e-08
1 SNES Function norm 1.622328092002e-13
2 SNES Function norm 5.545725405186e-16
Kinetic Energy = 7.3651E-16
Magnetic Energy = 6.9706E-14
Total CPU time since PetscInitialize: 4.1289E+02
CPU time used for SNESSolve: 7.4448E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 30
Time (tau_A) = 3.9333E-01
0 SNES Function norm 5.751421272236e-08
1 SNES Function norm 1.639703075993e-13
2 SNES Function norm 5.551145331563e-16
Kinetic Energy = 7.3617E-16
Magnetic Energy = 7.2890E-14
Total CPU time since PetscInitialize: 4.2473E+02
CPU time used for SNESSolve: 9.8094E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 31
Time (tau_A) = 4.0667E-01
0 SNES Function norm 6.353011943021e-08
1 SNES Function norm 1.616610594001e-13
2 SNES Function norm 5.542606202109e-16
Kinetic Energy = 7.3536E-16
Magnetic Energy = 7.6164E-14
Total CPU time since PetscInitialize: 4.3767E+02
CPU time used for SNESSolve: 9.9831E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 32
Time (tau_A) = 4.2000E-01
0 SNES Function norm 6.905745342631e-08
1 SNES Function norm 1.557783742592e-13
2 SNES Function norm 5.544338434073e-16
Kinetic Energy = 7.3427E-16
Magnetic Energy = 7.9520E-14
Total CPU time since PetscInitialize: 4.6580E+02
CPU time used for SNESSolve: 2.7991E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 33
Time (tau_A) = 4.3333E-01
0 SNES Function norm 7.377639093458e-08
1 SNES Function norm 1.474466471932e-13
2 SNES Function norm 5.543376070989e-16
Kinetic Energy = 7.3312E-16
Magnetic Energy = 8.2950E-14
Total CPU time since PetscInitialize: 4.8419E+02
CPU time used for SNESSolve: 1.4762E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 34
Time (tau_A) = 4.4667E-01
0 SNES Function norm 7.748319943328e-08
1 SNES Function norm 1.381088301393e-13
2 SNES Function norm 5.550610629314e-16
Kinetic Energy = 7.3209E-16
Magnetic Energy = 8.6449E-14
Total CPU time since PetscInitialize: 4.9802E+02
CPU time used for SNESSolve: 1.3800E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 35
Time (tau_A) = 4.6000E-01
0 SNES Function norm 8.006178457271e-08
1 SNES Function norm 1.290725563320e-13
2 SNES Function norm 5.542186770005e-16
Kinetic Energy = 7.3134E-16
Magnetic Energy = 9.0010E-14
Total CPU time since PetscInitialize: 5.2157E+02
CPU time used for SNESSolve: 1.5918E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 36
Time (tau_A) = 4.7333E-01
0 SNES Function norm 8.146728382443e-08
1 SNES Function norm 1.213819113579e-13
2 SNES Function norm 5.543903059087e-16
Kinetic Energy = 7.3097E-16
Magnetic Energy = 9.3631E-14
Total CPU time since PetscInitialize: 5.3299E+02
CPU time used for SNESSolve: 7.5341E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 37
Time (tau_A) = 4.8667E-01
0 SNES Function norm 8.171674223552e-08
1 SNES Function norm 1.157134150395e-13
2 SNES Function norm 5.542672888432e-16
Kinetic Energy = 7.3106E-16
Magnetic Energy = 9.7311E-14
Total CPU time since PetscInitialize: 5.5695E+02
CPU time used for SNESSolve: 1.8279E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 38
Time (tau_A) = 5.0000E-01
0 SNES Function norm 8.088378896911e-08
1 SNES Function norm 1.124615100310e-13
2 SNES Function norm 5.544771436497e-16
Kinetic Energy = 7.3164E-16
Magnetic Energy = 1.0105E-13
Total CPU time since PetscInitialize: 5.7506E+02
CPU time used for SNESSolve: 4.0146E+00
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 39
Time (tau_A) = 5.1333E-01
0 SNES Function norm 7.909554806664e-08
1 SNES Function norm 1.117368923673e-13
2 SNES Function norm 5.552869590808e-16
Kinetic Energy = 7.3268E-16
Magnetic Energy = 1.0485E-13
Total CPU time since PetscInitialize: 5.9949E+02
CPU time used for SNESSolve: 1.7531E+01
Number of linear iterations : 12
Number of function evaluations : 17
Iteration number = 40
Time (tau_A) = 5.2667E-01
0 SNES Function norm 7.653064797874e-08
1 SNES Function norm 1.134332671074e-13
2 SNES Function norm 5.551034395178e-16
Kinetic Energy = 7.3414E-16
Magnetic Energy = 1.0872E-13
Total CPU time since PetscInitialize: 6.2212E+02
CPU time used for SNESSolve: 1.4068E+01
Number of linear iterations : 12
Number of function evaluations : 17
Exiting the main MHD Loop
Deallocating remaining arrays
Destroying remaining Petsc elements
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./a.out.local on a arch-linux2-cxx-opt named helios1497 with 256 processors, by tnicolas Tue Aug 25 14:55:05 2015
Using Petsc Release Version 3.6.0, Jun, 09, 2015
Max Max/Min Avg Total
Time (sec): 6.518e+02 1.00002 6.518e+02
Objects: 1.666e+03 1.00000 1.666e+03
Flops: 4.582e+09 1.00000 4.582e+09 1.173e+12
Flops/sec: 7.029e+06 1.00002 7.029e+06 1.799e+09
MPI Messages: 8.081e+03 1.49454 6.746e+03 1.727e+06
MPI Message Lengths: 7.891e+08 3.82180 3.905e+04 6.744e+10
MPI Reductions: 3.368e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.5182e+02 100.0% 1.1729e+12 100.0% 1.727e+06 100.0% 3.905e+04 100.0% 3.367e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
SNESSolve 40 1.0 4.9745e+02 3.3 4.25e+09 1.0 1.7e+06 3.8e+04 2.7e+03 46 93 99 95 80 46 93 99 95 80 2187
SNESFunctionEval 666 1.0 4.8990e+02 3.4 5.73e+08 1.0 1.7e+06 3.8e+04 1.3e+03 45 13 99 95 40 45 13 99 95 40 299
SNESJacobianEval 79 1.0 2.8245e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESLineSearch 79 1.0 3.8578e+00 1.0 4.98e+08 1.0 4.0e+05 3.8e+04 6.3e+02 1 11 23 23 19 1 11 23 23 19 33068
VecView 2 1.0 3.2757e+01 6.9 0.00e+00 0.0 1.6e+04 1.9e+05 9.0e+00 3 0 1 5 0 3 0 1 5 0 0
VecDot 79 1.0 2.1557e-01 5.1 4.53e+07 1.0 0.0e+00 0.0e+00 7.9e+01 0 1 0 0 2 0 1 0 0 2 53797
VecMDot 468 1.0 2.1886e+00 2.4 9.31e+08 1.0 0.0e+00 0.0e+00 4.7e+02 0 20 0 0 14 0 20 0 0 14 108862
VecNorm 781 1.0 6.5641e-01 1.3 4.48e+08 1.0 0.0e+00 0.0e+00 7.8e+02 0 10 0 0 23 0 10 0 0 23 174663
VecScale 1202 1.0 5.7695e-01 2.1 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 8 0 0 0 0 8 0 0 0 152921
VecCopy 1092 1.0 1.3964e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 827 1.0 1.2017e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1661 1.0 2.1384e+00 1.3 9.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 21 0 0 0 0 21 0 0 0 114027
VecWAXPY 1365 1.0 2.9492e+00 1.1 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 12 0 0 0 0 12 0 0 0 47586
VecMAXPY 547 1.0 1.7110e+00 1.0 1.20e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 26 0 0 0 0 26 0 0 0 179407
VecAssemblyBegin 5 1.0 2.7040e-0213.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.5e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 5 1.0 8.6308e-0511.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 80 1.0 1.4662e-01 1.2 2.29e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 40050
VecScatterBegin 1335 1.0 2.4852e+00 1.2 0.00e+00 0.0 1.7e+06 3.8e+04 0.0e+00 0 0 99 97 0 0 0 99 97 0 0
VecScatterEnd 1335 1.0 3.4761e+02 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 31 0 0 0 0 31 0 0 0 0 0
VecReduceArith 158 1.0 7.9838e-02 1.1 9.06e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 290521
VecReduceComm 79 1.0 1.2613e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 7.9e+01 0 0 0 0 2 0 0 0 0 2 0
VecNormalize 547 1.0 4.6664e-01 1.2 4.25e+08 1.0 0.0e+00 0.0e+00 4.7e+02 0 9 0 0 14 0 9 0 0 14 233267
MatMult MF 547 1.0 1.2570e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03 2 28 81 78 34 2 28 81 78 34 25962
MatMult 547 1.0 1.2571e+01 1.1 1.27e+09 1.0 1.4e+06 3.8e+04 1.1e+03 2 28 81 78 34 2 28 81 78 34 25960
MatAssemblyBegin 79 1.0 1.8835e-0519.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 79 1.0 2.2821e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 468 1.0 3.4735e+00 1.6 1.86e+09 1.0 0.0e+00 0.0e+00 4.7e+02 0 41 0 0 14 0 41 0 0 14 137185
KSPSetUp 79 1.0 7.9496e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 79 1.0 1.4808e+01 1.0 3.69e+09 1.0 1.2e+06 3.8e+04 1.9e+03 2 81 69 67 58 2 81 69 67 58 63857
PCSetUp 79 1.0 2.1935e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 547 1.0 6.8284e-01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
SNES 1 1 1332 0
SNESLineSearch 1 1 864 0
DMSNES 2 1 664 0
Vector 1630 290 654183152 0
Vector Scatter 3 1 1600 0
MatMFFD 1 1 784 0
Matrix 1 1 2304 0
Distributed Mesh 3 2 9456 0
Star Forest Bipartite Graph 6 4 3312 0
Discrete System 3 2 1696 0
Index Set 6 6 187248 0
IS L to G Mapping 2 1 948 0
Krylov Solver 1 1 18360 0
DMKSP interface 1 0 0 0
Preconditioner 1 1 880 0
Viewer 3 2 1536 0
PetscRandom 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 9.39369e-06
Average time for zero size MPI_Send(): 0.000582926
#PETSc Option Table entries:
-dt 0.01
-log_summary
-nts 40
-skip 500
-snes_mf
-snes_monitor
-time_limit 35950
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/csc/softs/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real --with-clanguage=cxx --with-debugging=0 --with-x=0 --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-fortran --known-mpi-shared-libraries=1 --with-scalar-type=real --with-precision=double --CFLAGS="-g -O3 -mavx -mkl" --CXXFLAGS="-g -O3 -mavx -mkl" --FFLAGS="-g -O3 -mavx -mkl"
-----------------------------------------
Libraries compiled on Mon Jun 22 11:05:39 2015 on helios85
Machine characteristics: Linux-2.6.32-504.16.2.el6.Bull.74.x86_64-x86_64-with-redhat-6.4-Santiago
Using PETSc directory: /csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0
Using PETSc arch: arch-linux2-cxx-opt
-----------------------------------------
Using C compiler: mpicxx -g -O3 -mavx -mkl -fPIC ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -g -O3 -mavx -mkl -fPIC ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-cxx-opt/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/include -I/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-cxx-opt/include -I/opt/mpi/bullxmpi/1.2.8.2/include
-----------------------------------------
Using C linker: mpicxx
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-cxx-opt/lib -L/csc/releases/buildlog/anl/petsc-3.6.0/intel-15.0.0.090/bullxmpi-1.2.8.2/real/petsc-3.6.0/arch-linux2-cxx-opt/lib -lpetsc -lhwloc -lxml2 -lssl -lcrypto -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lmpi_f90 -lmpi_f77 -lm -lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -lmpi -lnuma -lrt -lnsl -lutil -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -limf -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread -lirc_s -Wl,-rpath,/opt/mpi/bullxmpi/1.2.8.2/lib -L/opt/mpi/bullxmpi/1.2.8.2/lib -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/compiler/lib/intel64 -Wl,-rpath,/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -L/opt/intel/composer_xe_2015.0.090/mkl/lib/intel64 -ldl
-----------------------------------------
More information about the petsc-users
mailing list