[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems

Junchao Zhang jczhang at mcs.anl.gov
Sun Jun 10 23:46:39 CDT 2018


I used an LCRC machine named Bebop and tested on its Intel Broadwell nodes.
Each node has 2 CPUs and 36 cores in total. I collected data using 36 cores
per node or 18 cores per node. As you can see, 18 cores/node gave much
better performance, which is reasonable since routines like MatSOR, MatMult,
and MatMultAdd are all memory-bandwidth bound.

The code uses a DMDA 3D grid with a 7-point stencil and defines the nodes
(vertices) at the surface, or second to the surface, as boundary nodes.
Boundary nodes have only a diagonal entry in their matrix row, while
interior nodes have 7 nonzeros in their row. Processors at the boundary of
the processor grid therefore have fewer nonzeros; this is one source of
load imbalance. Will the load imbalance get worse on the coarser grids of
the MG hierarchy?
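As a point of reference, here is a minimal sketch of that kind of operator assembled on a DMDA (7-point stencil, diagonal-only rows for the two outer layers of vertices); the unscaled stencil values and the exact boundary treatment are assumptions, not the actual test code:

#include <petscdmda.h>
#include <petscmat.h>

/* Sketch: 7-point stencil on a 3D DMDA where vertices in the two outer
   layers get only a diagonal entry (the load-imbalance source described
   above). Stencil values are left unscaled for brevity. */
static PetscErrorCode AssembleStencilMatrix(DM da, Mat A)
{
  PetscErrorCode ierr;
  PetscInt       i, j, k, xs, ys, zs, xm, ym, zm, mx, my, mz;
  MatStencil     row, col[7];
  PetscScalar    v[7];

  ierr = DMDAGetInfo(da, NULL, &mx, &my, &mz, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
  for (k = zs; k < zs + zm; k++) {
    for (j = ys; j < ys + ym; j++) {
      for (i = xs; i < xs + xm; i++) {
        row.i = i; row.j = j; row.k = k;
        if (i < 2 || j < 2 || k < 2 || i > mx - 3 || j > my - 3 || k > mz - 3) {
          /* surface or second-to-surface vertex: diagonal entry only */
          v[0] = 1.0;
          ierr = MatSetValuesStencil(A, 1, &row, 1, &row, v, INSERT_VALUES);CHKERRQ(ierr);
        } else {
          /* interior vertex: 7 nonzeros per row */
          v[0] = -1.0; col[0].i = i;     col[0].j = j;     col[0].k = k - 1;
          v[1] = -1.0; col[1].i = i;     col[1].j = j - 1; col[1].k = k;
          v[2] = -1.0; col[2].i = i - 1; col[2].j = j;     col[2].k = k;
          v[3] =  6.0; col[3].i = i;     col[3].j = j;     col[3].k = k;
          v[4] = -1.0; col[4].i = i + 1; col[4].j = j;     col[4].k = k;
          v[5] = -1.0; col[5].i = i;     col[5].j = j + 1; col[5].k = k;
          v[6] = -1.0; col[6].i = i;     col[6].j = j;     col[6].k = k + 1;
          ierr = MatSetValuesStencil(A, 1, &row, 7, col, v, INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}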

I attach a trace-view figure that shows the activity of each rank along the
time axis during one KSPSolve. White means MPI wait, and you can see that
white takes up a large share of the time.

I don't have a good explanation for why processors wait longer at large
scale (1728 cores), since the communication pattern is still a 7-point
stencil on a cubic processor grid.

--Junchao Zhang

On Sat, Jun 9, 2018 at 11:32 AM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>   Junchao,
>
>       Thanks, the load balance of matrix entries is remarkably similar for
> the two runs, so worse work-load imbalance in SOR for the larger case
> cannot be what makes the SOR take more time there.
>
>       Here is my guess (and I know of no way to confirm it). In the smaller
> case, the overlap of different processes on the same node running SOR at
> the same time is lower than in the larger case, so the larger case is
> slower because more SOR processes are fighting over the same memory
> bandwidth at the same time.   Ahh, here is something you can try: let's
> undersubscribe the memory bandwidth, running on say 16 processes per node
> with 8 nodes and 16 processes per node with 64 nodes, and send the two
> -log_view output files. I assume this is an LCRC machine and NOT a KNL
> system?
>
>    Thanks
>
>
>    Barry
>
>
> > On Jun 9, 2018, at 8:29 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > -pc_gamg_type classical
> >
> > FYI, we only support smoothed aggregation "agg" (the default). (This
> thread started by saying you were using GAMG.)
> >
> > It is not clear how much this will make a difference for you, but you
> don't want to use classical because we do not support it. It is meant as a
> reference implementation for developers.
> >
> > First, how did you get the idea to use classical? If the documentation
> > led you to believe this was a good thing to do, then we need to fix that!
> >
> > Anyway, here is a generic input for GAMG:
> >
> > -pc_type gamg
> > -pc_gamg_type agg
> > -pc_gamg_agg_nsmooths 1
> > -pc_gamg_coarse_eq_limit 1000
> > -pc_gamg_reuse_interpolation true
> > -pc_gamg_square_graph 1
> > -pc_gamg_threshold 0.05
> > -pc_gamg_threshold_scale .0
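
For reference, a minimal sketch (not from this thread) of setting the same options from code; PetscOptionsSetValue() feeds the options database that KSPSetFromOptions() reads:

#include <petscksp.h>

/* Sketch: push the generic GAMG options listed above into the options
   database, then have the KSP (and its PC) pick them up. */
static PetscErrorCode ConfigureGAMG(KSP ksp)
{
  PetscErrorCode ierr;

  ierr = PetscOptionsSetValue(NULL, "-pc_type", "gamg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_type", "agg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_agg_nsmooths", "1");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_coarse_eq_limit", "1000");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_reuse_interpolation", "true");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_square_graph", "1");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_threshold", "0.05");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_threshold_scale", "0.0");CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  return 0;
}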
> >
> >
> >
> >
> > On Thu, Jun 7, 2018 at 6:52 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > OK, I had thought that space was a typo. By the way, this option does not
> > show up in -h.
> > I changed the number of ranks to use all cores on each node, to avoid
> > misleading ratios in -log_view. Since one node has 36 cores, I ran with
> > 6^3=216 ranks and 12^3=1728 ranks. I also found that the call counts of
> > MatSOR etc. differed between the two tests, so they are not strict weak
> > scaling tests. I tried adding -ksp_max_it 6 -pc_mg_levels 6, but still
> > could not make the two have the same MatSOR count. Anyway, I attached the
> > load balance output.
> >
> > I find that PCApply_MG calls PCMGMCycle_Private, which is recursive and
> > indirectly calls MatSOR_MPIAIJ. I believe the following code in
> > MatSOR_MPIAIJ effectively synchronizes {MatSOR, MatMultAdd}_SeqAIJ between
> > processors through VecScatter at each MG level. If SOR and MatMultAdd are
> > imbalanced, the cost accumulates across MG levels and shows up as a large
> > VecScatter cost.
> > 1460:     while (its--) {
> > 1461:       VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
> > 1462:       VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
> >
> > 1464:       /* update rhs: bb1 = bb - B*x */
> > 1465:       VecScale(mat->lvec,-1.0);
> > 1466:       (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);
> >
> > 1468:       /* local sweep */
> > 1469:       (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
> > 1470:     }
> >
> >
> >
> > --Junchao Zhang
> >
> > On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >
> > > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczhang at mcs.anl.gov>
> wrote:
> > >
> > > Searched but could not find this option, -mat_view::load_balance
> >
> >    There is a space between the view and the colon: load_balance is a
> > particular viewer format that causes the printing of load-balance
> > information about the number of nonzeros in the matrix.
> >
> >    Barry
> >
> > >
> > > --Junchao Zhang
> > >
> > > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > >  So the only surprise in the results is the SOR. It is embarrassingly
> parallel and normally one would not see a jump.
> > >
> > >  The load-balance ratio for SOR time is better at 1000 processes (1.5)
> > > than at 125 processes (2.1), not worse, so this number doesn't easily
> > > explain it.
> > >
> > >  Could you run the 125 and 1000 with -mat_view ::load_balance and see
> what you get out?
> > >
> > >    Thanks
> > >
> > >      Barry
> > >
> > >  Notice that the MatSOR time jumps a lot, about 5 secs, when -log_sync
> > > is on. My only guess is that MatSOR is sharing memory bandwidth (or some
> > > other resource? cores?) with the VecScatter, and for some reason this is
> > > worse for 1000 cores, but I don't know why.
> > >
> > > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > > >
> > > > Hi, PETSc developers,
> > > >  I tested Michael Becker's code. The code calls the same KSPSolve
> > > > 1000 times in the second stage and needs a cubic number of processors
> > > > to run. I ran with 125 ranks and 1000 ranks, with and without the
> > > > -log_sync option. I attach the log_view output files and a
> > > > scaling-loss Excel file.
> > > >  I profiled the code with 125 processors. It looks like {MatSOR,
> > > > MatMult, MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in
> > > > aij.c took ~50% of the time; the other half was spent waiting in MPI.
> > > > MatSOR_SeqAIJ took 30%, mostly in PetscSparseDenseMinusDot().
> > > >  I tested it on a 36-cores/node machine and found that 32 ranks/node
> > > > gave better performance (about 10%) than 36 ranks/node in the 125-rank
> > > > test. I guess this is because the processes in the former case had more
> > > > balanced memory bandwidth. I collected PAPI_DP_OPS (double-precision
> > > > operations) and PAPI_TOT_CYC (total cycles) for the 125-rank case (see
> > > > the attached files). It looks like ranks at the two ends have fewer
> > > > DP_OPS and TOT_CYC.
> > > >  Does anyone familiar with the algorithm have a quick explanation?
> > > >
> > > > --Junchao Zhang
> > > >
> > > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > > Hello again,
> > > >
> > > > this took me longer than I anticipated, but here we go.
> > > > I did reruns of the cases where only half the processes per node
> were used (without -log_sync):
> > > >
> > > >                      125 procs, 1st         125 procs, 2nd         1000 procs, 1st        1000 procs, 2nd
> > > >                      Max        Ratio       Max        Ratio       Max        Ratio       Max        Ratio
> > > > KSPSolve             1.203E+02  1.0         1.210E+02  1.0         1.399E+02  1.1         1.365E+02  1.0
> > > > VecTDot              6.376E+00  3.7         6.551E+00  4.0         7.885E+00  2.9         7.175E+00  3.4
> > > > VecNorm              4.579E+00  7.1         5.803E+00  10.2        8.534E+00  6.9         6.026E+00  4.9
> > > > VecScale             1.070E-01  2.1         1.129E-01  2.2         1.301E-01  2.5         1.270E-01  2.4
> > > > VecCopy              1.123E-01  1.3         1.149E-01  1.3         1.301E-01  1.6         1.359E-01  1.6
> > > > VecSet               7.063E-01  1.7         6.968E-01  1.7         7.432E-01  1.8         7.425E-01  1.8
> > > > VecAXPY              1.166E+00  1.4         1.167E+00  1.4         1.221E+00  1.5         1.279E+00  1.6
> > > > VecAYPX              1.317E+00  1.6         1.290E+00  1.6         1.536E+00  1.9         1.499E+00  2.0
> > > > VecScatterBegin      6.142E+00  3.2         5.974E+00  2.8         6.448E+00  3.0         6.472E+00  2.9
> > > > VecScatterEnd        3.606E+01  4.2         3.551E+01  4.0         5.244E+01  2.7         4.995E+01  2.7
> > > > MatMult              3.561E+01  1.6         3.403E+01  1.5         3.435E+01  1.4         3.332E+01  1.4
> > > > MatMultAdd           1.124E+01  2.0         1.130E+01  2.1         2.093E+01  2.9         1.995E+01  2.7
> > > > MatMultTranspose     1.372E+01  2.5         1.388E+01  2.6         1.477E+01  2.2         1.381E+01  2.1
> > > > MatSolve             1.949E-02  0.0         1.653E-02  0.0         4.789E-02  0.0         4.466E-02  0.0
> > > > MatSOR               6.610E+01  1.3         6.673E+01  1.3         7.111E+01  1.3         7.105E+01  1.3
> > > > MatResidual          2.647E+01  1.7         2.667E+01  1.7         2.446E+01  1.4         2.467E+01  1.5
> > > > PCSetUpOnBlocks      5.266E-03  1.4         5.295E-03  1.4         5.427E-03  1.5         5.289E-03  1.4
> > > > PCApply              1.031E+02  1.0         1.035E+02  1.0         1.180E+02  1.0         1.164E+02  1.0
> > > >
> > > > I also slimmed down my code and basically wrote a simple weak scaling
> > > > test (source files attached) so you can profile it yourself. I
> > > > appreciate the offer, Junchao, thank you.
> > > > You can adjust the system size per processor at runtime via
> > > > "-nodes_per_proc 30" and the number of repeated calls to the function
> > > > containing KSPSolve() via "-iterations 1000". The physical problem is
> > > > simply calculating the electric potential from a homogeneous charge
> > > > distribution, done multiple times to accumulate time in KSPSolve().
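
For orientation, a minimal sketch of the repeated-solve structure that produces the "First Solve" / "Remaining Solves" stages seen in the attached -log_view output; the function and variable names are illustrative, not the actual ws_test source:

#include <petscksp.h>

/* Sketch: the first KSPSolve() (which triggers the GAMG setup) is logged
   in one stage, the -iterations subsequent solves in another, so the
   per-solve cost can be read off the "Remaining Solves" stage. */
static PetscErrorCode RunRepeatedSolves(KSP ksp, Vec b, Vec x, PetscInt iterations)
{
  PetscErrorCode ierr;
  PetscLogStage  stageFirst, stageRest;
  PetscInt       it;

  ierr = PetscLogStageRegister("First Solve", &stageFirst);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Remaining Solves", &stageRest);CHKERRQ(ierr);

  ierr = PetscLogStagePush(stageFirst);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* includes preconditioner setup */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(stageRest);CHKERRQ(ierr);
  for (it = 0; it < iterations; it++) {
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr); /* setup is reused */
  }
  ierr = PetscLogStagePop();CHKERRQ(ierr);
  return 0;
}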
> > > > A job would be started using something like
> > > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 -iterations 1000 \
> > > >   -ksp_rtol 1E-6 \
> > > >   -log_view -log_sync \
> > > >   -pc_type gamg -pc_gamg_type classical \
> > > >   -ksp_type cg \
> > > >   -ksp_norm_type unpreconditioned \
> > > >   -mg_levels_ksp_type richardson \
> > > >   -mg_levels_ksp_norm_type none \
> > > >   -mg_levels_pc_type sor \
> > > >   -mg_levels_ksp_max_it 1 \
> > > >   -mg_levels_pc_sor_its 1 \
> > > >   -mg_levels_esteig_ksp_type cg \
> > > >   -mg_levels_esteig_ksp_max_it 10 \
> > > >   -gamg_est_ksp_type cg
> > > > It should ideally be started on a cube number of processes for a
> > > > cubical process grid.
> > > > Using 125 processes and 10,000 iterations I get the output in
> > > > "log_view_125_new.txt", which shows the same imbalance for me.
> > > > Michael
> > > >
> > > >
> > > > Am 02.06.2018 um 13:40 schrieb Mark Adams:
> > > >>
> > > >>
> > > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > > >> Hi,Michael,
> > > >>  You can add -log_sync in addition to -log_view; it adds barriers to
> > > >> certain events but measures the barrier time separately from the
> > > >> events. I find this option makes it easier to interpret log_view output.
> > > >>
> > > >> That is great (good to know).
> > > >>
> > > >> This should give us a better idea of whether your large VecScatter
> > > >> costs are from slow communication or whether it is catching some sort
> > > >> of load imbalance.
> > > >>
> > > >>
> > > >> --Junchao Zhang
> > > >>
> > > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > >> Barry: On its way. Could take a couple days again.
> > > >>
> > > >> Junchao: I unfortunately don't have access to a cluster with a
> > > >> faster network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking
> > > >> fat-tree network, which I realize causes parallel slowdown if the nodes
> > > >> are not connected to the same switch. Each node has 24 cores (2 sockets
> > > >> with 12 cores each) and four NUMA domains (two per socket).
> > > >> The ranks are usually not distributed perfectly evenly; e.g., for 125
> > > >> processes, five of the six required nodes use 21 cores and one uses 20.
> > > >> Would using another CPU type make a difference communication-wise? I
> > > >> could switch to faster ones (on the same network), but I always assumed
> > > >> this would only improve the performance of the parts that are unrelated
> > > >> to communication.
> > > >>
> > > >> Michael
> > > >>
> > > >>
> > > >>
> > > >>> The log files have something like "Average time for zero size
> > > >>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a
> > > >>> very slow network. A typical machine should give less than 1/10 of the
> > > >>> latency you have. An easy thing to try is just running the code on a
> > > >>> machine with a faster network and seeing what happens.
> > > >>>
> > > >>> Also, how many cores and NUMA domains does a compute node have? I
> > > >>> could not figure out how you distributed the 125 MPI ranks evenly.
> > > >>>
> > > >>> --Junchao Zhang
> > > >>>
> > > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > >>> Hello again,
> > > >>>
> > > >>> here are the updated log_view files for 125 and 1000 processors. I
> ran both problems twice, the first time with all processors per node
> allocated ("-1.txt"), the second with only half on twice the number of
> nodes ("-2.txt").
> > > >>>
> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > > >>>>>
> > > >>>>   Hmm, it is certainly not intended that vectors be created and
> > > >>>> destroyed within each KSPSolve(); could you please point us to the
> > > >>>> code that makes you think they are being created and destroyed?   We
> > > >>>> create all the work vectors in KSPSetUp() and destroy them in
> > > >>>> KSPReset(), not during the solve. Not that this would be a measurable
> > > >>>> difference.
> > > >>>>
> > > >>>
> > > >>> I mean this, right in the log_view output:
> > > >>>
> > > >>>> Memory usage is given in bytes:
> > > >>>>
> > > >>>> Object Type Creations Destructions Memory Descendants' Mem.
> > > >>>> Reports information only for process 0.
> > > >>>>
> > > >>>> --- Event Stage 0: Main Stage
> > > >>>>
> > > >>>> ...
> > > >>>>
> > > >>>> --- Event Stage 1: First Solve
> > > >>>>
> > > >>>> ...
> > > >>>>
> > > >>>> --- Event Stage 2: Remaining Solves
> > > >>>>
> > > >>>> Vector 23904 23904 1295501184 0.
> > > >>> I logged the exact number of KSP iterations over the 999 timesteps,
> > > >>> and it's exactly 23904/6 = 3984.
> > > >>> Michael
> > > >>>
> > > >>>
> > > >>> Am 24.05.2018 um 19:50 schrieb Smith, Barry F.:
> > > >>>>
> > > >>>>  Please send the log file for 1000 with cg as the solver.
> > > >>>>
> > > >>>>   You should make a bar chart of each event for the two cases to
> > > >>>> see which ones are taking more time and which are taking less (we
> > > >>>> cannot tell from the two logs you sent us since they are for
> > > >>>> different solvers).
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > > >>>>>
> > > >>>>   Hmm, it is certainly not intended that vectors be created and
> > > >>>> destroyed within each KSPSolve(); could you please point us to the
> > > >>>> code that makes you think they are being created and destroyed?   We
> > > >>>> create all the work vectors in KSPSetUp() and destroy them in
> > > >>>> KSPReset(), not during the solve. Not that this would be a measurable
> > > >>>> difference.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> This seems kind of wasteful; is it supposed to be like this? Is
> > > >>>>> this even the reason for my problems? Apart from that, everything
> > > >>>>> seems quite normal to me (but I'm not the expert here).
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks in advance.
> > > >>>>>
> > > >>>>> Michael
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
> > > >>>>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
> > >
> > >
> >
> >
> >
>
>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations

Time in solve():      65.0281 s
Time in KSPSolve():   64.8664 s (99.7513%)

Number of   KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0491 with 216 processors, by jczhang Sat Jun  9 16:31:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           1.171e+02      1.00000   1.171e+02
Objects:              4.253e+04      1.00002   4.253e+04
Flop:                 3.698e+10      1.15842   3.534e+10  7.632e+12
Flop/sec:            3.157e+08      1.15842   3.017e+08  6.516e+10
MPI Messages:         1.858e+06      3.50879   1.262e+06  2.725e+08
MPI Message Lengths:  2.275e+09      2.20338   1.459e+03  3.975e+11
MPI Reductions:       3.764e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.7386e-02   0.0%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 5.2061e+01  44.4%  9.8678e+09   0.1%  7.475e+05   0.3%  3.517e+03        0.7%  6.130e+02   1.6% 
 2: Remaining Solves: 6.5040e+01  55.5%  7.6225e+12  99.9%  2.718e+08  99.7%  1.453e+03       99.3%  3.700e+04  98.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 9.1314e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 3.7413e-03 1.6 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 4.7839e+00 3.4 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
KSPSetUp               9 1.0 1.2829e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 5.2060e+01 1.0 4.82e+07 1.2 7.5e+05 3.5e+03 6.1e+02 44  0  0  1  2 100100100100100   190
VecTDot               14 1.0 1.8892e-03 1.6 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  2  0  0  2 86434
VecNorm                9 1.0 1.1718e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  1  0  0  1 89583
VecScale              42 1.0 2.2149e-04 2.6 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 74579
VecCopy                1 1.0 1.2589e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               187 1.0 1.3061e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 5.4979e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 297013
VecAYPX               49 1.0 8.3947e-04 1.3 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 164549
VecAssemblyBegin       3 1.0 5.1260e-0512.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 2.4080e-05 8.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      183 1.0 3.7851e-03 2.1 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00  0  0  0  0  0   0  0 36 15  0     0
VecScatterEnd        183 1.0 1.2801e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               50 1.0 1.2769e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00  0  0  0  0  0   0 22 12  7  0 170012
MatMultAdd            42 1.0 6.6104e-03 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00  0  0  0  0  0   0  5  6  1  0 73068
MatMultTranspose      42 1.0 7.5936e-03 1.4 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00  0  0  0  0  0   0  5  6  1  0 63607
MatSolve               7 0.0 4.7922e-05 0.0 1.62e+03 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    34
MatSOR                84 1.0 2.6659e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01  0  0  0  0  0   0 40 11  5  2 147225
MatLUFactorSym         1 1.0 1.1301e-04 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 8.5831e-0527.7 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    10
MatResidual           42 1.0 1.0384e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00  0  0  0  0  0   0 17 11  5  0 156947
MatAssemblyBegin     102 1.0 4.7864e+00 3.4 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
MatAssemblyEnd       102 1.0 4.3458e-02 1.1 0.00e+00 0.0 1.1e+05 2.2e+02 2.5e+02  0  0  0  0  1   0  0 15  1 40     0
MatGetRow        3100265 1.2 2.4304e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  44  0  0  0  0     0
MatGetRowIJ            1 0.0 9.0599e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.8371e-01 2.4 0.00e+00 0.0 1.0e+05 1.8e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        6 1.0 7.5428e-03 1.2 0.00e+00 0.0 3.2e+03 4.3e+02 9.4e+01  0  0  0  0  0   0  0  0  0 15     0
MatGetOrdering         1 0.0 6.1035e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 3.0780e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 8.8811e-03 1.2 0.00e+00 0.0 9.2e+04 6.3e+02 3.2e+01  0  0  0  0  0   0  0 12  2  5     0
MatZeroEntries         6 1.0 9.8872e-04 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.2688e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 9.2e+01  0  0  0  0  0   0 23 15 12 15 17626
MatPtAPSymbolic        6 1.0 8.0804e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 4.2e+01  0  0  0  0  0   0  0  8  6  7     0
MatPtAPNumeric         6 1.0 4.6607e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.8e+01  0  0  0  0  0   0 23  7  5  8 47984
MatGetLocalMat         6 1.0 2.1553e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 4.4849e-03 1.7 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00  0  0  0  0  0   0  0  5  5  0     0
SFSetGraph            12 1.0 6.7472e-05 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 6.0279e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          44 1.0 1.1101e-03 2.0 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            44 1.0 1.5342e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 5.1834e+01 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.8e+02 44  0  0  0  1 100  0 48 73 46     0
GAMG: partLevel        6 1.0 1.3778e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 2.4e+02  0  0  0  0  1   0 23 16 12 40 16232
  repartition          3 1.0 8.8692e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 8.7905e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 5.3742e-03 1.4 0.00e+00 0.0 1.5e+03 9.0e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 4.1795e-03 1.5 0.00e+00 0.0 1.7e+03 1.7e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 5.1975e+01 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 5.6e+02 44  0  0  1  1 100 23 63 85 91    43
PCSetUpOnBlocks        7 1.0 3.1543e-04 2.6 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     3
PCApply                7 1.0 5.0538e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01  0  0  0  0  0   0 66 35 13  2 129023

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 6.4867e+01 1.0 3.69e+10 1.2 2.7e+08 1.5e+03 3.7e+04 55100100 99 98 100100100100100 117510
VecTDot            14000 1.0 4.1821e+00 1.2 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04  3  2  0  0 37   6  2  0  0 38 39046
VecNorm             9000 1.0 1.8765e+00 1.1 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03  2  1  0  0 24   3  1  0  0 24 55943
VecScale           42000 1.0 1.6005e-01 1.9 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 103207
VecCopy             1000 1.0 5.0528e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            147000 1.0 1.0271e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            14000 1.0 5.3448e-01 1.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 305521
VecAYPX            49000 1.0 7.6484e-01 1.4 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 180606
VecScatterBegin   176000 1.0 3.5663e+00 2.2 0.00e+00 0.0 2.7e+08 1.5e+03 0.0e+00  2  0100 99  0   4  0100100  0     0
VecScatterEnd     176000 1.0 1.6369e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11  0  0  0  0  21  0  0  0  0     0
MatMult            50000 1.0 1.2707e+01 1.2 1.05e+10 1.1 9.2e+07 2.1e+03 0.0e+00 10 28 34 49  0  18 28 34 49  0 170833
MatMultAdd         42000 1.0 9.3107e+00 1.8 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00  7  6 18  9  0  12  6 18  9  0 51877
MatMultTranspose   42000 1.0 8.5215e+00 1.5 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00  6  6 18  9  0  10  6 18  9  0 56681
MatSolve            7000 0.0 5.1229e-02 0.0 1.62e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    32
MatSOR             84000 1.0 2.9626e+01 1.1 1.90e+10 1.2 8.3e+07 1.6e+03 1.4e+04 24 51 31 33 37  44 51 31 33 38 132181
MatResidual        42000 1.0 1.1003e+01 1.2 7.97e+09 1.2 8.3e+07 1.6e+03 0.0e+00  8 21 31 33  0  15 21 31 33  0 148118
PCSetUpOnBlocks     7000 1.0 5.6956e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             7000 1.0 5.5983e+01 1.0 3.18e+10 1.2 2.6e+08 1.3e+03 1.4e+04 48 85 97 84 37  86 85 97 84 38 116316
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2374752     0.
              Matrix     0             65     14268388     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       142504     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   170            122      3205936     0.
              Matrix   148             83     22245544     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       559424     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 42000          42000   2279704000     0.
========================================================================================================================
Average time to get PetscTime(): 8.10623e-07
Average time for MPI_Barrier(): 8.63075e-06
Average time for zero size MPI_Send(): 6.39315e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.png
Type: image/png
Size: 123397 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180610/2e23a692/attachment-0002.png>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations

Time in solve():      107.655 s
Time in KSPSolve():   107.398 s (99.7609%)

Number of   KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdwd-0001 with 216 processors, by jczhang Sat Jun  9 16:36:21 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.108e+02      1.00001   2.108e+02
Objects:              4.253e+04      1.00002   4.253e+04
Flop:                 3.698e+10      1.15842   3.534e+10  7.632e+12
Flop/sec:            1.754e+08      1.15842   1.676e+08  3.621e+10
MPI Messages:         1.858e+06      3.50879   1.262e+06  2.725e+08
MPI Message Lengths:  2.275e+09      2.20338   1.459e+03  3.975e+11
MPI Reductions:       3.764e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 9.0703e-02   0.0%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0304e+02  48.9%  9.8678e+09   0.1%  7.475e+05   0.3%  3.517e+03        0.7%  6.130e+02   1.6% 
 2: Remaining Solves: 1.0767e+02  51.1%  7.6225e+12  99.9%  2.718e+08  99.7%  1.453e+03       99.3%  3.700e+04  98.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.4281e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 3.5601e-03 1.6 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 9.1126e+00 3.7 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
KSPSetUp               9 1.0 1.7250e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0304e+02 1.0 4.82e+07 1.2 7.5e+05 3.5e+03 6.1e+02 49  0  0  1  2 100100100100100    96
VecTDot               14 1.0 3.1445e-03 2.3 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  2  0  0  2 51930
VecNorm                9 1.0 1.2560e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  1  0  0  1 83580
VecScale              42 1.0 4.3011e-04 3.2 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 38406
VecCopy                1 1.0 1.2112e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               187 1.0 2.3854e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 9.7251e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 167912
VecAYPX               49 1.0 1.6963e-03 1.5 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 81430
VecAssemblyBegin       3 1.0 8.1062e-0511.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 6.2227e-0516.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      183 1.0 6.0692e-03 1.9 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00  0  0  0  0  0   0  0 36 15  0     0
VecScatterEnd        183 1.0 2.1380e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               50 1.0 2.2952e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00  0  0  0  0  0   0 22 12  7  0 94582
MatMultAdd            42 1.0 1.1678e-02 1.7 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00  0  0  0  0  0   0  5  6  1  0 41362
MatMultTranspose      42 1.0 1.1627e-02 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00  0  0  0  0  0   0  5  6  1  0 41542
MatSolve               7 0.0 1.3185e-04 0.0 1.62e+03 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    12
MatSOR                84 1.0 5.1574e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01  0  0  0  0  0   0 40 11  5  2 76102
MatLUFactorSym         1 1.0 7.2956e-05 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.1492e-0424.1 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     7
MatResidual           42 1.0 1.9120e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00  0  0  0  0  0   0 17 11  5  0 85240
MatAssemblyBegin     102 1.0 9.1159e+00 3.7 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
MatAssemblyEnd       102 1.0 5.0428e-02 1.1 0.00e+00 0.0 1.1e+05 2.2e+02 2.5e+02  0  0  0  0  1   0  0 15  1 40     0
MatGetRow        3100265 1.2 4.8632e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22  0  0  0  0  45  0  0  0  0     0
MatGetRowIJ            1 0.0 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.8569e-01 2.3 0.00e+00 0.0 1.0e+05 1.8e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        6 1.0 1.2963e-02 1.7 0.00e+00 0.0 3.2e+03 4.3e+02 9.4e+01  0  0  0  0  0   0  0  0  0 15     0
MatGetOrdering         1 0.0 1.0300e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 3.1326e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 1.5702e-02 1.6 0.00e+00 0.0 9.2e+04 6.3e+02 3.2e+01  0  0  0  0  0   0  0 12  2  5     0
MatZeroEntries         6 1.0 1.6329e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.3658e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 9.2e+01  0  0  0  0  0   0 23 15 12 15 16374
MatPtAPSymbolic        6 1.0 8.5818e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 4.2e+01  0  0  0  0  0   0  0  8  6  7     0
MatPtAPNumeric         6 1.0 5.0997e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.8e+01  0  0  0  0  0   0 23  7  5  8 43853
MatGetLocalMat         6 1.0 2.9812e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 5.5532e-03 1.8 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00  0  0  0  0  0   0  0  5  5  0     0
SFSetGraph            12 1.0 1.1659e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 6.0501e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          44 1.0 1.5197e-03 1.8 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            44 1.0 2.7242e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0273e+02 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.8e+02 49  0  0  0  1 100  0 48 73 46     0
GAMG: partLevel        6 1.0 1.5403e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 2.4e+02  0  0  0  0  1   0 23 16 12 40 14519
  repartition          3 1.0 1.0488e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 1.2918e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 1.0398e-02 2.1 0.00e+00 0.0 1.5e+03 9.0e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 8.6331e-03 2.7 0.00e+00 0.0 1.7e+03 1.7e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 1.0290e+02 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 5.6e+02 49  0  0  1  1 100 23 63 85 91    22
PCSetUpOnBlocks        7 1.0 5.5504e-04 2.8 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
PCApply                7 1.0 9.0100e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01  0  0  0  0  0   0 66 35 13  2 72371

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.0742e+02 1.0 3.69e+10 1.2 2.7e+08 1.5e+03 3.7e+04 51100100 99 98 100100100100100 70961
VecTDot            14000 1.0 5.2095e+00 1.4 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04  2  2  0  0 37   4  2  0  0 38 31345
VecNorm             9000 1.0 1.9601e+00 1.1 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03  1  1  0  0 24   2  1  0  0 24 53556
VecScale           42000 1.0 3.1761e-01 2.3 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 52009
VecCopy             1000 1.0 8.1215e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            147000 1.0 1.9388e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecAXPY            14000 1.0 9.6349e-01 1.2 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 169484
VecAYPX            49000 1.0 1.4718e+00 1.3 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 93851
VecScatterBegin   176000 1.0 5.5100e+00 2.0 0.00e+00 0.0 2.7e+08 1.5e+03 0.0e+00  2  0100 99  0   4  0100100  0     0
VecScatterEnd     176000 1.0 2.6342e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  18  0  0  0  0     0
MatMult            50000 1.0 2.2700e+01 1.2 1.05e+10 1.1 9.2e+07 2.1e+03 0.0e+00 10 28 34 49  0  19 28 34 49  0 95632
MatMultAdd         42000 1.0 1.4646e+01 1.7 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00  5  6 18  9  0  10  6 18  9  0 32979
MatMultTranspose   42000 1.0 1.3086e+01 1.6 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00  5  6 18  9  0   9  6 18  9  0 36911
MatSolve            7000 0.0 8.7209e-02 0.0 1.62e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    19
MatSOR             84000 1.0 5.3951e+01 1.1 1.90e+10 1.2 8.3e+07 1.6e+03 1.4e+04 24 51 31 33 37  47 51 31 33 38 72584
MatResidual        42000 1.0 1.9184e+01 1.2 7.97e+09 1.2 8.3e+07 1.6e+03 0.0e+00  8 21 31 33  0  16 21 31 33  0 84954
PCSetUpOnBlocks     7000 1.0 1.1972e-01 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             7000 1.0 9.4526e+01 1.0 3.18e+10 1.2 2.6e+08 1.3e+03 1.4e+04 45 85 97 84 37  87 85 97 84 38 68888
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2374752     0.
              Matrix     0             65     14268388     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       142504     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   170            122      3205936     0.
              Matrix   148             83     22245544     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       559424     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 42000          42000   2279704000     0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.1158e-05
Average time for zero size MPI_Send(): 6.42626e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 18 which doesn't match the requested tasks 96 with the number of requested nodes 96. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
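
A quick consistency check on the sizes reported above (the cubic process grid is
inferred from these numbers, it is not stated in this file):

    $1728 = 12^3 \quad\Rightarrow\quad (12 \times 30)^3 = 360^3 = 46{,}656{,}000$ unknowns,

which matches the reported total system size of 360^3.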

initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations

Time in solve():      114.7 s
Time in KSPSolve():   114.54 s (99.8607%)

Number of   KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0129 with 1728 processors, by jczhang Sat Jun  9 16:39:40 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           1.698e+02      1.00003   1.698e+02
Objects:              4.854e+04      1.00002   4.854e+04
Flop:                 4.220e+10      1.15865   4.125e+10  7.129e+13
Flop/sec:            2.485e+08      1.15864   2.429e+08  4.197e+11
MPI Messages:         2.548e+06      4.16034   1.730e+06  2.989e+09
MPI Message Lengths:  2.592e+09      2.20360   1.356e+03  4.053e+12
MPI Reductions:       4.266e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.2859e-02   0.0%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 5.5079e+01  32.4%  8.9686e+10   0.1%  7.756e+06   0.3%  3.230e+03        0.6%  6.410e+02   1.5% 
 2: Remaining Solves: 1.1471e+02  67.5%  7.1197e+13  99.9%  2.981e+09  99.7%  1.351e+03       99.4%  4.200e+04  98.4% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10^-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
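
A note on the stage breakdown: "First Solve" and "Remaining Solves" are user-defined
logging stages. The sketch below shows, in outline, how a driver could produce such a
split with PetscLogStageRegister()/PetscLogStagePush()/PetscLogStagePop(); it is not
the author's wstest source -- the 1-D Laplacian, the sizes, and the loop count are
placeholders, and error checking is omitted for brevity. Running it with -log_view
(as in the option table further down) produces per-stage event tables like the ones
shown here.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, b;
  KSP           ksp;
  PetscLogStage first, remaining;
  PetscInt      i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Placeholder operator: a small 1-D Laplacian, just to have a system to solve */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);        /* picks up -ksp_type, -pc_type gamg, ... */

  PetscLogStageRegister("First Solve", &first);
  PetscLogStageRegister("Remaining Solves", &remaining);

  PetscLogStagePush(first);      /* preconditioner setup is charged to this stage */
  KSPSolve(ksp, b, x);
  PetscLogStagePop();

  PetscLogStagePush(remaining);  /* later solves reuse the setup */
  for (i = 0; i < 10; i++) KSPSolve(ksp, b, x);
  PetscLogStagePop();

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}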
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 7.0095e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 1.4518e-02 4.5 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 6.1632e+00 3.4 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2  0  0  0  0   5  0  1  5  0     0
KSPSetUp               9 1.0 8.9891e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 5.5078e+01 1.0 5.33e+07 1.2 7.8e+06 3.2e+03 6.4e+02 32  0  0  1  2 100100100100100  1628
VecTDot               16 1.0 4.4720e-03 2.3 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01  0  0  0  0  0   0  2  0  0  2 333846
VecNorm               10 1.0 9.3472e-0310.8 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  1  0  0  2 99829
VecScale              48 1.0 1.4431e-0316.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 117314
VecCopy                1 1.0 7.4291e-0419.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               208 1.0 5.8553e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 6.4611e-04 1.3 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 2310724
VecAYPX               56 1.0 1.0061e-03 1.5 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 1268367
VecAssemblyBegin       3 1.0 4.9114e-0517.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 3.6001e-0518.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      208 1.0 1.1714e-02 5.9 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00  0  0  0  0  0   0  0 38 16  0     0
VecScatterEnd        208 1.0 5.5308e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               57 1.0 3.4521e-02 2.8 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00  0  0  0  0  0   0 23 13  8  0 584742
MatMultAdd            48 1.0 4.0031e-02 6.8 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 114344
MatMultTranspose      48 1.0 3.1601e-02 4.8 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 144847
MatSolve               8 0.0 5.8174e-05 0.0 2.90e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   498
MatSOR                96 1.0 6.4496e-02 1.5 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01  0  0  0  0  0   0 41 12  5  2 569549
MatLUFactorSym         1 1.0 3.5360e-03296.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 3.3751e-031179.7 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    15
MatResidual           48 1.0 3.2003e-02 3.4 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00  0  0  0  0  0   0 17 12  5  0 478619
MatAssemblyBegin     102 1.0 6.1656e+00 3.4 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2  0  0  0  0   5  0  1  5  0     0
MatAssemblyEnd       102 1.0 8.5799e-02 1.1 0.00e+00 0.0 1.1e+06 2.0e+02 2.5e+02  0  0  0  0  1   0  0 14  1 39     0
MatGetRow        3100266 1.2 2.5176e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  44  0  0  0  0     0
MatGetRowIJ            1 0.0 1.2875e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 2.0034e-01 2.2 0.00e+00 0.0 1.0e+06 1.6e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        6 1.0 4.2367e-02 1.0 0.00e+00 0.0 3.7e+04 3.4e+02 9.4e+01  0  0  0  0  0   0  0  0  0 15     0
MatGetOrdering         1 0.0 6.2943e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 6.1879e-02 1.2 0.00e+00 0.0 4.6e+05 9.9e+02 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 2.6180e-02 1.1 0.00e+00 0.0 9.7e+05 5.5e+02 5.4e+01  0  0  0  0  0   0  0 12  2  8     0
MatZeroEntries         6 1.0 1.7581e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 2.1420e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 9.3e+01  0  0  0  0  0   0 21 15 11 15 85980
MatPtAPSymbolic        6 1.0 1.2393e-01 1.1 0.00e+00 0.0 5.8e+05 2.7e+03 4.2e+01  0  0  0  0  0   0  0  7  6  7     0
MatPtAPNumeric         6 1.0 9.3141e-02 1.1 1.11e+07 1.3 5.5e+05 2.3e+03 4.8e+01  0  0  0  0  0   0 21  7  5  7 197734
MatGetLocalMat         6 1.0 2.1052e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 6.8512e-03 3.0 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00  0  0  0  0  0   0  0  4  5  0     0
SFSetGraph            12 1.0 7.7248e-05 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 1.7301e-02 2.7 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          66 1.0 3.3102e-03 4.7 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            66 1.0 1.0333e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 5.4556e+01 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 3.1e+02 32  0  0  0  1  99  0 46 73 48     0
GAMG: partLevel        6 1.0 3.1220e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 2.4e+02  0  0  0  0  1   1 21 15 11 38 58992
  repartition          3 1.0 9.0489e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 3.5654e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 3.5768e-02 1.0 0.00e+00 0.0 1.6e+04 7.9e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 1.2372e-02 1.2 0.00e+00 0.0 2.2e+04 1.3e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 5.4924e+01 1.0 1.11e+07 1.3 4.8e+06 4.4e+03 5.8e+02 32  0  0  1  1 100 21 61 84 91   335
PCSetUpOnBlocks        8 1.0 3.7484e-0337.5 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    14
PCApply                8 1.0 1.1110e-01 1.0 3.64e+07 1.2 2.9e+06 1.2e+03 1.6e+01  0  0  0  0  0   0 68 37 14  2 550891

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.1454e+02 1.0 4.21e+10 1.2 3.0e+09 1.4e+03 4.2e+04 67100100 99 98 100100100100100 621610
VecTDot            16000 1.0 1.2048e+01 1.1 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04  7  2  0  0 38  10  2  0  0 38 123923
VecNorm            10000 1.0 4.1293e+00 1.0 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04  2  1  0  0 23   4  1  0  0 24 225976
VecScale           48000 1.0 7.3612e-01 9.5 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 229992
VecCopy             1000 1.0 6.3032e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            168000 1.0 1.5064e+00 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            16000 1.0 6.2166e-01 1.2 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 2401639
VecAYPX            56000 1.0 1.6279e+00 2.6 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 783928
VecScatterBegin   201000 1.0 4.6709e+00 2.6 0.00e+00 0.0 3.0e+09 1.4e+03 0.0e+00  2  0100 99  0   3  0100100  0     0
VecScatterEnd     201000 1.0 3.9701e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  31  0  0  0  0     0
MatMult            57000 1.0 1.9548e+01 1.5 1.19e+10 1.1 1.0e+09 2.0e+03 0.0e+00  9 28 34 49  0  13 28 34 49  0 1032633
MatMultAdd         48000 1.0 2.8641e+01 3.6 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 14  6 18  9  0  20  6 18  9  0 159815
MatMultTranspose   48000 1.0 1.9239e+01 2.9 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00  5  6 18  9  0   8  6 18  9  0 237923
MatSolve            8000 0.0 6.0686e-02 0.0 2.90e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   477
MatSOR             96000 1.0 5.0867e+01 1.2 2.17e+10 1.2 9.2e+08 1.5e+03 1.6e+04 28 51 31 33 38  41 51 31 34 38 720735
MatResidual        48000 1.0 1.7601e+01 1.7 9.11e+09 1.2 9.2e+08 1.5e+03 0.0e+00  8 21 31 33  0  11 22 31 34  0 870258
PCSetUpOnBlocks     8000 1.0 6.8316e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             8000 1.0 9.5367e+01 1.0 3.63e+10 1.2 2.9e+09 1.2e+03 1.6e+04 56 86 97 84 38  83 86 97 85 38 641033
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2388736     0.
              Matrix     0             65     15239012     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       197872     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   176            128      3559744     0.
              Matrix   148             83     23608464     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       619344     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 48000          48000   2625920000     0.
========================================================================================================================
Average time to get PetscTime(): 8.82149e-07
Average time for MPI_Barrier(): 1.65939e-05
Average time for zero size MPI_Send(): 6.39467e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
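
The entries above are ordinary PETSc options-database entries, so besides the command
line they can also be set programmatically before the solver's *SetFromOptions() call.
A minimal sketch using PetscOptionsSetValue() from PETSc's options API -- illustrative
only, not code from this test -- covering a few of the entries above:

#include <petscsys.h>

/* Hypothetical helper: preload a subset of the options listed above. */
PetscErrorCode PreloadSolverOptions(void)
{
  PetscErrorCode ierr;
  ierr = PetscOptionsSetValue(NULL, "-ksp_type", "cg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-ksp_rtol", "1e-6");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_type", "gamg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_gamg_type", "classical");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_ksp_type", "richardson");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-mg_levels_pc_type", "sor");CHKERRQ(ierr);
  return 0;
}

Calling such a helper after PetscInitialize() and before KSPSetFromOptions() has the
same effect as supplying the corresponding flags on the command line.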
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001

initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations

Time in solve():      161.945 s
Time in KSPSolve():   161.693 s (99.8445%)

Number of   KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0065 with 1728 processors, by jczhang Sat Jun  9 16:42:23 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.700e+02      1.00002   2.699e+02
Objects:              4.854e+04      1.00002   4.854e+04
Flop:                 4.220e+10      1.15865   4.125e+10  7.129e+13
Flop/sec:            1.563e+08      1.15865   1.528e+08  2.641e+11
MPI Messages:         2.548e+06      4.16034   1.730e+06  2.989e+09
MPI Message Lengths:  2.592e+09      2.20360   1.356e+03  4.053e+12
MPI Reductions:       4.266e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.2022e-01   0.0%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0787e+02  40.0%  8.9686e+10   0.1%  7.756e+06   0.3%  3.230e+03        0.6%  6.410e+02   1.5% 
 2: Remaining Solves: 1.6196e+02  60.0%  7.1197e+13  99.9%  2.981e+09  99.7%  1.351e+03       99.4%  4.200e+04  98.4% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10^-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.8120e-04 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 5.7149e-03 1.6 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 1.1775e+01 4.0 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2  0  0  0  0   5  0  1  5  0     0
KSPSetUp               9 1.0 2.5797e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0786e+02 1.0 5.33e+07 1.2 7.8e+06 3.2e+03 6.4e+02 40  0  0  1  2 100100100100100   831
VecTDot               16 1.0 6.9978e-03 2.0 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01  0  0  0  0  0   0  2  0  0  2 213347
VecNorm               10 1.0 3.2461e-03 2.9 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  1  0  0  2 287462
VecScale              48 1.0 5.1904e-04 4.7 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 326185
VecCopy                1 1.0 1.3995e-04 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               208 1.0 3.8517e-03 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 1.2281e-03 1.4 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 1215698
VecAYPX               56 1.0 1.9097e-03 1.7 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 668228
VecAssemblyBegin       3 1.0 4.3488e-04140.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 6.6996e-0535.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      208 1.0 7.2007e-03 2.1 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00  0  0  0  0  0   0  0 38 16  0     0
VecScatterEnd        208 1.0 4.7876e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               57 1.0 3.8546e-02 1.8 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00  0  0  0  0  0   0 23 13  8  0 523683
MatMultAdd            48 1.0 2.8487e-02 2.9 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 160680
MatMultTranspose      48 1.0 2.5251e-02 2.7 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 181274
MatSolve               8 0.0 5.1498e-05 0.0 2.90e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   562
MatSOR                96 1.0 6.6756e-02 1.2 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01  0  0  0  0  0   0 41 12  5  2 550259
MatLUFactorSym         1 1.0 1.5211e-0412.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 1.2302e-0443.0 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   412
MatResidual           48 1.0 3.3680e-02 2.0 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00  0  0  0  0  0   0 17 12  5  0 454784
MatAssemblyBegin     102 1.0 1.1779e+01 4.0 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00  2  0  0  0  0   5  0  1  5  0     0
MatAssemblyEnd       102 1.0 6.4854e-02 1.1 0.00e+00 0.0 1.1e+06 2.0e+02 2.5e+02  0  0  0  0  1   0  0 14  1 39     0
MatGetRow        3100266 1.2 5.0476e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  45  0  0  0  0     0
MatGetRowIJ            1 0.0 1.2159e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.8868e-01 2.4 0.00e+00 0.0 1.0e+06 1.6e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        6 1.0 3.1791e-02 1.2 0.00e+00 0.0 3.7e+04 3.4e+02 9.4e+01  0  0  0  0  0   0  0  0  0 15     0
MatGetOrdering         1 0.0 5.8889e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 7.8569e-02 1.2 0.00e+00 0.0 4.6e+05 9.9e+02 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 1.7872e-02 1.5 0.00e+00 0.0 9.7e+05 5.5e+02 5.4e+01  0  0  0  0  0   0  0 12  2  8     0
MatZeroEntries         6 1.0 1.6198e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.8265e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 9.3e+01  0  0  0  0  0   0 21 15 11 15 100831
MatPtAPSymbolic        6 1.0 1.1047e-01 1.0 0.00e+00 0.0 5.8e+05 2.7e+03 4.2e+01  0  0  0  0  0   0  0  7  6  7     0
MatPtAPNumeric         6 1.0 7.1542e-02 1.0 1.11e+07 1.3 5.5e+05 2.3e+03 4.8e+01  0  0  0  0  0   0 21  7  5  7 257431
MatGetLocalMat         6 1.0 2.9891e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 4.4601e-03 1.6 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00  0  0  0  0  0   0  0  4  5  0     0
SFSetGraph            12 1.0 1.3447e-04 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 8.3802e-03 1.2 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          66 1.0 2.1842e-03 2.1 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            66 1.0 3.4332e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0743e+02 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 3.1e+02 40  0  0  0  1 100  0 46 73 48     0
GAMG: partLevel        6 1.0 2.3124e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 2.4e+02  0  0  0  0  1   0 21 15 11 38 79645
  repartition          3 1.0 2.2879e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 1.0046e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 2.2692e-02 1.4 0.00e+00 0.0 1.6e+04 7.9e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 1.7002e-02 1.5 0.00e+00 0.0 2.2e+04 1.3e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 1.0768e+02 1.0 1.11e+07 1.3 4.8e+06 4.4e+03 5.8e+02 40  0  0  1  1 100 21 61 84 91   171
PCSetUpOnBlocks        8 1.0 5.9175e-04 6.5 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    86
PCApply                8 1.0 1.2410e-01 1.0 3.64e+07 1.2 2.9e+06 1.2e+03 1.6e+01  0  0  0  0  0   0 68 37 14  2 493181

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.6171e+02 1.0 4.21e+10 1.2 3.0e+09 1.4e+03 4.2e+04 60100100 99 98 100100100100100 440278
VecTDot            16000 1.0 1.4152e+01 1.3 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04  5  2  0  0 38   8  2  0  0 38 105498
VecNorm            10000 1.0 5.2611e+00 1.2 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04  2  1  0  0 23   3  1  0  0 24 177362
VecScale           48000 1.0 3.4804e-01 3.2 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 486441
VecCopy             1000 1.0 8.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            168000 1.0 2.5884e+00 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            16000 1.0 1.0942e+00 1.2 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 1364458
VecAYPX            56000 1.0 2.1396e+00 1.8 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 596431
VecScatterBegin   201000 1.0 7.2699e+00 2.3 0.00e+00 0.0 3.0e+09 1.4e+03 0.0e+00  2  0100 99  0   3  0100100  0     0
VecScatterEnd     201000 1.0 5.0307e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  24  0  0  0  0     0
MatMult            57000 1.0 3.0696e+01 1.4 1.19e+10 1.1 1.0e+09 2.0e+03 0.0e+00  9 28 34 49  0  15 28 34 49  0 657590
MatMultAdd         48000 1.0 3.1671e+01 2.8 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 10  6 18  9  0  16  6 18  9  0 144528
MatMultTranspose   48000 1.0 2.4791e+01 2.5 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00  5  6 18  9  0   8  6 18  9  0 184633
MatSolve            8000 0.0 5.5241e-02 0.0 2.90e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   524
MatSOR             96000 1.0 7.5631e+01 1.1 2.17e+10 1.2 9.2e+08 1.5e+03 1.6e+04 26 51 31 33 38  44 51 31 34 38 484742
MatResidual        48000 1.0 2.6641e+01 1.5 9.11e+09 1.2 9.2e+08 1.5e+03 0.0e+00  8 21 31 33  0  13 22 31 34  0 574955
PCSetUpOnBlocks     8000 1.0 1.2669e-0116.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             8000 1.0 1.3677e+02 1.0 3.63e+10 1.2 2.9e+09 1.2e+03 1.6e+04 50 86 97 84 38  84 86 97 85 38 446993
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2388736     0.
              Matrix     0             65     15239012     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       197872     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   176            128      3559744     0.
              Matrix   148             83     23608464     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       619344     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 48000          48000   2625920000     0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 0.000298405
Average time for zero size MPI_Send(): 7.23203e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
[Attachment scrubbed by the mailing-list archive: trace.png, image/png, 317483 bytes -- <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180610/2e23a692/attachment-0003.png>]

