[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems

Junchao Zhang jczhang at mcs.anl.gov
Tue Jun 12 11:32:28 CDT 2018


Mark,
  I tried "-pc_gamg_type agg ..." options you mentioned, and also -ksp_type
cg + PETSc's default PC bjacobi. In the latter case, to reduce execution
time I called KSPSolve 100 times instead of 1000, and also used -ksp_max_it
100. In the 36x48=1728 ranks case, I also did a test with -log_sync. From
there you can see a lot of time is spent on VecNormBarrier, which implies
load imbalance. Note VecScatterBarrie time is misleading, since it barriers
ALL ranks, but in reality VecScatter sort of syncs in a small neighborhood.
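  For reference, the cg + bjacobi comparison used options along these lines
(a sketch; the executable name and per-process grid size are the ones from the
attached logs):

mpirun -n 1728 ./wstest -nodes_per_proc 30 -mesh_size 1E-4 -iterations 100 \
  -ksp_type cg -pc_type bjacobi -ksp_max_it 100 -ksp_rtol 1E-6 \
  -log_view -log_sync
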
  Barry suggested trying periodic boundary conditions so that the nonzeros
are perfectly balanced across processes. I will try that and see what
happens.

--Junchao Zhang

On Mon, Jun 11, 2018 at 8:09 AM, Mark Adams <mfadams at lbl.gov> wrote:

>
>
> On Mon, Jun 11, 2018 at 12:46 AM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> I used an LCRC machine named Bebop. I tested on its Intel Broadwell
>> nodes. Each node has 2 CPUs and 36 cores in total. I collected data using
>> 36 cores per node or 18 cores per node. As you can see, 18 cores/node gave
>> much better performance, which is reasonable since routines like MatSOR,
>> MatMult, and MatMultAdd are all memory-bandwidth bound.
>>
>> The code uses a DMDA 3D grid with a 7-point stencil and defines
>> nodes (vertices) on the surface, or one layer in from the surface, as
>> boundary nodes. Boundary nodes have only a diagonal 1 in their matrix row;
>> interior nodes have 7 nonzeros in their row. Processes on the boundary of
>> the processor grid therefore have fewer nonzeros. This is one source of
>> load imbalance. Will the load imbalance get more severe on the coarser
>> grids of the MG hierarchy?
>>
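>> A minimal sketch of how such an operator can be set up on the DMDA
>> (illustrative names; A is assumed to come from DMCreateMatrix(da,&A)):
>>
>> #include <petscdmda.h>
>>
>> /* Sketch: 7-point operator where surface and next-to-surface rows are
>>    reduced to a diagonal 1, as described above. */
>> static PetscErrorCode AssembleOperator(DM da, Mat A)
>> {
>>   PetscErrorCode ierr;
>>   PetscInt       i, j, k, xs, ys, zs, xm, ym, zm, mx, my, mz, n;
>>   MatStencil     row, col[7];
>>   PetscScalar    v[7];
>>
>>   PetscFunctionBeginUser;
>>   ierr = DMDAGetInfo(da,NULL,&mx,&my,&mz,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);CHKERRQ(ierr);
>>   ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
>>   for (k = zs; k < zs+zm; k++) {
>>     for (j = ys; j < ys+ym; j++) {
>>       for (i = xs; i < xs+xm; i++) {
>>         row.i = i; row.j = j; row.k = k;
>>         if (i <= 1 || j <= 1 || k <= 1 || i >= mx-2 || j >= my-2 || k >= mz-2) {
>>           /* boundary node: only a diagonal 1 in its row */
>>           v[0] = 1.0;
>>           ierr = MatSetValuesStencil(A,1,&row,1,&row,v,INSERT_VALUES);CHKERRQ(ierr);
>>         } else {
>>           /* interior node: standard 7-point stencil, 7 nonzeros per row */
>>           n = 0;
>>           col[n].i = i;   col[n].j = j;   col[n].k = k-1; v[n++] = -1.0;
>>           col[n].i = i;   col[n].j = j-1; col[n].k = k;   v[n++] = -1.0;
>>           col[n].i = i-1; col[n].j = j;   col[n].k = k;   v[n++] = -1.0;
>>           col[n].i = i;   col[n].j = j;   col[n].k = k;   v[n++] =  6.0;
>>           col[n].i = i+1; col[n].j = j;   col[n].k = k;   v[n++] = -1.0;
>>           col[n].i = i;   col[n].j = j+1; col[n].k = k;   v[n++] = -1.0;
>>           col[n].i = i;   col[n].j = j;   col[n].k = k+1; v[n++] = -1.0;
>>           ierr = MatSetValuesStencil(A,1,&row,7,col,v,INSERT_VALUES);CHKERRQ(ierr);
>>         }
>>       }
>>     }
>>   }
>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>   PetscFunctionReturn(0);
>> }
>>
>> (With DM_BOUNDARY_PERIODIC boundary types in DMDACreate3d() there would be
>> no special boundary rows and every row would have 7 nonzeros, which is the
>> perfectly balanced nonzero distribution mentioned above.)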
>
> Yes.
>
> You can use a simple Jacobi solver to see the basic performance of your
> operator and machine. Do you see as much time spent in Vec Scatters?
> VecAXPY? etc.
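> For example, a Jacobi-only baseline could use something like (a sketch):
>
>   -ksp_type richardson -pc_type jacobi -ksp_max_it 100 -ksp_norm_type none -log_view -log_sync
>
> or keep cg and just swap -pc_type gamg for -pc_type jacobi. That isolates
> MatMult, the vector updates, and the scatters from the multigrid hierarchy.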
>
>
>>
>> I attach a trace-view figure that shows the activity of each rank along the
>> time axis in one KSPSolve. White means waiting in MPI. You can see that
>> white takes up a large fraction of the time.
>>
>> I don't have a good explanation for why processors wait longer at large
>> scale (1728 cores), as the communication pattern is still a 7-point stencil
>> on a cubic processor grid.
>>
>> --Junchao Zhang
>>
>> On Sat, Jun 9, 2018 at 11:32 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>>
>>>
>>>   Junchao,
>>>
>>>       Thanks. The load balance of matrix entries is remarkably similar
>>> for the two runs, so a worse work-load imbalance in SOR for the larger case
>>> cannot be what explains why the SOR takes more time.
>>>
>>>       Here is my guess (and I know of no way to confirm it): in the smaller
>>> case, fewer processes on the same node are running SOR at the same time than
>>> in the larger case, hence the larger case is slower because more SOR
>>> processes are fighting over the same memory bandwidth at once. Ahh, here is
>>> something you can try: let's undersubscribe the memory bandwidth, i.e. run
>>> on, say, 16 processes per node with 8 nodes and 16 processes per node with
>>> 64 nodes, and send the two -log_view output files. I assume this is an LCRC
>>> machine and NOT a KNL system?
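>>>    With Slurm that could look something like this, keeping the cubic rank
>>> counts while capping the ranks per node at 16 (a sketch; the binary name and
>>> option placeholders are illustrative):
>>>
>>>      srun -N 8  --ntasks-per-node=16 -n 125  ./ws_test <usual options> -log_view
>>>      srun -N 64 --ntasks-per-node=16 -n 1000 ./ws_test <usual options> -log_view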
>>>
>>>    Thanks
>>>
>>>
>>>    Barry
>>>
>>>
>>> > On Jun 9, 2018, at 8:29 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>> >
>>> > -pc_gamg_type classical
>>> >
>>> > FYI, we only support smoothed aggregation "agg" (the default). (This
>>> thread started by saying you were using GAMG.)
>>> >
>>> > It is not clear how much this will make a difference for you, but you
>>> don't want to use classical because we do not support it. It is meant as a
>>> reference implementation for developers.
>>> >
>>> > First, how did you get the idea to use classical? If the documentation
>>> led you to believe this was a good thing to do then we need to fix that!
>>> >
>>> > Anyway, here is a generic input for GAMG:
>>> >
>>> > -pc_type gamg
>>> > -pc_gamg_type agg
>>> > -pc_gamg_agg_nsmooths 1
>>> > -pc_gamg_coarse_eq_limit 1000
>>> > -pc_gamg_reuse_interpolation true
>>> > -pc_gamg_square_graph 1
>>> > -pc_gamg_threshold 0.05
>>> > -pc_gamg_threshold_scale .0
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Jun 7, 2018 at 6:52 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>> wrote:
>>> > OK, I had thought that space was a typo. BTW, this option does not
>>> show up in -h.
>>> > I changed the number of ranks to use all cores on each node to avoid
>>> misleading ratios in -log_view. Since one node has 36 cores, I ran with
>>> 6^3=216 ranks and 12^3=1728 ranks. I also found the call counts of MatSOR
>>> etc. in the two tests were different, so they are not strict weak-scaling
>>> tests. I tried adding -ksp_max_it 6 -pc_mg_levels 6, but still could not
>>> make the two have the same MatSOR count. Anyway, I attached the
>>> load-balance output.
>>> >
>>> > I found that PCApply_MG calls PCMGMCycle_Private, which is recursive and
>>> indirectly calls MatSOR_MPIAIJ. I believe the following code in
>>> MatSOR_MPIAIJ effectively synchronizes {MatSOR, MatMultAdd}_SeqAIJ between
>>> processes through VecScatter at each MG level. If SOR and MatMultAdd are
>>> imbalanced, the cost accumulates across the MG levels and shows up as a
>>> large VecScatter cost.
>>> > 1460:     while (its--) {
>>> > 1461:       VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
>>> > 1462:       VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
>>> >
>>> > 1464:       /* update rhs: bb1 = bb - B*x */
>>> > 1465:       VecScale(mat->lvec,-1.0);
>>> > 1466:       (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);
>>> >
>>> > 1468:       /* local sweep */
>>> > 1469:       (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
>>> > 1470:     }
>>> >
>>> >
>>> >
>>> > --Junchao Zhang
>>> >
>>> > On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> >
>>> > > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczhang at mcs.anl.gov>
>>> wrote:
>>> > >
>>> > > Searched but could not find this option, -mat_view::load_balance
>>> >
>>> >    There is a space between the view and the ':'.  load_balance is a
>>> particular viewer format that causes the printing of load-balance
>>> information about the number of nonzeros in the matrix.
>>> >
>>> >    Barry
>>> >
>>> > >
>>> > > --Junchao Zhang
>>> > >
>>> > > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> > >  So the only surprise in the results is the SOR. It is
>>> embarrassingly parallel and normally one would not see a jump.
>>> > >
>>> > >  The load-balance ratio for the SOR time is better at 1000 processes
>>> (1.5) than at 125 processes (2.1), not worse, so this number doesn't
>>> easily explain it.
>>> > >
>>> > >  Could you run the 125 and 1000 with -mat_view ::load_balance and
>>> see what you get out?
>>> > >
>>> > >    Thanks
>>> > >
>>> > >      Barry
>>> > >
>>> > >  Notice that the MatSOR time jumps a lot, about 5 seconds, when
>>> -log_sync is on. My only guess is that MatSOR is sharing memory
>>> bandwidth (or some other resource? cores?) with VecScatter and for some
>>> reason this is worse for 1000 cores, but I don't know why.
>>> > >
>>> > > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>> wrote:
>>> > > >
>>> > > > Hi, PETSc developers,
>>> > > >  I tested Michael Becker's code. The code calls the same KSPSolve
>>> 1000 times in the second stage and needs a cubic number of processors to
>>> run. I ran with 125 ranks and 1000 ranks, with and without the -log_sync
>>> option. I attach the log_view output files and a scaling-loss Excel file.
>>> > > >  I profiled the code with 125 processors. It looks like {MatSOR,
>>> MatMult, MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c
>>> took ~50% of the time; the other half was spent waiting in MPI.
>>> MatSOR_SeqAIJ took 30%, mostly in PetscSparseDenseMinusDot() (sketched below).
>>> > > >  I tested it on a 36-cores/node machine. I found 32 ranks/node gave
>>> better performance (about 10%) than 36 ranks/node in the 125-rank test. I
>>> guess this is because the processes in the former case had more evenly
>>> balanced memory bandwidth. I collected PAPI_DP_OPS (double-precision
>>> operations) and PAPI_TOT_CYC (total cycles) for the 125-rank case (see the
>>> attached files). It looks like the ranks at the two ends have fewer DP_OPS
>>> and TOT_CYC.
>>> > > >  Does anyone familiar with the algorithm have a quick explanation?
>>> > > >
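>>> > > >  Roughly, that kernel is a sparse-row dot product folded into the SOR
>>> update; in spirit it is something like the following (a sketch of the idea,
>>> not the actual PETSc macro):
>>> > > >
>>> > > > #include <petscsys.h>
>>> > > >
>>> > > > /* one multiply-add per nonzero, but two loads (a matrix value plus an
>>> > > >    indexed vector entry), so it is limited by memory bandwidth, not flops */
>>> > > > static inline PetscScalar SparseRowMinusDot(PetscScalar sum, const PetscScalar *v,
>>> > > >                                             const PetscScalar *x, const PetscInt *idx,
>>> > > >                                             PetscInt nz)
>>> > > > {
>>> > > >   PetscInt j;
>>> > > >   for (j = 0; j < nz; j++) sum -= v[j] * x[idx[j]];
>>> > > >   return sum;
>>> > > > }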
>>> > > > --Junchao Zhang
>>> > > >
>>> > > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > > Hello again,
>>> > > >
>>> > > > this took me longer than I anticipated, but here we go.
>>> > > > I did reruns of the cases where only half the processes per node
>>> were used (without -log_sync):
>>> > > >
>>> > > >                    125 procs, 1st       125 procs, 2nd       1000 procs, 1st      1000 procs, 2nd
>>> > > >                    Max        Ratio     Max        Ratio     Max        Ratio     Max        Ratio
>>> > > > KSPSolve           1.203E+02  1.0       1.210E+02  1.0       1.399E+02  1.1       1.365E+02  1.0
>>> > > > VecTDot            6.376E+00  3.7       6.551E+00  4.0       7.885E+00  2.9       7.175E+00  3.4
>>> > > > VecNorm            4.579E+00  7.1       5.803E+00  10.2      8.534E+00  6.9       6.026E+00  4.9
>>> > > > VecScale           1.070E-01  2.1       1.129E-01  2.2       1.301E-01  2.5       1.270E-01  2.4
>>> > > > VecCopy            1.123E-01  1.3       1.149E-01  1.3       1.301E-01  1.6       1.359E-01  1.6
>>> > > > VecSet             7.063E-01  1.7       6.968E-01  1.7       7.432E-01  1.8       7.425E-01  1.8
>>> > > > VecAXPY            1.166E+00  1.4       1.167E+00  1.4       1.221E+00  1.5       1.279E+00  1.6
>>> > > > VecAYPX            1.317E+00  1.6       1.290E+00  1.6       1.536E+00  1.9       1.499E+00  2.0
>>> > > > VecScatterBegin    6.142E+00  3.2       5.974E+00  2.8       6.448E+00  3.0       6.472E+00  2.9
>>> > > > VecScatterEnd      3.606E+01  4.2       3.551E+01  4.0       5.244E+01  2.7       4.995E+01  2.7
>>> > > > MatMult            3.561E+01  1.6       3.403E+01  1.5       3.435E+01  1.4       3.332E+01  1.4
>>> > > > MatMultAdd         1.124E+01  2.0       1.130E+01  2.1       2.093E+01  2.9       1.995E+01  2.7
>>> > > > MatMultTranspose   1.372E+01  2.5       1.388E+01  2.6       1.477E+01  2.2       1.381E+01  2.1
>>> > > > MatSolve           1.949E-02  0.0       1.653E-02  0.0       4.789E-02  0.0       4.466E-02  0.0
>>> > > > MatSOR             6.610E+01  1.3       6.673E+01  1.3       7.111E+01  1.3       7.105E+01  1.3
>>> > > > MatResidual        2.647E+01  1.7       2.667E+01  1.7       2.446E+01  1.4       2.467E+01  1.5
>>> > > > PCSetUpOnBlocks    5.266E-03  1.4       5.295E-03  1.4       5.427E-03  1.5       5.289E-03  1.4
>>> > > > PCApply            1.031E+02  1.0       1.035E+02  1.0       1.180E+02  1.0       1.164E+02  1.0
>>> > > >
>>> > > > I also slimmed down my code and basically wrote a simple weak
>>> scaling test (source files attached) so you can profile it yourself. I
>>> appreciate the offer Junchao, thank you.
>>> > > > You can adjust the system size per processor at runtime via
>>> "-nodes_per_proc 30" and the number of repeated calls to the function
>>> containing KSPSolve() via "-iterations 1000". The physical problem is
>>> simply calculating the electric potential from a homogeneous charge
>>> distribution, done multiple times to accumulate time in KSPSolve().
>>> > > > A job would be started using something like
>>> > > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 -iterations 1000 \
>>> > > > -ksp_rtol 1E-6 \
>>> > > > -log_view -log_sync \
>>> > > > -pc_type gamg -pc_gamg_type classical \
>>> > > > -ksp_type cg \
>>> > > > -ksp_norm_type unpreconditioned \
>>> > > > -mg_levels_ksp_type richardson \
>>> > > > -mg_levels_ksp_norm_type none \
>>> > > > -mg_levels_pc_type sor \
>>> > > > -mg_levels_ksp_max_it 1 \
>>> > > > -mg_levels_pc_sor_its 1 \
>>> > > > -mg_levels_esteig_ksp_type cg \
>>> > > > -mg_levels_esteig_ksp_max_it 10 \
>>> > > > -gamg_est_ksp_type cg
>>> > > > ideally started on a cube number of processes for a cubic process grid.
>>> > > > Using 125 processes and 10,000 iterations I get the output in
>>> "log_view_125_new.txt", which shows the same imbalance for me.
>>> > > > Michael
>>> > > >
>>> > > >
>>> > > > Am 02.06.2018 um 13:40 schrieb Mark Adams:
>>> > > >>
>>> > > >>
>>> > > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <
>>> jczhang at mcs.anl.gov> wrote:
>>> > > >> Hi, Michael,
>>> > > >>  You can add -log_sync in addition to -log_view, which adds barriers
>>> to certain events but measures the barrier time separately from the events.
>>> I find this option makes it easier to interpret log_view output.
>>> > > >>
>>> > > >> That is great (good to know).
>>> > > >>
>>> > > >> This should give us a better idea of whether your large VecScatter
>>> costs come from slow communication or from some sort of load imbalance it
>>> is catching.
>>> > > >>
>>> > > >>
>>> > > >> --Junchao Zhang
>>> > > >>
>>> > > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > >> Barry: On its way. Could take a couple days again.
>>> > > >>
>>> > > >> Junchao: I unfortunately don't have access to a cluster with a
>>> faster network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking
>>> fat-tree network, which I realize causes parallel slowdown if the nodes are
>>> not connected to the same switch. Each node has 24 processors (2 sockets x
>>> 12 cores) and four NUMA domains (two per socket).
>>> > > >> The ranks are usually not distributed perfectly evenly, i.e. for
>>> 125 processes, of the six required nodes, five would use 21 cores and one
>>> would use 20.
>>> > > >> Would using another CPU type make a difference
>>> communication-wise? I could switch to faster ones (on the same network),
>>> but I always assumed this would only improve performance of the stuff that
>>> is unrelated to communication.
>>> > > >>
>>> > > >> Michael
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>> The log files have something like "Average time for zero size
>>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very
>>> slow network. A typical machine should have less than 1/10 of the latency
>>> you see. An easy thing to try is just running the code on a machine with a
>>> faster network and seeing what happens.
>>> > > >>>
>>> > > >>> Also, how many cores & NUMA domains does a compute node have? I
>>> could not figure out how you distributed the 125 MPI ranks evenly.
>>> > > >>>
>>> > > >>> --Junchao Zhang
>>> > > >>>
>>> > > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > >>> Hello again,
>>> > > >>>
>>> > > >>> here are the updated log_view files for 125 and 1000 processors.
>>> I ran both problems twice, the first time with all processors per node
>>> allocated ("-1.txt"), the second with only half on twice the number of
>>> nodes ("-2.txt").
>>> > > >>>
>>> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> I noticed that for every individual KSP iteration, six vector
>>> objects are created and destroyed (with CG, more with e.g. GMRES).
>>> > > >>>>>
>>> > > >>>>   Hmm, it is certainly not intended that vectors be created and
>>> destroyed within each KSPSolve(); could you please point us to the code that
>>> makes you think they are being created and destroyed?   We create all the
>>> work vectors in KSPSetUp() and destroy them in KSPReset(), not during the
>>> solve. Not that this would be a measurable difference.
>>> > > >>>>
>>> > > >>>
>>> > > >>> I mean this, right in the log_view output:
>>> > > >>>
>>> > > >>>> Memory usage is given in bytes:
>>> > > >>>>
>>> > > >>>> Object Type Creations Destructions Memory Descendants' Mem.
>>> > > >>>> Reports information only for process 0.
>>> > > >>>>
>>> > > >>>> --- Event Stage 0: Main Stage
>>> > > >>>>
>>> > > >>>> ...
>>> > > >>>>
>>> > > >>>> --- Event Stage 1: First Solve
>>> > > >>>>
>>> > > >>>> ...
>>> > > >>>>
>>> > > >>>> --- Event Stage 2: Remaining Solves
>>> > > >>>>
>>> > > >>>> Vector 23904 23904 1295501184 0.
>>> > > >>> I logged the exact number of KSP iterations over the 999
>>> timesteps and it's exactly 23904/6 = 3984.
>>> > > >>> Michael
>>> > > >>>
>>> > > >>>
>>> > > >>> Am 24.05.2018 um 19:50 schrieb Smith, Barry F.:
>>> > > >>>>
>>> > > >>>>  Please send the log file for 1000 with cg as the solver.
>>> > > >>>>
>>> > > >>>>   You should make a bar chart of each event for the two cases
>>> to see which ones are taking more time and which are taking less (we cannot
>>> tell with the two logs you sent us since they are for different solvers.)
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> I noticed that for every individual KSP iteration, six vector
>>> objects are created and destroyed (with CG, more with e.g. GMRES).
>>> > > >>>>>
>>> > > >>>>   Hmm, it is certainly not intended that vectors be created and
>>> destroyed within each KSPSolve(); could you please point us to the code that
>>> makes you think they are being created and destroyed?   We create all the
>>> work vectors in KSPSetUp() and destroy them in KSPReset(), not during the
>>> solve. Not that this would be a measurable difference.
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>> This seems kind of wasteful; is it supposed to be like this?
>>> Is this even the reason for my problems? Apart from that, everything seems
>>> quite normal to me (but I'm not the expert here).
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> Thanks in advance.
>>> > > >>>>>
>>> > > >>>>> Michael
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
>>> > > >>>>>
>>> > > >>>
>>> > > >>>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >
>>> > > >
>>> > > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>>
>>>
>>
>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

initsolve: 10 iterations
solve 1: 10 iterations
solve 2: 10 iterations
solve 3: 10 iterations
solve 4: 10 iterations
solve 5: 10 iterations
solve 6: 10 iterations
solve 7: 10 iterations
solve 8: 10 iterations
solve 9: 10 iterations
solve 10: 10 iterations
solve 20: 10 iterations
solve 30: 10 iterations
solve 40: 10 iterations
solve 50: 10 iterations
solve 60: 10 iterations
solve 70: 10 iterations
solve 80: 10 iterations
solve 90: 10 iterations
solve 100: 10 iterations
solve 200: 10 iterations
solve 300: 10 iterations
solve 400: 10 iterations
solve 500: 10 iterations
solve 600: 10 iterations
solve 700: 10 iterations
solve 800: 10 iterations
solve 900: 10 iterations
solve 1000: 10 iterations

Time in solve():      109.012 s
Time in KSPSolve():   108.771 s (99.7789%)

Number of   KSP iterations (total): 10000
Number of solve iterations (total): 1000 (ratio: 10.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdwd-0003 with 216 processors, by jczhang Sun Jun 10 23:54:41 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           1.126e+02      1.00002   1.126e+02
Objects:              4.045e+04      1.00002   4.045e+04
Flop:                 3.198e+10      1.10193   3.098e+10  6.691e+12
Flop/sec:            2.839e+08      1.10195   2.751e+08  5.941e+10
MPI Messages:         2.571e+06      4.18338   1.529e+06  3.302e+08
MPI Message Lengths:  2.182e+09      2.17075   1.164e+03  3.844e+11
MPI Reductions:       5.260e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.7742e-01   0.2%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 3.4111e+00   3.0%  1.0911e+10   0.2%  7.694e+05   0.2%  1.496e+03        0.3%  5.760e+02   1.1% 
 2: Remaining Solves: 1.0903e+02  96.8%  6.6798e+12  99.8%  3.294e+08  99.8%  1.163e+03       99.7%  5.200e+04  98.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 7.1764e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided          4 1.0 2.3761e-03 2.9 0.00e+00 0.0 5.1e+03 4.0e+00 0.0e+00  0  0  0  0  0   0  0  1  0  0     0
BuildTwoSidedF        40 1.0 7.3972e-02 2.2 0.00e+00 0.0 2.2e+04 6.4e+03 0.0e+00  0  0  0  0  0   2  0  3 12  0     0
KSPSetUp              11 1.0 1.9271e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  0  0  0  2     0
KSPSolve               1 1.0 3.4107e+00 1.0 5.31e+07 1.2 7.7e+05 1.5e+03 5.8e+02  3  0  0  0  1 100100100100100  3199
VecTDot              104 1.0 1.0672e-02 1.7 2.32e+06 1.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0   0  5  0  0 18 46901
VecNorm               12 1.0 1.6296e-03 1.8 6.48e+05 1.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  1  0  0  2 85891
VecScale              40 1.0 3.9005e-04 4.7 9.01e+04 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 40510
VecCopy                9 1.0 3.4094e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               197 1.0 2.0678e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              100 1.0 3.4885e-03 1.2 2.26e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  4  0  0  0 139836
VecAYPX               86 1.0 2.7862e-03 1.3 1.34e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  3  0  0  0 103723
VecAssemblyBegin      14 1.0 2.4695e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        14 1.0 2.8300e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      44 1.0 1.2159e-03 1.3 3.26e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 57569
VecScatterBegin      230 1.0 7.8228e-03 2.3 0.00e+00 0.0 4.6e+05 1.4e+03 0.0e+00  0  0  0  0  0   0  0 59 55  0     0
VecScatterEnd        230 1.0 2.2854e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           4 1.0 1.1837e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               91 1.0 3.2917e-02 1.2 1.52e+07 1.1 1.8e+05 1.9e+03 0.0e+00  0  0  0  0  0   1 29 24 29  0 96787
MatMultAdd            40 1.0 1.4560e-02 2.1 2.38e+06 1.3 7.4e+04 3.0e+02 0.0e+00  0  0  0  0  0   0  4 10  2  0 32696
MatMultTranspose      40 1.0 1.3394e-02 1.7 2.38e+06 1.3 7.4e+04 3.0e+02 0.0e+00  0  0  0  0  0   0  4 10  2  0 35544
MatSolve              10 0.0 6.1989e-05 0.0 1.22e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   198
MatSOR                80 1.0 4.1121e-02 1.1 1.40e+07 1.1 8.5e+04 1.5e+03 2.0e+01  0  0  0  0  0   1 27 11 11  3 71623
MatLUFactorSym         1 1.0 7.9155e-05 6.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 7.7009e-0519.0 1.01e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   131
MatConvert             4 1.0 4.0369e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale              12 1.0 2.5949e-03 1.7 8.36e+05 1.2 8.5e+03 1.5e+03 0.0e+00  0  0  0  0  0   0  2  1  1  0 66198
MatResidual           40 1.0 1.5780e-02 1.5 5.98e+06 1.1 8.5e+04 1.5e+03 0.0e+00  0  0  0  0  0   0 11 11 11  0 78689
MatAssemblyBegin      83 1.0 7.5437e-02 2.0 0.00e+00 0.0 2.2e+04 6.4e+03 0.0e+00  0  0  0  0  0   2  0  3 12  0     0
MatAssemblyEnd        83 1.0 7.3102e-02 1.3 0.00e+00 0.0 1.0e+05 2.2e+02 1.8e+02  0  0  0  0  0   2  0 13  2 31     0
MatGetRow          88851 1.0 1.4340e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  42  0  0  0  0     0
MatGetRowIJ            1 0.0 1.8120e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMat        4 1.0 1.1639e-02 1.9 0.00e+00 0.0 4.5e+03 5.5e+02 6.4e+01  0  0  0  0  0   0  0  1  0 11     0
MatGetOrdering         1 0.0 9.7036e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             4 1.0 9.6412e-03 1.1 0.00e+00 0.0 9.4e+04 8.0e+02 2.5e+01  0  0  0  0  0   0  0 12  6  4     0
MatZeroEntries         4 1.0 4.4084e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                4 1.0 9.6810e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  28  0  0  0  0     0
MatMatMult             4 1.0 4.4704e-02 1.0 5.98e+05 1.3 5.4e+04 7.0e+02 4.9e+01  0  0  0  0  0   1  1  7  3  9  2667
MatMatMultSym          4 1.0 3.8034e-02 1.0 0.00e+00 0.0 4.6e+04 5.6e+02 4.8e+01  0  0  0  0  0   1  0  6  2  8     0
MatMatMultNum          4 1.0 6.2041e-03 1.0 5.98e+05 1.3 8.5e+03 1.5e+03 0.0e+00  0  0  0  0  0   0  1  1  1  0 19219
MatPtAP                4 1.0 1.0061e-01 1.0 8.06e+06 1.5 1.2e+05 2.5e+03 6.1e+01  0  0  0  0  0   3 14 15 26 11 14973
MatPtAPSymbolic        4 1.0 6.1238e-02 1.0 0.00e+00 0.0 5.6e+04 2.8e+03 2.8e+01  0  0  0  0  0   2  0  7 14  5     0
MatPtAPNumeric         4 1.0 3.9100e-02 1.0 8.06e+06 1.5 6.1e+04 2.2e+03 3.2e+01  0  0  0  0  0   1 14  8 12  6 38528
MatTrnMatMult          1 1.0 8.1200e-02 1.0 2.72e+06 1.3 1.0e+04 9.2e+03 1.6e+01  0  0  0  0  0   2  5  1  8  3  6689
MatTrnMatMultSym       1 1.0 5.5003e-02 1.0 0.00e+00 0.0 9.0e+03 4.6e+03 1.6e+01  0  0  0  0  0   2  0  1  4  3     0
MatTrnMatMultNum       1 1.0 2.6250e-02 1.0 2.72e+06 1.3 1.1e+03 4.8e+04 0.0e+00  0  0  0  0  0   1  5  0  5  0 20693
MatGetLocalMat        14 1.0 7.8259e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         12 1.0 5.9967e-03 1.5 0.00e+00 0.0 5.9e+04 2.7e+03 0.0e+00  0  0  0  0  0   0  0  8 14  0     0
SFSetGraph             4 1.0 3.9816e-0510.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                4 1.0 2.9175e-03 1.8 0.00e+00 0.0 1.5e+04 6.6e+02 0.0e+00  0  0  0  0  0   0  0  2  1  0     0
SFBcastBegin          33 1.0 1.0576e-03 2.5 0.00e+00 0.0 7.8e+04 8.2e+02 0.0e+00  0  0  0  0  0   0  0 10  6  0     0
SFBcastEnd            33 1.0 1.6055e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCGAMGGraph_AGG        4 1.0 1.9680e+00 1.0 5.98e+05 1.1 2.5e+04 7.4e+02 4.8e+01  2  0  0  0  0  58  1  3  2  8    63
PCGAMGCoarse_AGG       4 1.0 1.0160e-01 1.0 2.72e+06 1.3 1.2e+05 2.0e+03 4.5e+01  0  0  0  0  0   3  5 15 21  8  5346
PCGAMGProl_AGG         4 1.0 2.3153e-02 1.1 0.00e+00 0.0 3.5e+04 1.3e+03 6.4e+01  0  0  0  0  0   1  0  5  4 11     0
PCGAMGPOpt_AGG         4 1.0 1.0441e+00 1.0 9.81e+06 1.1 1.4e+05 1.2e+03 1.6e+02  1  0  0  0  0  31 19 18 14 29  1965
GAMG: createProl       4 1.0 3.1363e+00 1.0 1.31e+07 1.1 3.2e+05 1.5e+03 3.2e+02  3  0  0  0  1  92 25 41 40 56   867
  Graph                8 1.0 1.9667e+00 1.0 5.98e+05 1.1 2.5e+04 7.4e+02 4.8e+01  2  0  0  0  0  58  1  3  2  8    63
  MIS/Agg              4 1.0 9.7492e-03 1.1 0.00e+00 0.0 9.4e+04 8.0e+02 2.5e+01  0  0  0  0  0   0  0 12  6  4     0
  SA: col data         4 1.0 5.0085e-03 1.1 0.00e+00 0.0 2.1e+04 2.0e+03 1.6e+01  0  0  0  0  0   0  0  3  4  3     0
  SA: frmProl0         4 1.0 1.5262e-02 1.0 0.00e+00 0.0 1.4e+04 3.7e+02 3.2e+01  0  0  0  0  0   0  0  2  0  6     0
  SA: smooth           4 1.0 1.0149e+00 1.0 8.36e+05 1.3 5.4e+04 7.0e+02 5.7e+01  1  0  0  0  0  30  2  7  3 10   164
GAMG: partLevel        4 1.0 1.1628e-01 1.0 8.06e+06 1.5 1.2e+05 2.4e+03 1.6e+02  0  0  0  0  0   3 14 16 26 28 12956
  repartition          2 1.0 8.7881e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Invert-Sort          2 1.0 1.3301e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0
  Move A               2 1.0 9.6662e-03 2.4 0.00e+00 0.0 1.9e+03 1.3e+03 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
  Move P               2 1.0 8.2741e-03 3.1 0.00e+00 0.0 2.6e+03 2.1e+01 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
PCSetUp                2 1.0 3.2645e+00 1.0 2.12e+07 1.3 4.4e+05 1.7e+03 5.1e+02  3  0  0  0  1  96 39 57 66 88  1294
PCSetUpOnBlocks       10 1.0 4.4870e-04 3.5 1.01e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    23
PCApply               10 1.0 7.8002e-02 1.0 2.48e+07 1.1 3.2e+05 9.4e+02 2.0e+01  0  0  0  0  0   2 47 41 26  3 65884

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.0878e+02 1.0 3.19e+10 1.1 3.3e+08 1.2e+03 5.2e+04 97100100100 99 100100100100100 61407
VecTDot            20000 1.0 7.9760e+00 1.4 1.08e+09 1.0 0.0e+00 0.0e+00 2.0e+04  6  3  0  0 38   6  3  0  0 38 29247
VecNorm            12000 1.0 2.9418e+00 1.1 6.48e+08 1.0 0.0e+00 0.0e+00 1.2e+04  3  2  0  0 23   3  2  0  0 23 47579
VecScale           40000 1.0 2.1746e-01 1.7 9.01e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 72661
VecCopy             1000 1.0 8.4139e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            150000 1.0 1.4097e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            20000 1.0 1.3505e+00 1.1 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0 172735
VecAYPX            50000 1.0 1.6715e+00 1.3 8.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0 104366
VecScatterBegin   171000 1.0 5.6701e+00 2.4 0.00e+00 0.0 3.3e+08 1.2e+03 0.0e+00  4  0100100  0   4  0100100  0     0
VecScatterEnd     171000 1.0 2.8652e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  21  0  0  0  0     0
MatMult            51000 1.0 2.2225e+01 1.2 9.55e+09 1.1 9.7e+07 2.2e+03 0.0e+00 17 30 29 55  0  18 30 29 55  0 90340
MatMultAdd         40000 1.0 1.9086e+01 2.1 2.38e+09 1.3 7.4e+07 3.0e+02 0.0e+00 13  7 22  6  0  13  7 22  6  0 24944
MatMultTranspose   40000 1.0 1.5344e+01 1.8 2.38e+09 1.3 7.4e+07 3.0e+02 0.0e+00 10  7 22  6  0  11  7 22  6  0 31027
MatSolve           10000 0.0 5.1090e-02 0.0 1.22e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   240
MatSOR             80000 1.0 4.7799e+01 1.2 1.40e+10 1.1 8.5e+07 1.5e+03 2.0e+04 40 44 26 33 38  41 44 26 33 38 61484
MatResidual        40000 1.0 1.6702e+01 1.4 5.98e+09 1.1 8.5e+07 1.5e+03 0.0e+00 13 19 26 33  0  13 19 26 33  0 74349
PCSetUpOnBlocks    10000 1.0 1.4676e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            10000 1.0 9.0367e+01 1.0 2.47e+10 1.1 3.2e+08 9.4e+02 2.0e+04 80 77 96 77 38  82 77 96 78 38 56799
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              7         8816     0.
     DMKSP interface     1              1          656     0.
              Vector     4             38      1915568     0.
              Matrix     0             50      9208336     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             12       137820     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             10       228640     0.
      Preconditioner     1              7         7448     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver    10              4         6400     0.
              Vector   171            137      6010696     0.
              Matrix   122             72     21592712     0.
      Matrix Coarsen     4              4         2544     0.
           Index Set    76             66       118624     0.
   Star Forest Graph     4              4         3456     0.
         Vec Scatter    28             19        24048     0.
      Preconditioner    10              4         3424     0.
         PetscRandom     8              8         5168     0.

--- Event Stage 2: Remaining Solves

              Vector 40000          40000   2398240000     0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 8.96454e-06
Average time for zero size MPI_Send(): 1.3921e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.05
-pc_gamg_threshold_scale .0
-pc_gamg_type agg
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001

initsolve: 12 iterations
solve 1: 12 iterations
solve 2: 12 iterations
solve 3: 12 iterations
solve 4: 12 iterations
solve 5: 12 iterations
solve 6: 12 iterations
solve 7: 12 iterations
solve 8: 12 iterations
solve 9: 12 iterations
solve 10: 12 iterations
solve 20: 12 iterations
solve 30: 12 iterations
solve 40: 12 iterations
solve 50: 12 iterations
solve 60: 12 iterations
solve 70: 12 iterations
solve 80: 12 iterations
solve 90: 12 iterations
solve 100: 12 iterations
solve 200: 12 iterations
solve 300: 12 iterations
solve 400: 12 iterations
solve 500: 12 iterations
solve 600: 12 iterations
solve 700: 12 iterations
solve 800: 12 iterations
solve 900: 12 iterations
solve 1000: 12 iterations

Time in solve():      183.41 s
Time in KSPSolve():   183.154 s (99.8607%)

Number of   KSP iterations (total): 12000
Number of solve iterations (total): 1000 (ratio: 12.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0065 with 1728 processors, by jczhang Sun Jun 10 23:56:56 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           1.873e+02      1.00001   1.873e+02
Objects:              4.846e+04      1.00002   4.846e+04
Flop:                 3.894e+10      1.11735   3.767e+10  6.509e+13
Flop/sec:            2.080e+08      1.11735   2.012e+08  3.476e+11
MPI Messages:         4.708e+06      6.19009   2.297e+06  3.969e+09
MPI Message Lengths:  2.889e+09      2.40120   1.037e+03  4.115e+12
MPI Reductions:       6.264e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.1424e-01   0.1%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 3.7160e+00   2.0%  1.0002e+11   0.2%  8.597e+06   0.2%  1.305e+03        0.3%  6.140e+02   1.0% 
 2: Remaining Solves: 1.8343e+02  98.0%  6.4988e+13  99.8%  3.961e+09  99.8%  1.036e+03       99.7%  6.200e+04  99.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.5020e-04 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided          4 1.0 2.8999e-03 2.2 0.00e+00 0.0 5.1e+04 4.0e+00 0.0e+00  0  0  0  0  0   0  0  1  0  0     0
BuildTwoSidedF        40 1.0 1.2657e-01 2.3 0.00e+00 0.0 2.1e+05 6.2e+03 0.0e+00  0  0  0  0  0   2  0  2 12  0     0
KSPSetUp              11 1.0 5.3284e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  0  0  0  2     0
KSPSolve               1 1.0 3.7157e+00 1.0 6.06e+07 1.2 8.6e+06 1.3e+03 6.1e+02  2  0  0  0  1 100100100100100 26919
VecTDot              108 1.0 2.7509e-02 1.4 2.54e+06 1.0 0.0e+00 0.0e+00 1.1e+02  0  0  0  0  0   1  4  0  0 18 159329
VecNorm               14 1.0 2.2557e-03 1.7 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  1  0  0  2 579146
VecScale              48 1.0 4.7922e-04 3.6 1.24e+05 2.5 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 355623
VecCopy                9 1.0 4.0150e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               227 1.0 2.5015e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              104 1.0 5.3051e-03 1.7 2.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  4  0  0  0 806970
VecAYPX               96 1.0 3.2096e-03 1.4 1.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  3  0  0  0 811392
VecAssemblyBegin      14 1.0 3.1312e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        14 1.0 3.3569e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult      44 1.0 1.2739e-03 1.3 3.26e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 440721
VecScatterBegin      264 1.0 1.2353e-02 2.9 0.00e+00 0.0 5.2e+06 1.2e+03 0.0e+00  0  0  0  0  0   0  0 61 57  0     0
VecScatterEnd        264 1.0 6.0780e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecSetRandom           4 1.0 1.1327e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult              101 1.0 4.6448e-02 1.5 1.76e+07 1.1 2.0e+06 1.7e+03 0.0e+00  0  0  0  0  0   1 29 23 30  0 625296
MatMultAdd            48 1.0 5.0853e-02 4.2 2.86e+06 1.3 9.1e+05 2.7e+02 0.0e+00  0  0  0  0  0   1  5 11  2  0 93269
MatMultTranspose      48 1.0 3.3805e-02 3.4 2.86e+06 1.3 9.1e+05 2.7e+02 0.0e+00  0  0  0  0  0   0  5 11  2  0 140302
MatSolve              12 0.0 6.8688e-04 0.0 8.45e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1230
MatSOR                96 1.0 5.9859e-02 1.2 1.72e+07 1.1 1.0e+06 1.4e+03 2.4e+01  0  0  0  0  0   1 29 12 12  4 478874
MatLUFactorSym         1 1.0 3.2248e-03294.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 2.8222e-03986.4 4.22e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1494
MatConvert             4 1.0 1.2332e-02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatScale              12 1.0 3.8207e-03 2.6 8.60e+05 1.2 8.4e+04 1.4e+03 0.0e+00  0  0  0  0  0   0  1  1  1  0 368774
MatResidual           48 1.0 2.7284e-02 2.0 7.46e+06 1.2 1.0e+06 1.4e+03 0.0e+00  0  0  0  0  0   0 12 12 12  0 445843
MatAssemblyBegin      83 1.0 1.2762e-01 2.1 0.00e+00 0.0 2.1e+05 6.2e+03 0.0e+00  0  0  0  0  0   2  0  2 12  0     0
MatAssemblyEnd        83 1.0 1.2948e-01 1.6 0.00e+00 0.0 1.1e+06 1.8e+02 1.8e+02  0  0  0  0  0   3  0 13  2 29     0
MatGetRow          89034 1.0 1.4585e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  38  0  0  0  0     0
MatGetRowIJ            1 0.0 1.0991e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMat        4 1.0 2.8479e-02 1.3 0.00e+00 0.0 5.9e+04 4.9e+02 6.4e+01  0  0  0  0  0   1  0  1  0 10     0
MatGetOrdering         1 0.0 2.7084e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCoarsen             4 1.0 2.0675e-02 1.1 0.00e+00 0.0 1.1e+06 6.2e+02 5.1e+01  0  0  0  0  0   1  0 13  6  8     0
MatZeroEntries         4 1.0 3.2139e-04 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY                4 1.0 9.9720e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  27  0  0  0  0     0
MatMatMult             4 1.0 8.0693e-02 1.1 6.22e+05 1.3 5.4e+05 6.4e+02 5.0e+01  0  0  0  0  0   2  1  6  3  8 12315
MatMatMultSym          4 1.0 6.5204e-02 1.0 0.00e+00 0.0 4.6e+05 5.1e+02 4.8e+01  0  0  0  0  0   2  0  5  2  8     0
MatMatMultNum          4 1.0 9.7268e-03 1.0 6.22e+05 1.3 8.4e+04 1.3e+03 0.0e+00  0  0  0  0  0   0  1  1  1  0 102165
MatPtAP                4 1.0 1.7622e-01 1.0 8.31e+06 1.6 1.2e+06 2.3e+03 6.2e+01  0  0  0  0  0   5 13 14 25 10 72429
MatPtAPSymbolic        4 1.0 9.2023e-02 1.0 0.00e+00 0.0 5.6e+05 2.7e+03 2.8e+01  0  0  0  0  0   2  0  7 14  5     0
MatPtAPNumeric         4 1.0 8.2069e-02 1.0 8.31e+06 1.6 6.7e+05 1.9e+03 3.2e+01  0  0  0  0  0   2 13  8 12  5 155523
MatTrnMatMult          1 1.0 9.1041e-02 1.0 2.72e+06 1.3 9.2e+04 9.1e+03 1.6e+01  0  0  0  0  0   2  5  1  7  3 49668
MatTrnMatMultSym       1 1.0 6.3481e-02 1.0 0.00e+00 0.0 8.2e+04 4.5e+03 1.6e+01  0  0  0  0  0   2  0  1  3  3     0
MatTrnMatMultNum       1 1.0 2.7639e-02 1.0 2.72e+06 1.3 9.5e+03 4.9e+04 0.0e+00  0  0  0  0  0   1  5  0  4  0 163601
MatGetLocalMat        14 1.0 9.3658e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol         12 1.0 9.1021e-03 2.3 0.00e+00 0.0 5.9e+05 2.6e+03 0.0e+00  0  0  0  0  0   0  0  7 14  0     0
SFSetGraph             4 1.0 7.6056e-0526.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                4 1.0 3.6201e-03 1.7 0.00e+00 0.0 1.5e+05 6.0e+02 0.0e+00  0  0  0  0  0   0  0  2  1  0     0
SFBcastBegin          59 1.0 2.5663e-03 3.4 0.00e+00 0.0 9.8e+05 6.2e+02 0.0e+00  0  0  0  0  0   0  0 11  5  0     0
SFBcastEnd            59 1.0 7.1716e-03 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCGAMGGraph_AGG        4 1.0 2.0067e+00 1.0 6.22e+05 1.2 2.5e+05 6.7e+02 4.8e+01  1  0  0  0  0  54  1  3  2  8   505
PCGAMGCoarse_AGG       4 1.0 1.2379e-01 1.0 2.72e+06 1.3 1.3e+06 1.6e+03 7.1e+01  0  0  0  0  0   3  5 16 19 12 36527
PCGAMGProl_AGG         4 1.0 2.7567e-02 1.1 0.00e+00 0.0 3.3e+05 1.2e+03 6.4e+01  0  0  0  0  0   1  0  4  4 10     0
PCGAMGPOpt_AGG         4 1.0 1.1260e+00 1.0 1.01e+07 1.1 1.4e+06 1.1e+03 1.7e+02  1  0  0  0  0  30 17 16 13 27 14814
GAMG: createProl       4 1.0 3.2832e+00 1.0 1.34e+07 1.2 3.3e+06 1.3e+03 3.5e+02  2  0  0  0  1  88 22 39 38 57  6767
  Graph                8 1.0 2.0053e+00 1.0 6.22e+05 1.2 2.5e+05 6.7e+02 4.8e+01  1  0  0  0  0  54  1  3  2  8   506
  MIS/Agg              4 1.0 2.0879e-02 1.1 0.00e+00 0.0 1.1e+06 6.2e+02 5.1e+01  0  0  0  0  0   1  0 13  6  8     0
  SA: col data         4 1.0 5.9071e-03 1.1 0.00e+00 0.0 2.0e+05 1.8e+03 1.6e+01  0  0  0  0  0   0  0  2  3  3     0
  SA: frmProl0         4 1.0 1.8157e-02 1.0 0.00e+00 0.0 1.3e+05 3.7e+02 3.2e+01  0  0  0  0  0   0  0  2  0  5     0
  SA: smooth           4 1.0 1.0804e+00 1.0 8.60e+05 1.3 5.4e+05 6.4e+02 5.8e+01  1  0  0  0  0  29  1  6  3  9  1286
GAMG: partLevel        4 1.0 2.1361e-01 1.0 8.31e+06 1.6 1.3e+06 2.2e+03 1.6e+02  0  0  0  0  0   6 13 15 25 27 59752
  repartition          2 1.0 1.5340e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Invert-Sort          2 1.0 4.6380e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0
  Move A               2 1.0 2.1691e-02 1.4 0.00e+00 0.0 2.4e+04 1.2e+03 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
  Move P               2 1.0 1.3658e-02 1.8 0.00e+00 0.0 3.5e+04 1.7e+01 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
PCSetUp                2 1.0 3.5135e+00 1.0 2.17e+07 1.3 4.6e+06 1.5e+03 5.4e+02  2  0  0  0  1  94 35 54 63 87  9957
PCSetUpOnBlocks       12 1.0 6.3727e-0342.4 4.22e+06 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   662
PCApply               12 1.0 1.3277e-01 1.0 3.14e+07 1.2 3.8e+06 8.4e+02 2.4e+01  0  0  0  0  0   4 50 45 29  4 379007

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.8317e+02 1.0 3.89e+10 1.1 4.0e+09 1.0e+03 6.2e+04 98100100100 99 100100100100100 354794
VecTDot            24000 1.0 2.3516e+01 1.2 1.30e+09 1.0 0.0e+00 0.0e+00 2.4e+04 11  3  0  0 38  12  3  0  0 39 95232
VecNorm            14000 1.0 6.8848e+00 1.1 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04  4  2  0  0 22   4  2  0  0 23 189748
VecScale           48000 1.0 2.9046e-01 2.5 1.24e+08 2.5 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 586732
VecCopy             1000 1.0 8.8122e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            180000 1.0 1.8199e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            24000 1.0 1.6535e+00 1.2 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0 1354423
VecAYPX            60000 1.0 1.9924e+00 1.3 9.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0 846000
VecScatterBegin   205000 1.0 9.1513e+00 3.1 0.00e+00 0.0 4.0e+09 1.0e+03 0.0e+00  3  0100100  0   3  0100100  0     0
VecScatterEnd     205000 1.0 5.6576e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 26  0  0  0  0  27  0  0  0  0     0
MatMult            61000 1.0 3.2314e+01 1.5 1.17e+10 1.1 1.1e+09 2.0e+03 0.0e+00 13 30 29 55  0  14 30 29 55  0 600882
MatMultAdd         48000 1.0 4.4993e+01 2.9 2.86e+09 1.3 9.1e+08 2.7e+02 0.0e+00 20  7 23  6  0  20  7 23  6  0 105415
MatMultTranspose   48000 1.0 3.4573e+01 3.1 2.86e+09 1.3 9.1e+08 2.7e+02 0.0e+00  8  7 23  6  0   8  7 23  6  0 137187
MatSolve           12000 0.0 7.3331e-01 0.0 8.45e+08 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1152
MatSOR             96000 1.0 7.1605e+01 1.2 1.72e+10 1.1 1.0e+09 1.4e+03 2.4e+04 36 44 25 33 38  37 44 25 33 39 399609
MatResidual        48000 1.0 2.5073e+01 1.7 7.46e+09 1.2 1.0e+09 1.4e+03 0.0e+00 10 19 25 33  0  10 19 25 33  0 485156
PCSetUpOnBlocks    12000 1.0 1.7560e-0115.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            12000 1.0 1.4437e+02 1.0 3.04e+10 1.2 3.8e+09 8.4e+02 2.4e+04 77 77 97 78 38  78 77 97 78 39 348175
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              7         8816     0.
     DMKSP interface     1              1          656     0.
              Vector     4             38      1919384     0.
              Matrix     0             50     10310768     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             12       206536     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             10       228640     0.
      Preconditioner     1              7         7448     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver    10              4         6400     0.
              Vector   179            145      6496648     0.
              Matrix   122             72     21618064     0.
      Matrix Coarsen     4              4         2544     0.
           Index Set    74             64       185744     0.
   Star Forest Graph     4              4         3456     0.
         Vec Scatter    28             19        24056     0.
      Preconditioner    10              4         3424     0.
         PetscRandom     8              8         5168     0.

--- Event Stage 2: Remaining Solves

              Vector 48000          48000   2878368000     0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.45912e-05
Average time for zero size MPI_Send(): 6.50518e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.05
-pc_gamg_threshold_scale .0
-pc_gamg_type agg
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
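(The "First Solve" / "Remaining Solves" breakdown in the summaries above comes from user-defined log stages. Below is a minimal sketch of how such a driver is typically structured, assuming wstest follows the usual PETSc pattern -- its source is not included here, and the diagonal stand-in operator is only to keep the sketch self-contained; wstest assembles the 7-point stencil on a DMDA instead.)

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    KSP           ksp;
    Mat           A;
    Vec           x, b;
    PetscLogStage stage1, stage2;
    PetscInt      i, n = 100, iterations = 1000, Istart, Iend, row;

    PetscInitialize(&argc, &argv, NULL, NULL);
    PetscOptionsGetInt(NULL, NULL, "-iterations", &iterations, NULL);

    /* Stand-in operator: a diagonal matrix, so the sketch is self-contained */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
    MatSetFromOptions(A);
    MatSetUp(A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (row = Istart; row < Iend; row++) {
      PetscScalar v = 2.0;
      MatSetValues(A, 1, &row, 1, &row, &v, INSERT_VALUES);
    }
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    MatCreateVecs(A, &x, &b);
    VecSet(b, 1.0);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetFromOptions(ksp);   /* picks up -ksp_type cg, -pc_type gamg, ... from the option table */

    PetscLogStageRegister("First Solve", &stage1);
    PetscLogStageRegister("Remaining Solves", &stage2);

    PetscLogStagePush(stage1);
    KSPSolve(ksp, b, x);      /* the initsolve: pays the PCSetUp / GAMG setup cost */
    PetscLogStagePop();

    PetscLogStagePush(stage2);
    for (i = 0; i < iterations; i++) KSPSolve(ksp, b, x);  /* the timed solves; setup is reused */
    PetscLogStagePop();

    KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
    PetscFinalize();
    return 0;
  }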
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

initsolve: 163 iterations
solve 1: 163 iterations
solve 2: 163 iterations
solve 3: 163 iterations
solve 4: 163 iterations
solve 5: 163 iterations
solve 6: 163 iterations
solve 7: 163 iterations
solve 8: 163 iterations
solve 9: 163 iterations
solve 10: 163 iterations
solve 20: 163 iterations
solve 30: 163 iterations
solve 40: 163 iterations
solve 50: 163 iterations
solve 60: 163 iterations
solve 70: 163 iterations
solve 80: 163 iterations
solve 90: 163 iterations
solve 100: 163 iterations
solve 200: 163 iterations
solve 300: 163 iterations
solve 400: 163 iterations
solve 500: 163 iterations
solve 600: 163 iterations
solve 700: 163 iterations
solve 800: 163 iterations
solve 900: 163 iterations
solve 1000: 163 iterations

Time in solve():      423.567 s
Time in KSPSolve():   423.327 s (99.9431%)

Number of   KSP iterations (total): 163000
Number of solve iterations (total): 1000 (ratio: 163.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0437 with 216 processors, by jczhang Tue Jun 12 02:09:30 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           4.243e+02      1.00000   4.243e+02
Objects:              3.500e+01      1.00000   3.500e+01
Flop:                 1.661e+11      1.00537   1.658e+11  3.581e+13
Flop/sec:            3.914e+08      1.00537   3.907e+08  8.439e+10
MPI Messages:         9.850e+05      2.00000   8.208e+05  1.773e+08
MPI Message Lengths:  7.092e+09      2.00000   7.200e+03  1.277e+12
MPI Reductions:       4.915e+05      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.0660e-01   0.0%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 6.4530e-01   0.2%  3.5887e+10   0.1%  1.793e+05   0.1%  7.135e+03        0.1%  5.070e+02   0.1% 
 2: Remaining Solves: 4.2358e+02  99.8%  3.5773e+13  99.9%  1.771e+08  99.9%  7.200e+03       99.9%  4.910e+05  99.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 6.1989e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSidedF         2 1.0 6.4178e-03 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
KSPSetUp               2 1.0 9.5639e-0316.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   1  0  0  0  1     0
KSPSolve               1 1.0 6.5393e-01 1.0 1.66e+08 1.0 1.8e+05 7.1e+03 5.1e+02  0  0  0  0  0 100100100100100 54879
VecTDot              326 1.0 2.6734e-01 1.3 1.76e+07 1.0 0.0e+00 0.0e+00 3.3e+02  0  0  0  0  0  39 11  0  0 64 14223
VecNorm              165 1.0 4.5129e-02 1.5 8.91e+06 1.0 0.0e+00 0.0e+00 1.6e+02  0  0  0  0  0   6  5  0  0 33 42646
VecCopy                1 1.0 9.1076e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               164 1.0 4.1964e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecAXPY              326 1.0 2.9594e-02 1.8 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3 11  0  0  0 128488
VecAYPX              163 1.0 2.5827e-02 5.8 8.78e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  5  0  0  0 73387
VecScatterBegin      164 1.0 1.1089e-02 2.3 0.00e+00 0.0 1.8e+05 7.2e+03 0.0e+00  0  0  0  0  0   1  0 99100  0     0
VecScatterEnd        164 1.0 4.3868e-02 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
MatMult              164 1.0 1.7538e-01 1.4 5.76e+07 1.0 1.8e+05 7.2e+03 0.0e+00  0  0  0  0  0  22 34 99100  0 70534
MatSolve             163 1.0 1.2920e-01 1.2 5.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18 33  0  0  0 92709
MatLUFactorNum         1 1.0 6.4151e-03 3.9 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 17867
MatILUFactorSym        1 1.0 2.3391e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 1.4147e-0217.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatAssemblyEnd         2 1.0 4.0359e-03 1.1 0.00e+00 0.0 2.2e+03 1.8e+03 8.0e+00  0  0  0  0  0   1  0  1  0  2     0
MatGetRowIJ            1 1.0 5.1022e-0553.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6393e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 1.3018e-02 3.1 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0  8805
PCSetUpOnBlocks        1 1.0 8.4250e-03 2.6 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0 13605
PCApply              163 1.0 1.3862e-01 1.2 5.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  19 33  0  0  0 86409

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 4.2333e+02 1.0 1.66e+11 1.0 1.8e+08 7.2e+03 4.9e+05100100100100100 100100100100100 84502
VecTDot           326000 1.0 1.0805e+02 1.3 1.76e+10 1.0 0.0e+00 0.0e+00 3.3e+05 22 11  0  0 66  22 11  0  0 66 35191
VecNorm           165000 1.0 3.5069e+01 1.2 8.91e+09 1.0 0.0e+00 0.0e+00 1.6e+05  8  5  0  0 34   8  5  0  0 34 54879
VecCopy             1000 1.0 9.4171e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            163000 1.0 4.2720e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY           326000 1.0 2.1878e+01 1.3 1.76e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 11  0  0  0   4 11  0  0  0 173800
VecAYPX           163000 1.0 1.0099e+01 2.1 8.78e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  5  0  0  0   2  5  0  0  0 187673
VecScatterBegin   164000 1.0 1.1169e+01 2.2 0.00e+00 0.0 1.8e+08 7.2e+03 0.0e+00  2  0100100  0   2  0100100  0     0
VecScatterEnd     164000 1.0 1.9032e+01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatMult           164000 1.0 1.4498e+02 1.2 5.76e+10 1.0 1.8e+08 7.2e+03 0.0e+00 32 35100100  0  32 35100100  0 85325
MatSolve          163000 1.0 1.2030e+02 1.1 5.55e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 33  0  0  0  27 33  0  0  0 99564
PCSetUpOnBlocks     1000 1.0 2.3875e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply           163000 1.0 1.3030e+02 1.1 5.55e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 33  0  0  0  29 33  0  0  0 91925
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              2         2424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             10      1117680     0.
              Matrix     0              4      5974172     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2              5       446760     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1              2       218528     0.
      Preconditioner     1              2         1912     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     1              0            0     0.
              Vector     7              1         1656     0.
              Matrix     4              0            0     0.
           Index Set     5              2        12384     0.
         Vec Scatter     1              0            0     0.
      Preconditioner     1              0            0     0.

--- Event Stage 2: Remaining Solves

========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 2.01702e-05
Average time for zero size MPI_Send(): 5.63926e-06
#PETSc Option Table entries:
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001

initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations

Time in solve():      29.0941 s
Time in KSPSolve():   29.0545 s (99.8639%)

Number of   KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdwd-0033 with 216 processors, by jczhang Tue Jun 12 09:57:11 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.961e+01      1.00008   2.961e+01
Objects:              3.500e+01      1.00000   3.500e+01
Flop:                 1.034e+10      1.00537   1.032e+10  2.229e+12
Flop/sec:            3.491e+08      1.00544   3.485e+08  7.528e+10
MPI Messages:         6.123e+04      2.00000   5.102e+04  1.102e+07
MPI Message Lengths:  4.407e+08      2.00000   7.198e+03  7.933e+10
MPI Reductions:       3.064e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 8.8708e-02   0.3%  0.0000e+00   0.0%  2.160e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.1% 
 1:     First Solve: 4.2526e-01   1.4%  2.2182e+10   1.0%  1.112e+05   1.0%  7.095e+03        1.0%  3.190e+02   1.0% 
 2: Remaining Solves: 2.9096e+01  98.3%  2.2067e+12  99.0%  1.091e+07  99.0%  7.200e+03       99.0%  3.030e+04  98.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.9312e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSidedF         2 1.0 1.1070e-0245.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
KSPSetUp               2 1.0 7.4315e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  2     0
KSPSolve               1 1.0 4.2496e-01 1.0 1.03e+08 1.0 1.1e+05 7.1e+03 3.2e+02  1  1  1  1  1 100100100100100 52198
VecTDot              201 1.0 1.3848e-01 1.3 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0  1  29 11  0  0 63 16930
VecNorm              102 1.0 4.7182e-02 1.5 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0  10  5  0  0 32 25216
VecCopy                1 1.0 7.5817e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               102 1.0 2.6131e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecAXPY              200 1.0 1.2963e-02 1.3 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3 11  0  0  0 179961
VecAYPX              100 1.0 5.0817e-03 1.6 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  5  0  0  0 228384
VecScatterBegin      101 1.0 6.9609e-03 2.7 0.00e+00 0.0 1.1e+05 7.2e+03 0.0e+00  0  0  1  1  0   1  0 98100  0     0
VecScatterEnd        101 1.0 2.8235e-02 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
MatMult              101 1.0 1.0444e-01 1.3 3.55e+07 1.0 1.1e+05 7.2e+03 0.0e+00  0  0  1  1  0  20 34 98100  0 72945
MatSolve             101 1.0 8.1540e-02 1.2 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  17 33  0  0  0 91020
MatLUFactorNum         1 1.0 5.0550e-03 3.1 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 22675
MatILUFactorSym        1 1.0 2.3129e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 1.1154e-0236.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  0  0  0  0     0
MatAssemblyEnd         2 1.0 4.6539e-03 1.1 0.00e+00 0.0 2.2e+03 1.8e+03 8.0e+00  0  0  0  0  0   1  0  2  0  3     0
MatGetRowIJ            1 1.0 8.1062e-06 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.6512e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 1.2755e-02 3.3 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   2  1  0  0  0  8986
PCSetUpOnBlocks        1 1.0 7.0622e-03 2.2 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  1  0  0  0 16230
PCApply              101 1.0 9.0092e-02 1.2 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  18 33  0  0  0 82380

--- Event Stage 2: Remaining Solves

KSPSolve             100 1.0 2.9067e+01 1.0 1.02e+10 1.0 1.1e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 75919
VecTDot            20100 1.0 9.0870e+00 1.3 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 27 11  0  0 66  28 11  0  0 66 25800
VecNorm            10200 1.0 2.8064e+00 1.2 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04  9  5  0  0 33   9  5  0  0 34 42393
VecCopy              100 1.0 1.1604e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             10100 1.0 2.5848e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPY            20000 1.0 1.3038e+00 1.3 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4 10  0  0  0   4 11  0  0  0 178922
VecAYPX            10000 1.0 5.1897e-01 1.6 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0 223631
VecScatterBegin    10100 1.0 7.0715e-01 2.5 0.00e+00 0.0 1.1e+07 7.2e+03 0.0e+00  2  0 99 99  0   2  0100100  0     0
VecScatterEnd      10100 1.0 1.2401e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
MatMult            10100 1.0 9.1593e+00 1.2 3.55e+09 1.0 1.1e+07 7.2e+03 0.0e+00 28 34 99 99  0  29 35100100  0 83174
MatSolve           10100 1.0 7.4668e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 24 33  0  0  0  24 34  0  0  0 99398
PCSetUpOnBlocks      100 1.0 2.2914e-0312.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            10100 1.0 8.0915e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 33  0  0  0  26 34  0  0  0 91724
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              2         2424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             10      1117680     0.
              Matrix     0              4      5974172     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2              5       446760     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1              2       218528     0.
      Preconditioner     1              2         1912     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     1              0            0     0.
              Vector     7              1         1656     0.
              Matrix     4              0            0     0.
           Index Set     5              2        12384     0.
         Vec Scatter     1              0            0     0.
      Preconditioner     1              0            0     0.

--- Event Stage 2: Remaining Solves

========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.07765e-05
Average time for zero size MPI_Send(): 5.81035e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001

initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations

Time in solve():      92.6599 s
Time in KSPSolve():   92.6245 s (99.9617%)

Number of   KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0255 with 1728 processors, by jczhang Tue Jun 12 10:32:54 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           9.433e+01      1.00008   9.433e+01
Objects:              3.500e+01      1.00000   3.500e+01
Flop:                 1.034e+10      1.00537   1.033e+10  1.785e+13
Flop/sec:            1.096e+08      1.00543   1.095e+08  1.892e+11
MPI Messages:         6.123e+04      2.00000   5.613e+04  9.699e+07
MPI Message Lengths:  4.407e+08      2.00000   7.198e+03  6.981e+11
MPI Reductions:       3.064e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.6240e-01   0.2%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.1% 
 1:     First Solve: 1.5073e+00   1.6%  1.7764e+11   1.0%  9.789e+05   1.0%  7.095e+03        1.0%  3.190e+02   1.0% 
 2: Remaining Solves: 9.2661e+01  98.2%  1.7670e+13  99.0%  9.599e+07  99.0%  7.200e+03       99.0%  3.030e+04  98.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.4305e-04 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSidedF         2 1.0 1.8184e-02 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
KSPSetUp               2 1.0 9.9397e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  2     0
KSPSolve               1 1.0 1.5071e+00 1.0 1.03e+08 1.0 9.8e+05 7.1e+03 3.2e+02  2  1  1  1  1 100100100100100 117873
VecTDot              201 1.0 6.6377e-01 1.1 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  0  0  0  1  43 11  0  0 63 28256
VecNormBarrier       102 1.0 1.8365e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  11  0  0  0  0     0
VecNorm              102 1.0 1.8855e-01 1.2 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0  12  5  0  0 32 50480
VecCopy                1 1.0 8.1062e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               102 1.0 2.6536e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              200 1.0 2.1202e-02 2.2 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1 11  0  0  0 880215
VecAYPX              100 1.0 1.2952e-02 4.4 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  5  0  0  0 716824
VecScatterBarrie     101 1.0 2.4745e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  16  0  0  0  0     0
VecScatterBegin      101 1.0 1.7003e-02 5.3 0.00e+00 0.0 9.6e+05 7.2e+03 0.0e+00  0  0  1  1  0   0  0 98100  0     0
VecScatterEnd        101 1.0 2.4191e-02 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatMult              101 1.0 3.4479e-01 1.1 3.55e+07 1.0 9.6e+05 7.2e+03 0.0e+00  0  0  1  1  0  21 34 98100  0 177214
MatSolve             101 1.0 8.3888e-02 1.3 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5 33  0  0  0 707780
MatLUFactorNum         1 1.0 7.0050e-03 4.3 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 135051
MatILUFactorSym        1 1.0 2.3711e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 1.8245e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.4878e-02 1.3 0.00e+00 0.0 1.9e+04 1.8e+03 8.0e+00  0  0  0  0  0   2  0  2  0  3     0
MatGetRowIJ            1 1.0 3.0041e-0531.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 3.9482e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 1.4109e-02 3.7 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  1  0  0  0 67051
PCSetUpOnBlocks        1 1.0 8.9779e-03 2.9 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 105373
PCApply              101 1.0 9.6632e-02 1.3 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5 33  0  0  0 614442

--- Event Stage 2: Remaining Solves

KSPSolve             100 1.0 9.2634e+01 1.0 1.02e+10 1.0 9.6e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 190747
VecTDot            20100 1.0 4.2454e+01 1.1 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 44 11  0  0 66  45 11  0  0 66 44178
VecNormBarrier     10200 1.0 1.1839e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  12  0  0  0  0     0
VecNorm            10200 1.0 8.3957e+00 1.0 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04  9  5  0  0 33   9  5  0  0 34 113365
VecCopy              100 1.0 1.9274e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             10100 1.0 2.6170e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            20000 1.0 1.3767e+00 1.5 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 10  0  0  0   1 11  0  0  0 1355593
VecAYPX            10000 1.0 7.4878e-01 2.5 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0 1239961
VecScatterBarrie   10100 1.0 1.2403e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13  0  0  0  0  13  0  0  0  0     0
VecScatterBegin    10100 1.0 8.7978e-01 2.8 0.00e+00 0.0 9.6e+07 7.2e+03 0.0e+00  1  0 99 99  0   1  0100100  0     0
VecScatterEnd      10100 1.0 1.5403e+00 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMult            10100 1.0 2.1855e+01 1.1 3.55e+09 1.0 9.6e+07 7.2e+03 0.0e+00 22 34 99 99  0  23 35100100  0 279576
MatSolve           10100 1.0 7.6917e+00 1.2 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 33  0  0  0   8 34  0  0  0 771924
PCSetUpOnBlocks      100 1.0 5.4588e-02218.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            10100 1.0 8.3117e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 33  0  0  0   8 34  0  0  0 714350
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              2         2424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             10      1117680     0.
              Matrix     0              4      5974172     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2              5       446760     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1              2       218528     0.
      Preconditioner     1              2         1912     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     1              0            0     0.
              Vector     7              1         1656     0.
              Matrix     4              0            0     0.
           Index Set     5              2        12384     0.
         Vec Scatter     1              0            0     0.
      Preconditioner     1              0            0     0.

--- Event Stage 2: Remaining Solves

========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.4782e-05
Average time for zero size MPI_Send(): 5.14062e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001

initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations

Time in solve():      87.3816 s
Time in KSPSolve():   87.3319 s (99.9432%)

Number of   KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0521 with 1728 processors, by jczhang Tue Jun 12 09:55:23 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           8.886e+01      1.00020   8.886e+01
Objects:              3.500e+01      1.00000   3.500e+01
Flop:                 1.034e+10      1.00537   1.033e+10  1.785e+13
Flop/sec:            1.164e+08      1.00556   1.162e+08  2.009e+11
MPI Messages:         6.123e+04      2.00000   5.613e+04  9.699e+07
MPI Message Lengths:  4.407e+08      2.00000   7.198e+03  6.981e+11
MPI Reductions:       3.064e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop
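
As a worked instance of this convention (taking the reported 30^3 = 27,000
unknowns per process as the local vector length, an assumption about this run),
a single real VecAXPY is counted as

    2N = 2 \times 27000 = 54000  flop per process per call,

so the 20,000 VecAXPY calls in the "Remaining Solves" stage below account for
roughly 1.08e9 flop per process, consistent with the 1.08e+09 Max flop reported
for that event.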

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.4006e-01   0.2%  0.0000e+00   0.0%  1.901e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.1% 
 1:     First Solve: 1.3361e+00   1.5%  1.7764e+11   1.0%  9.789e+05   1.0%  7.095e+03        1.0%  3.190e+02   1.0% 
 2: Remaining Solves: 8.7382e+01  98.3%  1.7670e+13  99.0%  9.599e+07  99.0%  7.200e+03       99.0%  3.030e+04  98.9% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop() (a minimal usage sketch follows this legend).
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
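
The stage names in the summaries above and below ("First Solve", "Remaining
Solves") are user-defined. Below is a minimal, self-contained sketch of how
such stages are set up with PetscLogStageRegister(), PetscLogStagePush() and
PetscLogStagePop(); it is not the wstest source. The small 1-D model problem
and the names in it are illustrative assumptions, and error checking is
omitted for brevity.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, b;
  KSP           ksp;
  PetscLogStage first, remaining;
  PetscInt      i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assemble a small 1-D Laplacian just to have something to solve */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetType(ksp, KSPCG);
  KSPSetFromOptions(ksp);

  /* Register the stages that appear as "First Solve" / "Remaining Solves" */
  PetscLogStageRegister("First Solve", &first);
  PetscLogStageRegister("Remaining Solves", &remaining);

  PetscLogStagePush(first);       /* setup cost is attributed to this stage */
  KSPSolve(ksp, b, x);
  PetscLogStagePop();

  PetscLogStagePush(remaining);   /* repeated solves are timed separately */
  for (i = 0; i < 100; i++) KSPSolve(ksp, b, x);
  PetscLogStagePop();

  KSPDestroy(&ksp);
  MatDestroy(&A);
  VecDestroy(&x);
  VecDestroy(&b);
  PetscFinalize();
  return 0;
}

Run with -log_view, everything inside each Push/Pop pair is reported under its
own stage, which is what separates the setup-heavy first solve from the
steady-state solves in the tables here.
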
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 1.4186e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSidedF         2 1.0 2.6349e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
KSPSetUp               2 1.0 8.3871e-0311.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   1  0  0  0  2     0
KSPSolve               1 1.0 1.3359e+00 1.0 1.03e+08 1.0 9.8e+05 7.1e+03 3.2e+02  2  1  1  1  1 100100100100100 132978
VecTDot              201 1.0 8.2208e-01 1.1 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  0  0  0  1  60 11  0  0 63 22814
VecNorm              102 1.0 2.4688e-01 1.1 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0  0  17  5  0  0 32 38552
VecCopy                1 1.0 1.8883e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               102 1.0 2.9655e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              200 1.0 2.1235e-02 2.3 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1 11  0  0  0 878832
VecAYPX              100 1.0 2.1644e-02 7.5 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  5  0  0  0 428973
VecScatterBegin      101 1.0 1.7088e-02 6.0 0.00e+00 0.0 9.6e+05 7.2e+03 0.0e+00  0  0  1  1  0   0  0 98100  0     0
VecScatterEnd        101 1.0 3.3919e-02 7.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatMult              101 1.0 1.1378e-01 1.4 3.55e+07 1.0 9.6e+05 7.2e+03 0.0e+00  0  0  1  1  0   7 34 98100  0 537005
MatSolve             101 1.0 9.4168e-02 1.5 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5 33  0  0  0 630513
MatLUFactorNum         1 1.0 7.2019e-03 4.5 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 131358
MatILUFactorSym        1 1.0 2.3780e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 2.6405e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatAssemblyEnd         2 1.0 2.9236e-02 1.2 0.00e+00 0.0 1.9e+04 1.8e+03 8.0e+00  0  0  0  0  0   2  0  2  0  3     0
MatGetRowIJ            1 1.0 2.0981e-0522.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.5988e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                2 1.0 1.3383e-02 3.5 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  1  0  0  0 70690
PCSetUpOnBlocks        1 1.0 8.7321e-03 2.8 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 108340
PCApply              101 1.0 1.0021e-01 1.4 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   6 33  0  0  0 592485

--- Event Stage 2: Remaining Solves

KSPSolve             100 1.0 8.7357e+01 1.0 1.02e+10 1.0 9.6e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 202269
VecTDot            20100 1.0 5.4682e+01 1.1 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 60 11  0  0 66  61 11  0  0 66 34299
VecNorm            10200 1.0 1.7258e+01 1.2 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04 17  5  0  0 33  17  5  0  0 34 55150
VecCopy              100 1.0 1.1919e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             10100 1.0 4.0643e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            20000 1.0 1.4651e+00 1.6 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1 10  0  0  0   1 11  0  0  0 1273800
VecAYPX            10000 1.0 5.7149e-01 1.9 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0 1624624
VecScatterBegin    10100 1.0 8.3204e-01 2.8 0.00e+00 0.0 9.6e+07 7.2e+03 0.0e+00  1  0 99 99  0   1  0100100  0     0
VecScatterEnd      10100 1.0 2.8371e+00 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
MatMult            10100 1.0 1.0399e+01 1.3 3.55e+09 1.0 9.6e+07 7.2e+03 0.0e+00 10 34 99 99  0  10 35100100  0 587563
MatSolve           10100 1.0 7.7816e+00 1.2 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 33  0  0  0   8 34  0  0  0 763013
PCSetUpOnBlocks      100 1.0 2.7120e-0312.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply            10100 1.0 9.1625e+00 1.3 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00  9 33  0  0  0   9 34  0  0  0 648014
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              2         2424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             10      1117680     0.
              Matrix     0              4      5974172     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2              5       446760     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1              2       218528     0.
      Preconditioner     1              2         1912     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     1              0            0     0.
              Vector     7              1         1656     0.
              Matrix     4              0            0     0.
           Index Set     5              2        12384     0.
         Vec Scatter     1              0            0     0.
      Preconditioner     1              0            0     0.

--- Event Stage 2: Remaining Solves

========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.92165e-05
Average time for zero size MPI_Send(): 7.10013e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
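
The entries -iterations, -mesh_size, and -nodes_per_proc in this table are
application options rather than PETSc ones (the -ksp_* and -log_view entries
are consumed by PETSc itself, e.g. through KSPSetFromOptions()). A sketch of
how such application options are typically read with the PETSc options API
follows; the helper name and defaults are assumptions, since the actual
wstest source is not part of this thread.

#include <petscsys.h>

/* Hypothetical helper: read the benchmark-specific options listed above.
   Defaults mirror the values reported for this run. */
static PetscErrorCode ReadBenchmarkOptions(PetscInt *iterations, PetscInt *nodes_per_proc, PetscReal *mesh_size)
{
  *iterations     = 100;    /* -iterations     */
  *nodes_per_proc = 30;     /* -nodes_per_proc */
  *mesh_size      = 1e-4;   /* -mesh_size      */
  PetscOptionsGetInt(NULL, NULL, "-iterations", iterations, NULL);
  PetscOptionsGetInt(NULL, NULL, "-nodes_per_proc", nodes_per_proc, NULL);
  PetscOptionsGetReal(NULL, NULL, "-mesh_size", mesh_size, NULL);
  return 0;
}
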
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------

