[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems

Junchao Zhang jczhang at mcs.anl.gov
Wed Jun 6 15:13:52 CDT 2018


Hi, PETSc developers,
  I tested Michael Becker's code. The code calls the same KSPSolve 1000
times in the second stage and needs a cubic number of processors to run. I
ran it with 125 ranks and 1000 ranks, with and without the -log_sync option. I
attach the log_view output files and a scaling-loss Excel file.
  I profiled the code with 125 processors. It looks like {MatSOR, MatMult,
MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c took
~50% of the time; the other half was spent waiting in MPI. MatSOR_SeqAIJ
took 30%, mostly in PetscSparseDenseMinusDot().
  I tested it on a 36 cores/node machine. I found 32 ranks/node gave better
performance (about 10%) than 36 ranks/node in the 125-rank test. I
guess this is because the processors in the former case had more balanced memory
bandwidth. I collected PAPI_DP_OPS (double precision operations) and
PAPI_TOT_CYC (total cycles) for the 125-rank case (see the attached files).
It looks like the ranks at the two ends have fewer DP_OPS and TOT_CYC.
  Does anyone familiar with the algorithm have a quick explanation?
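  In case it helps to reproduce the counter measurement, below is a minimal
sketch of how such per-rank counters can be gathered around the solve loop
(assuming PAPI 5.x; PAPI error checks omitted; the helper and its name are
only illustrative, not the code I actually used):

#include <petscksp.h>
#include <papi.h>

/* Illustrative helper: count PAPI_DP_OPS and PAPI_TOT_CYC on the calling
   rank while the repeated KSPSolve() calls run, then print the per-rank
   totals so imbalance across ranks is visible. */
static PetscErrorCode SolveLoopWithPAPI(KSP ksp, Vec b, Vec x, PetscInt iterations)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  int            EventSet = PAPI_NULL;
  long long      counts[2];
  PetscInt       i;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  PAPI_library_init(PAPI_VER_CURRENT);
  PAPI_create_eventset(&EventSet);
  PAPI_add_named_event(EventSet, "PAPI_DP_OPS");
  PAPI_add_named_event(EventSet, "PAPI_TOT_CYC");
  PAPI_start(EventSet);
  for (i = 0; i < iterations; i++) {
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  }
  PAPI_stop(EventSet, counts);
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD, "[%d] DP_OPS %lld  TOT_CYC %lld\n", rank, counts[0], counts[1]);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}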

--Junchao Zhang

On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
Michael.Becker at physik.uni-giessen.de> wrote:

> Hello again,
> this took me longer than I anticipated, but here we go.
> I did reruns of the cases where only half the processes per node were used
> (without -log_sync):
>
>                     125 procs, 1st        125 procs, 2nd        1000 procs, 1st       1000 procs, 2nd
>                     Max        Ratio      Max        Ratio      Max        Ratio      Max        Ratio
> KSPSolve            1.203E+02  1.0        1.210E+02  1.0        1.399E+02  1.1        1.365E+02  1.0
> VecTDot             6.376E+00  3.7        6.551E+00  4.0        7.885E+00  2.9        7.175E+00  3.4
> VecNorm             4.579E+00  7.1        5.803E+00  10.2       8.534E+00  6.9        6.026E+00  4.9
> VecScale            1.070E-01  2.1        1.129E-01  2.2        1.301E-01  2.5        1.270E-01  2.4
> VecCopy             1.123E-01  1.3        1.149E-01  1.3        1.301E-01  1.6        1.359E-01  1.6
> VecSet              7.063E-01  1.7        6.968E-01  1.7        7.432E-01  1.8        7.425E-01  1.8
> VecAXPY             1.166E+00  1.4        1.167E+00  1.4        1.221E+00  1.5        1.279E+00  1.6
> VecAYPX             1.317E+00  1.6        1.290E+00  1.6        1.536E+00  1.9        1.499E+00  2.0
> VecScatterBegin     6.142E+00  3.2        5.974E+00  2.8        6.448E+00  3.0        6.472E+00  2.9
> VecScatterEnd       3.606E+01  4.2        3.551E+01  4.0        5.244E+01  2.7        4.995E+01  2.7
> MatMult             3.561E+01  1.6        3.403E+01  1.5        3.435E+01  1.4        3.332E+01  1.4
> MatMultAdd          1.124E+01  2.0        1.130E+01  2.1        2.093E+01  2.9        1.995E+01  2.7
> MatMultTranspose    1.372E+01  2.5        1.388E+01  2.6        1.477E+01  2.2        1.381E+01  2.1
> MatSolve            1.949E-02  0.0        1.653E-02  0.0        4.789E-02  0.0        4.466E-02  0.0
> MatSOR              6.610E+01  1.3        6.673E+01  1.3        7.111E+01  1.3        7.105E+01  1.3
> MatResidual         2.647E+01  1.7        2.667E+01  1.7        2.446E+01  1.4        2.467E+01  1.5
> PCSetUpOnBlocks     5.266E-03  1.4        5.295E-03  1.4        5.427E-03  1.5        5.289E-03  1.4
> PCApply             1.031E+02  1.0        1.035E+02  1.0        1.180E+02  1.0        1.164E+02  1.0
>
>
> I also slimmed down my code and basically wrote a simple weak scaling test
> (source files attached) so you can profile it yourself. I appreciate the
> offer, Junchao, thank you.
> You can adjust the system size per processor at runtime via
> "-nodes_per_proc 30" and the number of repeated calls to the function
> containing KSPSolve() via "-iterations 1000". The physical problem is
> simply calculating the electric potential from a homogeneous charge
> distribution, done multiple times to accumulate time in KSPSolve().
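> In case it is quicker than opening the attachment: the core of the test is
> essentially the following (a simplified sketch, not the full source; ksp, b
> and x are assumed to be set up already, and -nodes_per_proc/-iterations are
> read with PetscOptionsGetInt()):
>
> #include <petscksp.h>
>
> /* Simplified sketch: the first solve gets its own log stage so the GAMG
>    setup cost stays separate from the repeated solves that are timed. */
> static PetscErrorCode RunSolves(KSP ksp, Vec b, Vec x, PetscInt iterations)
> {
>   PetscErrorCode ierr;
>   PetscLogStage  stage1, stage2;
>   PetscInt       i;
>
>   PetscFunctionBeginUser;
>   ierr = PetscLogStageRegister("First Solve", &stage1);CHKERRQ(ierr);
>   ierr = PetscLogStageRegister("Remaining Solves", &stage2);CHKERRQ(ierr);
>
>   ierr = PetscLogStagePush(stage1);CHKERRQ(ierr);   /* initsolve */
>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>   ierr = PetscLogStagePop();CHKERRQ(ierr);
>
>   ierr = PetscLogStagePush(stage2);CHKERRQ(ierr);   /* the timed solves */
>   for (i = 0; i < iterations; i++) {
>     ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>   }
>   ierr = PetscLogStagePop();CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>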
> A job would be started using something like
>
> mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 \
>  -iterations 1000 \
>  -ksp_rtol 1E-6 \
>  -log_view -log_sync \
>  -pc_type gamg -pc_gamg_type classical \
>  -ksp_type cg \
>  -ksp_norm_type unpreconditioned \
>  -mg_levels_ksp_type richardson \
>  -mg_levels_ksp_norm_type none \
>  -mg_levels_pc_type sor \
>  -mg_levels_ksp_max_it 1 \
>  -mg_levels_pc_sor_its 1 \
>  -mg_levels_esteig_ksp_type cg \
>  -mg_levels_esteig_ksp_max_it 10 \
>  -gamg_est_ksp_type cg
>
> Ideally this is started on a cube number of processes, for a cubical process grid.
> Using 125 processes and 10,000 iterations I get the output in
> "log_view_125_new.txt", which shows the same imbalance for me.
>
> Michael
>
>
> On 02.06.2018 at 13:40, Mark Adams wrote:
>
>
>
> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> Hi, Michael,
>>   You can add -log_sync besides -log_view, which adds barriers to certain
>> events but measures barrier time separately from the events. I find this
>> option makes it easier to interpret log_view output.
>>
>
> That is great (good to know).
>
> This should give us a better idea if your large VecScatter costs are from
> slow communication or if it is catching some sort of load imbalance.
>
>
>>
>> --Junchao Zhang
>>
>> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
>> Michael.Becker at physik.uni-giessen.de> wrote:
>>
>>> Barry: On its way. Could take a couple days again.
>>>
>>> Junchao: I unfortunately don't have access to a cluster with a faster
>>> network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking fat-tree
>>> network, which I realize causes parallel slowdown if the nodes are not
>>> connected to the same switch. Each node has 24 processors (2 sockets x 12
>>> cores) and four NUMA domains (two per socket).
>>> The ranks are usually not distributed perfectly evenly, i.e. for 125
>>> processes, five of the six required nodes would use 21 cores and one would use 20.
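>>> (For what it's worth, the placement can at least be made explicit: with
>>> Open MPI something like "mpirun -n 125 --map-by ppr:21:node --bind-to core
>>> ./ws_test ..." pins 21 ranks per node, and with Intel MPI the -ppn option
>>> plus I_MPI_PIN_DOMAIN play a similar role. I have not benchmarked these
>>> settings on this machine, so treat them only as a sketch.)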
>>> Would using another CPU type make a difference communication-wise? I
>>> could switch to faster ones (on the same network), but I always assumed
>>> this would only improve performance of the stuff that is unrelated to
>>> communication.
>>>
>>> Michael
>>>
>>>
>>>
>>> The log files have something like "Average time for zero size
>>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very slow
>>> network. A typical machine should give less than 1/10 of the latency you
>>> have. An easy way to check is just to run the code on a machine with a
>>> faster network and see what happens.
>>>
>>> Also, how many cores & NUMA domains does a compute node have? I could
>>> not figure out how you distributed the 125 MPI ranks evenly.
>>>
>>> --Junchao Zhang
>>>
>>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>>
>>>> Hello again,
>>>>
>>>> here are the updated log_view files for 125 and 1000 processors. I ran
>>>> both problems twice, the first time with all processors per node allocated
>>>> ("-1.txt"), the second with only half on twice the number of nodes
>>>> ("-2.txt").
>>>>
>>>> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>>>>
>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>>>>
>>>>    Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(); could you please point us to the code that makes you think they are being created and destroyed?   We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would be a measurable difference.
>>>>
>>>>
>>>> I mean this, right in the log_view output:
>>>>
>>>> Memory usage is given in bytes:
>>>>
>>>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>>>> Reports information only for process 0.
>>>>
>>>> --- Event Stage 0: Main Stage
>>>>
>>>> ...
>>>>
>>>> --- Event Stage 1: First Solve
>>>>
>>>> ...
>>>>
>>>> --- Event Stage 2: Remaining Solves
>>>>
>>>>               Vector 23904          23904   1295501184     0.
>>>>
>>>> I logged the exact number of KSP iterations over the 999 timesteps and
>>>> it's exactly 23904/6 = 3984.
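>>>>
>>>> For comparison, my understanding of the pattern Barry describes (a
>>>> simplified sketch, not my actual code) would be that the loop itself
>>>> creates nothing:
>>>>
>>>> ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>>>> ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
>>>> ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>>>> ierr = KSPSetUp(ksp);CHKERRQ(ierr);            /* work vectors created here */
>>>> for (i = 0; i < iterations; i++) {
>>>>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);    /* should reuse them */
>>>> }
>>>> ierr = KSPDestroy(&ksp);CHKERRQ(ierr);         /* work vectors freed here */
>>>>
>>>> so I don't understand where the per-iteration Vector creations in stage 2
>>>> come from.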
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
>>>>
>>>>   Please send the log file for 1000 with cg as the solver.
>>>>
>>>>    You should make a bar chart of each event for the two cases to see which ones are taking more time and which are taking less (we cannot tell from the two logs you sent us, since they are for different solvers).
>>>>
>>>>
>>>>
>>>>
>>>> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> wrote:
>>>>
>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>>>>
>>>>    Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(); could you please point us to the code that makes you think they are being created and destroyed?   We create all the work vectors at KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would be a measurable difference.
>>>>
>>>>
>>>>
>>>>
>>>> This seems kind of wasteful; is this supposed to be like this? Is this even the reason for my problems? Apart from that, everything seems quite normal to me (but I'm not the expert here).
>>>>
>>>>
>>>> Thanks in advance.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> <log_view_125procs.txt><log_view_1000procs.txt>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
-------------- next part --------------
using 125 of 125 processes
30^3 unknowns per processor
total system size: 150^3
mesh size: 0.0001

initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations

Time in solve():      97.977 s
Time in KSPSolve():   97.7361 s (99.7541%)

Number of   KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0373 with 125 processors, by jczhang Mon Jun  4 23:24:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           1.987e+02      1.00000   1.987e+02
Objects:              4.249e+04      1.00002   4.249e+04
Flop:                 3.698e+10      1.15842   3.501e+10  4.377e+12
Flop/sec:            1.862e+08      1.15841   1.763e+08  2.203e+10
MPI Messages:         1.816e+06      3.38531   1.236e+06  1.545e+08
MPI Message Lengths:  2.275e+09      2.20338   1.423e+03  2.198e+11
MPI Reductions:       3.759e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.1684e-02   0.0%  0.0000e+00   0.0%  1.200e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0060e+02  50.6%  5.6491e+09   0.1%  4.212e+05   0.3%  3.421e+03        0.7%  5.660e+02   1.5% 
 2: Remaining Solves: 9.7993e+01  49.3%  4.3710e+12  99.9%  1.541e+08  99.7%  1.417e+03       99.3%  3.700e+04  98.4% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 7.2002e-05 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 7.2002e-03 2.1 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 2.9978e+0114.5 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  4  0  0  0  0   7  0  2  5  0     0
KSPSetUp               9 1.0 4.5559e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0060e+02 1.0 4.82e+07 1.2 4.2e+05 3.4e+03 5.7e+02 51  0  0  1  2 100100100100100    56
VecTDot               14 1.0 2.5394e-02 2.2 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  2  0  0  2  3721
VecNorm                9 1.0 6.1355e-03 9.9 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  1  0  0  2  9901
VecScale              42 1.0 4.3249e-04 3.8 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 21120
VecCopy                1 1.0 1.5402e-04 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               178 1.0 2.1267e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 5.2488e-0311.6 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 18004
VecAYPX               49 1.0 1.5860e-03 2.7 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 50301
VecAssemblyBegin       2 1.0 2.8849e-05 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 2.6941e-05 9.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      178 1.0 6.5460e-03 4.1 0.00e+00 0.0 1.5e+05 1.4e+03 0.0e+00  0  0  0  0  0   0  0 37 15  0     0
VecScatterEnd        178 1.0 6.8015e-02 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               50 1.0 3.9924e-02 2.2 1.05e+07 1.1 5.1e+04 2.1e+03 0.0e+00  0  0  0  0  0   0 22 12  7  0 31204
MatMultAdd            42 1.0 3.1942e-02 5.7 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0  8625
MatMultTranspose      42 1.0 1.7802e-02 2.1 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 15476
MatSolve               7 0.0 9.3460e-05 0.0 8.40e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
MatSOR                84 1.0 8.0777e-02 2.0 1.90e+07 1.2 4.7e+04 1.6e+03 1.4e+01  0  0  0  0  0   0 40 11  5  2 27852
MatLUFactorSym         1 1.0 2.8300e-0418.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 7.9155e-0519.5 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
MatResidual           42 1.0 3.4999e-02 2.5 7.97e+06 1.2 4.7e+04 1.6e+03 0.0e+00  0  0  0  0  0   0 17 11  5  0 26653
MatAssemblyBegin      94 1.0 2.9981e+0114.4 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  4  0  0  0  0   7  0  2  5  0     0
MatAssemblyEnd        94 1.0 1.1404e-01 1.1 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02  0  0  0  0  1   0  0 15  1 41     0
MatGetRow        3100250 1.2 4.7874e+01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21  0  0  0  0  42  0  0  0  0     0
MatGetRowIJ            1 0.0 1.3828e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.5563e-01 2.0 0.00e+00 0.0 5.5e+04 1.8e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        4 1.0 8.4031e-02 1.1 0.00e+00 0.0 2.8e+03 2.8e+02 6.4e+01  0  0  0  0  0   0  0  1  0 11     0
MatGetOrdering         1 0.0 1.0204e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 3.1606e-02 1.2 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 8.2941e-03 1.0 0.00e+00 0.0 5.4e+04 6.0e+02 3.4e+01  0  0  0  0  0   0  0 13  2  6     0
MatZeroEntries         6 1.0 1.7359e-03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 2.1538e-01 1.0 1.13e+07 1.3 6.4e+04 2.7e+03 9.2e+01  0  0  0  0  0   0 23 15 12 16  5910
MatPtAPSymbolic        6 1.0 1.4594e-01 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01  0  0  0  0  0   0  0  8  6  7     0
MatPtAPNumeric         6 1.0 6.9003e-02 1.0 1.13e+07 1.3 2.9e+04 2.6e+03 4.8e+01  0  0  0  0  0   0 23  7  5  8 18448
MatGetLocalMat         6 1.0 2.7068e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 5.0828e-03 1.5 0.00e+00 0.0 2.0e+04 3.6e+03 0.0e+00  0  0  0  0  0   0  0  5  5  0     0
SFSetGraph            12 1.0 1.0014e-04 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 9.7704e-03 1.2 0.00e+00 0.0 2.6e+04 6.3e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          46 1.0 1.5848e-03 2.7 0.00e+00 0.0 5.5e+04 7.0e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            46 1.0 3.4416e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0010e+02 1.0 0.00e+00 0.0 2.0e+05 5.3e+03 2.9e+02 50  0  0  0  1 100  0 47 73 51     0
GAMG: partLevel        6 1.0 3.0237e-01 1.0 1.13e+07 1.3 6.6e+04 2.6e+03 1.9e+02  0  0  0  0  1   0 23 16 12 34  4210
  repartition          2 1.0 6.8307e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Invert-Sort          2 1.0 7.5531e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0
  Move A               2 1.0 7.2515e-02 1.1 0.00e+00 0.0 1.4e+03 5.4e+02 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
  Move P               2 1.0 1.6035e-02 1.3 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
PCSetUp                2 1.0 1.0041e+02 1.0 1.13e+07 1.3 2.7e+05 4.6e+03 5.1e+02 51  0  0  1  1 100 23 63 85 91    13
PCSetUpOnBlocks        7 1.0 5.0783e-04 3.8 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
PCApply                7 1.0 1.2734e-01 1.1 3.18e+07 1.2 1.5e+05 1.2e+03 1.4e+01  0  0  0  0  0   0 66 35 13  2 29321

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 9.7831e+01 1.0 3.69e+10 1.2 1.5e+08 1.4e+03 3.7e+04 49100100 99 98 100100100100100 44679
VecTDot            14000 1.0 8.3188e+00 5.1 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04  2  2  0  0 37   3  2  0  0 38 11360
VecNorm             9000 1.0 1.4898e+00 2.0 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03  0  1  0  0 24   1  1  0  0 24 40778
VecScale           42000 1.0 3.7866e-01 3.2 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 24123
VecCopy             1000 1.0 8.0102e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            147000 1.0 2.0034e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecAXPY            14000 1.0 9.5575e-01 2.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 98875
VecAYPX            49000 1.0 1.6010e+00 3.0 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 49830
VecScatterBegin   176000 1.0 5.7110e+00 4.0 0.00e+00 0.0 1.5e+08 1.4e+03 0.0e+00  2  0100 99  0   4  0100100  0     0
VecScatterEnd     176000 1.0 4.8250e+01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  17  0  0  0  0     0
MatMult            50000 1.0 3.2538e+01 1.9 1.05e+10 1.1 5.1e+07 2.1e+03 0.0e+00 11 28 33 49  0  22 29 33 49  0 38287
MatMultAdd         42000 1.0 1.9521e+01 3.4 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00  5  6 18  9  0   9  6 18  9  0 14113
MatMultTranspose   42000 1.0 1.5577e+01 2.1 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00  5  6 18  9  0  10  6 18  9  0 17687
MatSolve            7000 0.0 1.0978e-01 0.0 8.40e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     8
MatSOR             84000 1.0 5.0067e+01 2.1 1.90e+10 1.2 4.7e+07 1.6e+03 1.4e+04 23 51 30 33 37  46 51 30 33 38 44834
MatResidual        42000 1.0 2.8185e+01 2.1 7.97e+09 1.2 4.7e+07 1.6e+03 0.0e+00  9 21 30 33  0  18 21 30 33  0 33097
PCSetUpOnBlocks     7000 1.0 1.3226e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             7000 1.0 8.8024e+01 1.1 3.18e+10 1.2 1.5e+08 1.2e+03 1.4e+04 44 85 97 84 37  89 85 97 84 38 42358
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2371888     0.
              Matrix     0             72     14160468     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             12       133928     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   158            110      3181312     0.
              Matrix   140             68     21757144     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   110            100       543716     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    31             18        22752     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 42000          42000   2276680000     0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 0.00109181
Average time for zero size MPI_Send(): 6.45638e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 03:36:36 on beboplogin2 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Scaling-loss.png
Type: image/png
Size: 96894 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0004.png>
-------------- next part --------------
using 1000 of 1000 processes
30^3 unknowns per processor
total system size: 300^3
mesh size: 0.0001

initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations

Time in solve():      127 s
Time in KSPSolve():   126.753 s (99.8054%)

Number of   KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0385 with 1000 processors, by jczhang Mon Jun  4 15:33:32 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.339e+02      1.00002   2.339e+02
Objects:              4.854e+04      1.00002   4.854e+04
Flop:                 4.220e+10      1.15865   4.106e+10  4.106e+13
Flop/sec:            1.805e+08      1.15865   1.756e+08  1.756e+11
MPI Messages:         2.436e+06      3.97680   1.683e+06  1.683e+09
MPI Message Lengths:  2.592e+09      2.20360   1.364e+03  2.296e+12
MPI Reductions:       4.266e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.3538e-01   0.1%  0.0000e+00   0.0%  1.080e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0660e+02  45.6%  5.1626e+10   0.1%  4.348e+06   0.3%  3.241e+03        0.6%  6.340e+02   1.5% 
 2: Remaining Solves: 1.2702e+02  54.3%  4.1013e+13  99.9%  1.679e+09  99.7%  1.359e+03       99.4%  4.200e+04  98.5% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 5.6801e-03212.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 1.6171e-02 5.3 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 1.9437e+01 7.1 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
KSPSetUp               9 1.0 6.0689e-03 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0660e+02 1.0 5.33e+07 1.2 4.3e+06 3.2e+03 6.3e+02 46  0  0  1  1 100100100100100   484
VecTDot               16 1.0 1.4536e-02 2.0 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01  0  0  0  0  0   0  2  0  0  3 59437
VecNorm               10 1.0 2.3881e-02 1.9 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  1  0  0  2 22612
VecScale              48 1.0 4.1533e-0331.5 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 23084
VecCopy                1 1.0 4.5199e-03119.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               208 1.0 2.6751e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 8.8606e-03 8.4 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 97510
VecAYPX               56 1.0 3.1478e-03 2.8 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 234365
VecAssemblyBegin       3 1.0 9.2983e-05 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 8.8930e-0512.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      208 1.0 1.5633e-02 5.7 0.00e+00 0.0 1.7e+06 1.4e+03 0.0e+00  0  0  0  0  0   0  0 39 16  0     0
VecScatterEnd        208 1.0 5.7859e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               57 1.0 4.2757e-02 1.9 1.19e+07 1.1 5.6e+05 2.0e+03 0.0e+00  0  0  0  0  0   0 23 13  8  0 272044
MatMultAdd            48 1.0 2.7018e-02 3.1 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 97346
MatMultTranspose      48 1.0 2.4651e-02 2.7 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 106693
MatSolve               8 0.0 4.3631e-05 0.0 1.14e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   262
MatSOR                96 1.0 6.6015e-02 1.6 2.18e+07 1.2 5.2e+05 1.5e+03 1.6e+01  0  0  0  0  0   0 41 12  5  3 320486
MatLUFactorSym         1 1.0 1.0937e-02917.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 8.9009e-032333.3 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
MatResidual           48 1.0 3.4651e-02 2.0 9.11e+06 1.2 5.2e+05 1.5e+03 0.0e+00  0  0  0  0  0   0 17 12  5  0 254403
MatAssemblyBegin     102 1.0 1.9440e+01 7.1 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
MatAssemblyEnd       102 1.0 8.1546e-02 1.3 0.00e+00 0.0 6.3e+05 2.0e+02 2.5e+02  0  0  0  0  1   0  0 14  1 39     0
MatGetRow        3100266 1.2 4.9949e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  44  0  0  0  0     0
MatGetRowIJ            1 0.0 2.5988e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.6449e-01 2.3 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01  0  0  0  0  0   0  0 13 66  2     0
MatCreateSubMat        6 1.0 4.2039e-02 1.1 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01  0  0  0  0  0   0  0  1  0 15     0
MatGetOrdering         1 0.0 9.8944e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 5.4760e-02 1.2 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 3.3504e-02 1.2 0.00e+00 0.0 5.4e+05 5.6e+02 4.8e+01  0  0  0  0  0   0  0 12  2  8     0
MatZeroEntries         6 1.0 1.6251e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.9174e-01 1.1 1.11e+07 1.3 6.3e+05 2.5e+03 9.2e+01  0  0  0  0  0   0 20 15 11 15 55134
MatPtAPSymbolic        6 1.0 1.0971e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01  0  0  0  0  0   0  0  7  6  7     0
MatPtAPNumeric         6 1.0 7.3850e-02 1.0 1.11e+07 1.3 3.1e+05 2.3e+03 4.8e+01  0  0  0  0  0   0 20  7  5  8 143149
MatGetLocalMat         6 1.0 3.0208e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 1.0909e-02 4.0 0.00e+00 0.0 1.9e+05 3.5e+03 0.0e+00  0  0  0  0  0   0  0  4  5  0     0
SFSetGraph            12 1.0 1.4520e-04 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 2.5385e-02 1.9 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          60 1.0 5.0180e-03 6.0 0.00e+00 0.0 5.6e+05 6.5e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            60 1.0 9.5491e-03 6.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0600e+02 1.0 0.00e+00 0.0 2.0e+06 5.1e+03 3.0e+02 45  0  0  0  1  99  0 46 72 47     0
GAMG: partLevel        6 1.0 2.8349e-01 1.0 1.11e+07 1.3 6.5e+05 2.4e+03 2.4e+02  0  0  0  0  1   0 20 15 11 38 37292
  repartition          3 1.0 1.2683e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 3.0384e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 3.9142e-02 1.3 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 1.2856e-02 1.6 0.00e+00 0.0 1.2e+04 1.3e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 1.0634e+02 1.0 1.11e+07 1.3 2.7e+06 4.4e+03 5.8e+02 45  0  0  1  1 100 20 61 84 91    99
PCSetUpOnBlocks        8 1.0 1.2212e-0291.8 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
PCApply                8 1.0 1.2359e-01 1.1 3.64e+07 1.2 1.6e+06 1.2e+03 1.6e+01  0  0  0  0  0   0 68 37 14  3 285064

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.2680e+02 1.0 4.21e+10 1.2 1.7e+09 1.4e+03 4.2e+04 54100100 99 98 100100100100100 323446
VecTDot            16000 1.0 9.8928e+00 2.2 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04  2  2  0  0 38   4  2  0  0 38 87335
VecNorm            10000 1.0 2.1671e+00 1.3 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04  1  1  0  0 23   1  1  0  0 24 249178
VecScale           48000 1.0 5.0265e-01 4.3 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 190734
VecCopy             1000 1.0 8.4914e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            168000 1.0 2.3076e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecAXPY            16000 1.0 1.1082e+00 1.4 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 779639
VecAYPX            56000 1.0 1.9399e+00 1.8 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 380298
VecScatterBegin   201000 1.0 6.9948e+00 2.9 0.00e+00 0.0 1.7e+09 1.4e+03 0.0e+00  2  0100 99  0   5  0100100  0     0
VecScatterEnd     201000 1.0 4.8907e+01 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  17  0  0  0  0     0
MatMult            57000 1.0 3.3672e+01 1.5 1.19e+10 1.1 5.6e+08 2.0e+03 0.0e+00 10 28 34 49  0  19 28 34 49  0 345446
MatMultAdd         48000 1.0 2.2597e+01 2.4 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00  6  6 18  9  0  11  6 18  9  0 116391
MatMultTranspose   48000 1.0 1.8192e+01 1.9 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00  5  6 18  9  0   9  6 18  9  0 144569
MatSolve            8000 0.0 2.8324e-02 0.0 1.14e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   404
MatSOR             96000 1.0 6.2678e+01 1.5 2.17e+10 1.2 5.2e+08 1.5e+03 1.6e+04 25 51 31 33 38  47 51 31 34 38 336886
MatResidual        48000 1.0 2.9771e+01 1.7 9.11e+09 1.2 5.2e+08 1.5e+03 0.0e+00  9 21 31 33  0  16 21 31 34  0 296101
PCSetUpOnBlocks     8000 1.0 1.5038e-0120.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             8000 1.0 1.1220e+02 1.0 3.63e+10 1.2 1.6e+09 1.2e+03 1.6e+04 48 86 97 84 38  88 86 97 85 38 313653
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2382272     0.
              Matrix     0             65     14796304     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       171648     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   176            128      3546680     0.
              Matrix   148             83     22941288     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       590732     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 48000          48000   2616576000     0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.52111e-05
Average time for zero size MPI_Send(): 7.22694e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-04 18:36:31 on beboplogin1 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr 
-lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 125 of 125 processes
30^3 unknowns per processor
total system size: 150^3
mesh size: 0.0001

initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations

Time in solve():      107.17 s
Time in KSPSolve():   106.928 s (99.7738%)

Number of   KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0247 with 125 processors, by jczhang Mon Jun  4 14:37:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.089e+02      1.00002   2.089e+02
Objects:              4.249e+04      1.00002   4.249e+04
Flop:                 3.698e+10      1.15842   3.501e+10  4.377e+12
Flop/sec:            1.770e+08      1.15842   1.676e+08  2.095e+10
MPI Messages:         1.816e+06      3.38531   1.236e+06  1.545e+08
MPI Message Lengths:  2.275e+09      2.20338   1.423e+03  2.198e+11
MPI Reductions:       3.759e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 1.8523e-01   0.1%  0.0000e+00   0.0%  1.200e+03   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0152e+02  48.6%  5.6491e+09   0.1%  4.212e+05   0.3%  3.421e+03        0.7%  5.660e+02   1.5% 
 2: Remaining Solves: 1.0719e+02  51.3%  4.3710e+12  99.9%  1.541e+08  99.7%  1.417e+03       99.3%  3.700e+04  98.4% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 5.0769e-03195.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 8.1227e-03 3.7 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 3.0751e+0114.9 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  4  0  0  0  0   8  0  2  5  0     0
KSPSetUp               9 1.0 5.4052e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0153e+02 1.0 4.82e+07 1.2 4.2e+05 3.4e+03 5.7e+02 49  0  0  1  2 100100100100100    56
VecTDot               14 1.0 1.1638e-02 2.3 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  0   0  2  0  0  2  8120
VecNormBarrier         9 1.0 9.0270e-0333.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm                9 1.0 1.3406e-02 1.2 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  1  0  0  2  4532
VecScale              42 1.0 4.1047e-02154.1 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   223
VecCopy                1 1.0 4.3828e-03109.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               178 1.0 2.3446e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               14 1.0 5.2881e-0310.4 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 17870
VecAYPX               49 1.0 1.6847e-03 2.4 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 47354
VecAssemblyBegin       2 1.0 4.7922e-05 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         2 1.0 3.6955e-05 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBarrie     178 1.0 1.0980e-01 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      178 1.0 1.5828e-02 4.6 0.00e+00 0.0 1.5e+05 1.4e+03 0.0e+00  0  0  0  0  0   0  0 37 15  0     0
VecScatterEnd        178 1.0 2.3170e-02 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               50 1.0 5.3129e-02 1.8 1.05e+07 1.1 5.1e+04 2.1e+03 0.0e+00  0  0  0  0  0   0 22 12  7  0 23448
MatMultAdd            42 1.0 6.8014e-02 6.5 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0  4051
MatMultTranspose      42 1.0 1.5433e-02 1.3 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 17852
MatSolve               7 0.0 8.2970e-05 0.0 8.40e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    10
MatSOR                84 1.0 1.0138e-01 3.2 1.90e+07 1.2 4.7e+04 1.6e+03 1.4e+01  0  0  0  0  0   0 40 11  5  2 22193
MatLUFactorSym         1 1.0 9.9010e-03769.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 7.3540e-031542.2 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatResidual           42 1.0 3.8404e-02 1.8 7.97e+06 1.2 4.7e+04 1.6e+03 0.0e+00  0  0  0  0  0   0 17 11  5  0 24290
MatAssemblyBegin      94 1.0 3.0754e+0114.9 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00  4  0  0  0  0   8  0  2  5  0     0
MatAssemblyEnd        94 1.0 5.9230e-02 1.3 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02  0  0  0  0  1   0  0 15  1 41     0
MatGetRow        3100250 1.2 4.8146e+01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20  0  0  0  0  42  0  0  0  0     0
MatGetRowIJ            1 0.0 2.9087e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.5237e-01 2.1 0.00e+00 0.0 5.5e+04 1.8e+04 1.2e+01  0  0  0  0  0   0  0 13 67  2     0
MatCreateSubMat        4 1.0 2.0799e-02 1.3 0.00e+00 0.0 2.8e+03 2.8e+02 6.4e+01  0  0  0  0  0   0  0  1  0 11     0
MatGetOrdering         1 0.0 7.9701e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 3.3583e-02 1.3 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 2.2035e-02 1.2 0.00e+00 0.0 5.4e+04 6.0e+02 3.4e+01  0  0  0  0  0   0  0 13  2  6     0
MatZeroEntries         6 1.0 1.8528e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.4971e-01 1.1 1.13e+07 1.3 6.4e+04 2.7e+03 9.2e+01  0  0  0  0  0   0 23 15 12 16  8503
MatPtAPSymbolic        6 1.0 8.6746e-02 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01  0  0  0  0  0   0  0  8  6  7     0
MatPtAPNumeric         6 1.0 5.5958e-02 1.0 1.13e+07 1.3 2.9e+04 2.6e+03 4.8e+01  0  0  0  0  0   0 23  7  5  8 22748
MatGetLocalMat         6 1.0 2.8403e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 9.4802e-03 3.1 0.00e+00 0.0 2.0e+04 3.6e+03 0.0e+00  0  0  0  0  0   0  0  5  5  0     0
SFSetGraph            12 1.0 1.0824e-04 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 1.5241e-02 2.2 0.00e+00 0.0 2.6e+04 6.3e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          46 1.0 3.5763e-03 6.3 0.00e+00 0.0 5.5e+04 7.0e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            46 1.0 6.2499e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0099e+02 1.0 0.00e+00 0.0 2.0e+05 5.3e+03 2.9e+02 48  0  0  0  1  99  0 47 73 51     0
GAMG: partLevel        6 1.0 2.0149e-01 1.1 1.13e+07 1.3 6.6e+04 2.6e+03 1.9e+02  0  0  0  0  1   0 23 16 12 34  6317
  repartition          2 1.0 1.9701e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Invert-Sort          2 1.0 1.9929e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0
  Move A               2 1.0 2.1171e-02 1.7 0.00e+00 0.0 1.4e+03 5.4e+02 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
  Move P               2 1.0 8.9321e-03 1.9 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01  0  0  0  0  0   0  0  0  0  6     0
PCSetUp                2 1.0 1.0124e+02 1.0 1.13e+07 1.3 2.7e+05 4.6e+03 5.1e+02 48  0  0  1  1 100 23 63 85 91    13
PCSetUpOnBlocks        7 1.0 1.1790e-02 4.5 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply                7 1.0 1.6586e-01 1.0 3.18e+07 1.2 1.5e+05 1.2e+03 1.4e+01  0  0  0  0  0   0 66 35 13  2 22511

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.0703e+02 1.0 3.69e+10 1.2 1.5e+08 1.4e+03 3.7e+04 51100100 99 98 100100100100100 40840
VecTDot            14000 1.0 7.6976e+00 7.3 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04  1  2  0  0 37   2  2  0  0 38 12276
VecNormBarrier      9000 1.0 9.7604e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm             9000 1.0 6.1436e-01 1.2 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03  0  1  0  0 24   1  1  0  0 24 98883
VecScale           42000 1.0 4.6654e-01 2.9 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 19579
VecCopy             1000 1.0 8.0786e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            147000 1.0 2.1127e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecAXPY            14000 1.0 9.5190e-01 2.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 99276
VecAYPX            49000 1.0 1.6238e+00 2.5 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 49131
VecScatterBarrie  176000 1.0 4.0948e+01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0  14  0  0  0  0     0
VecScatterBegin   176000 1.0 7.2618e+00 4.0 0.00e+00 0.0 1.5e+08 1.4e+03 0.0e+00  2  0100 99  0   4  0100100  0     0
VecScatterEnd     176000 1.0 1.8905e+01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   7  0  0  0  0     0
MatMult            50000 1.0 3.4401e+01 1.7 1.05e+10 1.1 5.1e+07 2.1e+03 0.0e+00 11 28 33 49  0  22 29 33 49  0 36213
MatMultAdd         42000 1.0 1.8117e+01 1.8 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00  6  6 18  9  0  11  6 18  9  0 15208
MatMultTranspose   42000 1.0 1.5244e+01 1.3 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00  6  6 18  9  0  12  6 18  9  0 18073
MatSolve            7000 0.0 7.3662e-02 0.0 8.40e+05 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    11
MatSOR             84000 1.0 5.1378e+01 2.0 1.90e+10 1.2 4.7e+07 1.6e+03 1.4e+04 22 51 30 33 37  43 51 30 33 38 43690
MatResidual        42000 1.0 2.9679e+01 1.7 7.97e+09 1.2 4.7e+07 1.6e+03 0.0e+00 10 21 30 33  0  19 21 30 33  0 31430
PCSetUpOnBlocks     7000 1.0 1.3489e-01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             7000 1.0 9.7196e+01 1.1 3.18e+10 1.2 1.5e+08 1.2e+03 1.4e+04 46 85 97 84 37  90 85 97 84 38 38361
------------------------------------------------------------------------------------------------------------------------
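As a cross-check of the Mflop/s column against the formula in the header (the reported rates are consistent with a $10^{-6}$ factor, despite the "10e-6" in the printed text): for the Remaining Solves stage, total flop $= 4.371\times 10^{12}$ and max KSPSolve time $= 1.0703\times 10^{2}$ s, so $10^{-6}\cdot 4.371\times 10^{12} / 1.0703\times 10^{2} \approx 40840$ Mflop/s, matching the KSPSolve row in Stage 2.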

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2371888     0.
              Matrix     0             72     14160468     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             12       133928     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   158            110      3181312     0.
              Matrix   140             68     21757144     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   110            100       543716     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    31             18        22752     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 42000          42000   2276680000     0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 7.58171e-06
Average time for zero size MPI_Send(): 6.96945e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
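For reference, a rough sketch of how the main solver options above map onto the PETSc API (an illustration only, not taken from the test code; ksp and A are assumed to exist, error checking is omitted, and the -mg_levels_* smoother options are left to the options database):

    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPCG);                                 /* -ksp_type cg */
    KSPSetNormType(ksp, KSP_NORM_UNPRECONDITIONED);         /* -ksp_norm_type unpreconditioned */
    KSPSetTolerances(ksp, 1e-6, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);  /* -ksp_rtol 1E-6 */

    PC pc;
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCGAMG);                                  /* -pc_type gamg */
    PCGAMGSetType(pc, PCGAMGCLASSICAL);                     /* -pc_gamg_type classical */

    KSPSetFromOptions(ksp);   /* picks up -mg_levels_* and any other runtime options */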
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-04 18:36:31 on beboplogin1 
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------

Using C compiler: mpicc  -fPIC  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp  
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3  -fopenmp   
-----------------------------------------

Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr 
-lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 1000 of 1000 processes
30^3 unknowns per processor
total system size: 300^3
mesh size: 0.0001

initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations

Time in solve():      150.306 s
Time in KSPSolve():   150.062 s (99.838%)

Number of   KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./wstest on a intel-bdw-opt named bdw-0289 with 1000 processors, by jczhang Mon Jun  4 14:49:04 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90  GIT Date: 2018-06-04 15:39:16 +0200

                         Max       Max/Min        Avg      Total 
Time (sec):           2.578e+02      1.00003   2.578e+02
Objects:              4.854e+04      1.00002   4.854e+04
Flop:                 4.220e+10      1.15865   4.106e+10  4.106e+13
Flop/sec:            1.637e+08      1.15867   1.593e+08  1.593e+11
MPI Messages:         2.436e+06      3.97680   1.683e+06  1.683e+09
MPI Message Lengths:  2.592e+09      2.20360   1.364e+03  2.296e+12
MPI Reductions:       4.266e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.3517e-01   0.1%  0.0000e+00   0.0%  1.080e+04   0.0%  1.802e+03        0.0%  1.700e+01   0.0% 
 1:     First Solve: 1.0727e+02  41.6%  5.1626e+10   0.1%  4.348e+06   0.3%  3.241e+03        0.6%  6.340e+02   1.5% 
 2: Remaining Solves: 1.5032e+02  58.3%  4.1013e+13  99.9%  1.679e+09  99.7%  1.359e+03       99.4%  4.200e+04  98.5% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 2 1.0 5.7883e-03214.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: First Solve

BuildTwoSided         12 1.0 1.3386e-02 4.1 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00  0  0  0  0  0   0  0  2  0  0     0
BuildTwoSidedF        30 1.0 2.0080e+01 7.8 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
KSPSetUp               9 1.0 6.3870e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
KSPSolve               1 1.0 1.0726e+02 1.0 5.33e+07 1.2 4.3e+06 3.2e+03 6.3e+02 42  0  0  1  1 100100100100100   481
VecTDot               16 1.0 1.1937e-02 1.9 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01  0  0  0  0  0   0  2  0  0  3 72376
VecNormBarrier        10 1.0 2.8527e-0228.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNorm               10 1.0 2.5727e-02 1.0 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  1  0  0  2 20990
VecScale              48 1.0 5.2900e-0313.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 18123
VecCopy                1 1.0 5.3449e-03136.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet               208 1.0 2.9714e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               16 1.0 2.4686e-0224.0 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  2  0  0  0 34999
VecAYPX               56 1.0 2.0964e-03 1.8 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 351907
VecAssemblyBegin       3 1.0 8.3923e-0512.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 6.8665e-0511.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBarrie     208 1.0 8.8101e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      208 1.0 1.9391e-02 3.3 0.00e+00 0.0 1.7e+06 1.4e+03 0.0e+00  0  0  0  0  0   0  0 39 16  0     0
VecScatterEnd        208 1.0 2.5424e-02 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               57 1.0 5.9834e-02 1.5 1.19e+07 1.1 5.6e+05 2.0e+03 0.0e+00  0  0  0  0  0   0 23 13  8  0 194402
MatMultAdd            48 1.0 3.5768e-02 2.1 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 73531
MatMultTranspose      48 1.0 2.1412e-02 1.4 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00  0  0  0  0  0   0  5  7  1  0 122831
MatSolve               8 0.0 7.7009e-05 0.0 1.14e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   149
MatSOR                96 1.0 8.2143e-02 1.6 2.18e+07 1.2 5.2e+05 1.5e+03 1.6e+01  0  0  0  0  0   0 41 12  5  3 257563
MatLUFactorSym         1 1.0 9.9230e-03832.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 7.5920e-031990.2 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
MatResidual           48 1.0 4.2435e-02 1.6 9.11e+06 1.2 5.2e+05 1.5e+03 0.0e+00  0  0  0  0  0   0 17 12  5  0 207738
MatAssemblyBegin     102 1.0 2.0083e+01 7.8 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00  2  0  0  0  0   5  0  2  5  0     0
MatAssemblyEnd       102 1.0 7.7547e-02 1.3 0.00e+00 0.0 6.3e+05 2.0e+02 2.5e+02  0  0  0  0  1   0  0 14  1 39     0
MatGetRow        3100266 1.2 5.0587e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18  0  0  0  0  44  0  0  0  0     0
MatGetRowIJ            1 0.0 1.3113e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCreateSubMats       6 1.0 1.7499e-01 2.4 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01  0  0  0  0  0   0  0 13 66  2     0
MatCreateSubMat        6 1.0 4.6081e-02 1.1 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01  0  0  0  0  0   0  0  1  0 15     0
MatGetOrdering         1 0.0 2.5415e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       6 1.0 5.4010e-02 1.2 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01  0  0  0  0  0   0  0  6  2  2     0
MatCoarsen             6 1.0 3.1411e-02 1.2 0.00e+00 0.0 5.4e+05 5.6e+02 4.8e+01  0  0  0  0  0   0  0 12  2  8     0
MatZeroEntries         6 1.0 1.7152e-03 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatPtAP                6 1.0 1.9647e-01 1.0 1.11e+07 1.3 6.3e+05 2.5e+03 9.2e+01  0  0  0  0  0   0 20 15 11 15 53808
MatPtAPSymbolic        6 1.0 1.1147e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01  0  0  0  0  0   0  0  7  6  7     0
MatPtAPNumeric         6 1.0 7.6733e-02 1.0 1.11e+07 1.3 3.1e+05 2.3e+03 4.8e+01  0  0  0  0  0   0 20  7  5  8 137771
MatGetLocalMat         6 1.0 2.9888e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetBrAoCol          6 1.0 9.0182e-03 3.4 0.00e+00 0.0 1.9e+05 3.5e+03 0.0e+00  0  0  0  0  0   0  0  4  5  0     0
SFSetGraph            12 1.0 1.3924e-04 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp               12 1.0 2.1541e-02 2.5 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00  0  0  0  0  0   0  0  6  1  0     0
SFBcastBegin          60 1.0 4.6461e-03 5.5 0.00e+00 0.0 5.6e+05 6.5e+02 0.0e+00  0  0  0  0  0   0  0 13  3  0     0
SFBcastEnd            60 1.0 6.3133e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
GAMG: createProl       6 1.0 1.0658e+02 1.0 0.00e+00 0.0 2.0e+06 5.1e+03 3.0e+02 41  0  0  0  1  99  0 46 72 47     0
GAMG: partLevel        6 1.0 2.9647e-01 1.0 1.11e+07 1.3 6.5e+05 2.4e+03 2.4e+02  0  0  0  0  1   0 20 15 11 38 35658
  repartition          3 1.0 1.2493e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01  0  0  0  0  0   0  0  0  0  3     0
  Invert-Sort          3 1.0 3.0791e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  2     0
  Move A               3 1.0 4.2872e-02 1.3 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
  Move P               3 1.0 1.3327e-02 1.6 0.00e+00 0.0 1.2e+04 1.3e+01 5.0e+01  0  0  0  0  0   0  0  0  0  8     0
PCSetUp                2 1.0 1.0693e+02 1.0 1.11e+07 1.3 2.7e+06 4.4e+03 5.8e+02 41  0  0  1  1 100 20 61 84 91    99
PCSetUpOnBlocks        8 1.0 1.2192e-02 3.8 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     1
PCApply                8 1.0 1.6499e-01 1.1 3.64e+07 1.2 1.6e+06 1.2e+03 1.6e+01  0  0  0  0  0   0 68 37 14  3 213545

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 1.5010e+02 1.0 4.21e+10 1.2 1.7e+09 1.4e+03 4.2e+04 58100100 99 98 100100100100100 273235
VecTDot            16000 1.0 8.7654e+00 2.8 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04  2  2  0  0 38   3  2  0  0 38 98568
VecNormBarrier     10000 1.0 1.4725e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
VecNorm            10000 1.0 1.1930e+00 1.2 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04  0  1  0  0 23   1  1  0  0 24 452622
VecScale           48000 1.0 6.3976e-01 2.4 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 149856
VecCopy             1000 1.0 8.1365e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet            168000 1.0 2.6112e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0
VecAXPY            16000 1.0 1.0987e+00 1.4 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   1  2  0  0  0 786401
VecAYPX            56000 1.0 2.0142e+00 1.7 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0 366271
VecScatterBarrie  201000 1.0 5.5632e+01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12  0  0  0  0  20  0  0  0  0     0
VecScatterBegin   201000 1.0 9.0613e+00 2.7 0.00e+00 0.0 1.7e+09 1.4e+03 0.0e+00  3  0100 99  0   5  0100100  0     0
VecScatterEnd     201000 1.0 1.7975e+01 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   6  0  0  0  0     0
MatMult            57000 1.0 4.3542e+01 1.4 1.19e+10 1.1 5.6e+08 2.0e+03 0.0e+00 13 28 34 49  0  23 28 34 49  0 267140
MatMultAdd         48000 1.0 2.1388e+01 1.4 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00  6  6 18  9  0  11  6 18  9  0 122969
MatMultTranspose   48000 1.0 2.3754e+01 1.3 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00  8  6 18  9  0  13  6 18  9  0 110722
MatSolve            8000 0.0 8.0224e-02 0.0 1.14e+07 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   143
MatSOR             96000 1.0 6.6546e+01 1.5 2.17e+10 1.2 5.2e+08 1.5e+03 1.6e+04 24 51 31 33 38  42 51 31 34 38 317303
MatResidual        48000 1.0 3.8039e+01 1.4 9.11e+09 1.2 5.2e+08 1.5e+03 0.0e+00 11 21 31 33  0  19 21 31 34  0 231744
PCSetUpOnBlocks     8000 1.0 1.6057e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCApply             8000 1.0 1.3489e+02 1.0 3.63e+10 1.2 1.6e+09 1.2e+03 1.6e+04 52 86 97 84 38  89 86 97 85 38 260888
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              9        11424     0.
     DMKSP interface     1              1          656     0.
              Vector     4             52      2382272     0.
              Matrix     0             65     14796304     0.
    Distributed Mesh     1              1         5248     0.
           Index Set     2             18       171648     0.
   IS L to G Mapping     1              1       131728     0.
   Star Forest Graph     2              2         1728     0.
     Discrete System     1              1          932     0.
         Vec Scatter     1             14       233696     0.
      Preconditioner     1              9         9676     0.
              Viewer     1              0            0     0.

--- Event Stage 1: First Solve

       Krylov Solver     8              0            0     0.
              Vector   176            128      3546680     0.
              Matrix   148             83     22941288     0.
      Matrix Coarsen     6              6         3816     0.
           Index Set   128            112       590732     0.
   Star Forest Graph    12             12        10368     0.
         Vec Scatter    34             21        26544     0.
      Preconditioner     8              0            0     0.

--- Event Stage 2: Remaining Solves

              Vector 48000          48000   2616576000     0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.30177e-05
Average time for zero size MPI_Send(): 7.15208e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Build information (compilers, configure options, and libraries) identical to the 125-processor run above.
-----------------------------------------
-------------- next part --------------
Attachments (images scrubbed by the list archive):
  MatSOR_SeqAIJ.png (253560 bytes): <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0005.png>
  PAPI_TOT_CYC.png  (81097 bytes):  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0006.png>
  PAPI_DP_OPS.png   (80924 bytes):  <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0007.png>

