[petsc-dev] [petsc-users] Poor weak scaling when solving successive linearsystems
Junchao Zhang
jczhang at mcs.anl.gov
Wed Jun 6 15:13:52 CDT 2018
Hi, PETSc developers,
I tested Michael Becker's code. The code calls the same KSPSolve 1000
times in the second stage and needs a cube number of processes to run. I
ran with 125 ranks and 1000 ranks, with and without the -log_sync option. I
attach the log_view output files and a scaling-loss Excel file.
I profiled the code with 125 processors. It looks like {MatSOR, MatMult,
MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c took
~50% of the time; the other half was spent waiting in MPI.
MatSOR_SeqAIJ
took 30%, mostly in PetscSparseDenseMinusDot().
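The scaling loss can also be quantified directly from the two attached logs (reproduced later in this thread): "Time in KSPSolve(): 97.7361 s" over 7000 KSP iterations at 125 ranks, and 126.753 s over 8000 iterations at 1000 ranks. A small sketch, using only those reported numbers:

```python
# Per-iteration weak-scaling efficiency computed from the attached log files.
# The times and iteration counts are taken verbatim from the "Time in
# KSPSolve()" and "Number of KSP iterations (total)" lines reported below.

t_125, iters_125 = 97.7361, 7000     # 125 ranks: 1000 solves x 7 iterations
t_1000, iters_1000 = 126.753, 8000   # 1000 ranks: 1000 solves x 8 iterations

per_it_125 = t_125 / iters_125
per_it_1000 = t_1000 / iters_1000

# Ideal weak scaling would keep the time per KSP iteration constant.
efficiency = per_it_125 / per_it_1000
print(f"time/iteration: {per_it_125:.5f} s vs {per_it_1000:.5f} s")
print(f"per-iteration weak-scaling efficiency: {efficiency:.1%}")
```

Counted per iteration, the 1000-rank run is roughly 12% slower; counted per full solve (7 vs. 8 iterations), the gap grows to roughly 23%.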
I tested it on a machine with 36 cores/node. I found 32 ranks/node gave better
performance (about 10%) than 36 ranks/node in the 125-rank test. I
guess this is because the processors in the former case had more balanced memory
bandwidth. I collected PAPI_DP_OPS (double-precision operations) and
PAPI_TOT_CYC (total cycles) for the 125-rank case (see the attached files).
It looks like the ranks at the two ends have fewer DP_OPS and TOT_CYC.
Does anyone familiar with the algorithm have a quick explanation?
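One way to put a number on what the counters suggest is a max/mean ratio over the per-rank totals. The sketch below uses made-up counter values (the real PAPI_DP_OPS data is in the attachments, not reproduced here); only the metric itself is the point:

```python
# Hypothetical per-rank hardware-counter totals; the real PAPI_DP_OPS data
# lives in the attached files. max/mean > 1 means some ranks do more work
# (or retire more cycles) than the average rank.
dp_ops = [9.1e9, 9.8e9, 1.00e10, 1.01e10, 9.9e9, 9.7e9, 8.9e9]  # made-up

mean = sum(dp_ops) / len(dp_ops)
imbalance = max(dp_ops) / mean
print(f"max/mean imbalance: {imbalance:.3f}")

# Ranks at the two "ends" of a 1-D rank ordering own boundary subdomains
# with fewer neighbors, hence fewer halo flops -- consistent with the lowest
# counts sitting at the first and last ranks in this synthetic sample.
low_ranks = [i for i, v in enumerate(dp_ops) if v < 0.95 * mean]
print("ranks well below average:", low_ranks)
```

With real data one would read `dp_ops` from the PAPI output files instead of hard-coding it.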
--Junchao Zhang
On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
Michael.Becker at physik.uni-giessen.de> wrote:
> Hello again,
> this took me longer than I anticipated, but here we go.
> I did reruns of the cases where only half the processes per node were used
> (without -log_sync):
>
>                   125 procs, 1st    125 procs, 2nd    1000 procs, 1st   1000 procs, 2nd
>                   Max       Ratio   Max       Ratio   Max       Ratio   Max       Ratio
> KSPSolve          1.203E+02  1.0    1.210E+02  1.0    1.399E+02  1.1    1.365E+02  1.0
> VecTDot           6.376E+00  3.7    6.551E+00  4.0    7.885E+00  2.9    7.175E+00  3.4
> VecNorm           4.579E+00  7.1    5.803E+00 10.2    8.534E+00  6.9    6.026E+00  4.9
> VecScale          1.070E-01  2.1    1.129E-01  2.2    1.301E-01  2.5    1.270E-01  2.4
> VecCopy           1.123E-01  1.3    1.149E-01  1.3    1.301E-01  1.6    1.359E-01  1.6
> VecSet            7.063E-01  1.7    6.968E-01  1.7    7.432E-01  1.8    7.425E-01  1.8
> VecAXPY           1.166E+00  1.4    1.167E+00  1.4    1.221E+00  1.5    1.279E+00  1.6
> VecAYPX           1.317E+00  1.6    1.290E+00  1.6    1.536E+00  1.9    1.499E+00  2.0
> VecScatterBegin   6.142E+00  3.2    5.974E+00  2.8    6.448E+00  3.0    6.472E+00  2.9
> VecScatterEnd     3.606E+01  4.2    3.551E+01  4.0    5.244E+01  2.7    4.995E+01  2.7
> MatMult           3.561E+01  1.6    3.403E+01  1.5    3.435E+01  1.4    3.332E+01  1.4
> MatMultAdd        1.124E+01  2.0    1.130E+01  2.1    2.093E+01  2.9    1.995E+01  2.7
> MatMultTranspose  1.372E+01  2.5    1.388E+01  2.6    1.477E+01  2.2    1.381E+01  2.1
> MatSolve          1.949E-02  0.0    1.653E-02  0.0    4.789E-02  0.0    4.466E-02  0.0
> MatSOR            6.610E+01  1.3    6.673E+01  1.3    7.111E+01  1.3    7.105E+01  1.3
> MatResidual       2.647E+01  1.7    2.667E+01  1.7    2.446E+01  1.4    2.467E+01  1.5
> PCSetUpOnBlocks   5.266E-03  1.4    5.295E-03  1.4    5.427E-03  1.5    5.289E-03  1.4
> PCApply           1.031E+02  1.0    1.035E+02  1.0    1.180E+02  1.0    1.164E+02  1.0
>
> I also slimmed down my code and basically wrote a simple weak scaling test
> (source files attached) so you can profile it yourself. I appreciate the
> offer, Junchao; thank you.
> You can adjust the system size per process at runtime via
> "-nodes_per_proc 30" and the number of repeated calls to the function
> containing KSPSolve() via "-iterations 1000". The physical problem is
> simply calculating the electric potential from a homogeneous charge
> distribution, done multiple times to accumulate time in KSPSolve().
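As a sanity check on the sizes quoted in the attached logs ("30^3 unknowns per processor", "total system size: 150^3" and "300^3"), a cubical process grid with `-nodes_per_proc 30` gives a global cube whose edge is 30 times the cube root of the process count:

```python
# Global system size for a cubical process grid with nodes_per_proc^3
# unknowns per process, matching the "total system size" lines in the logs.
def global_edge(nprocs, nodes_per_proc=30):
    grid = round(nprocs ** (1.0 / 3.0))        # processes per dimension
    assert grid ** 3 == nprocs, "needs a cube number of processes"
    return grid * nodes_per_proc

print(global_edge(125))    # 150  -> 150^3 total unknowns
print(global_edge(1000))   # 300  -> 300^3 total unknowns
```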
> A job would be started using something like
>
> mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 \
>        -iterations 1000 \
>        -ksp_rtol 1E-6 \
>        -log_view -log_sync \
>        -pc_type gamg -pc_gamg_type classical \
>        -ksp_type cg \
>        -ksp_norm_type unpreconditioned \
>        -mg_levels_ksp_type richardson \
>        -mg_levels_ksp_norm_type none \
>        -mg_levels_pc_type sor \
>        -mg_levels_ksp_max_it 1 \
>        -mg_levels_pc_sor_its 1 \
>        -mg_levels_esteig_ksp_type cg \
>        -mg_levels_esteig_ksp_max_it 10 \
>        -gamg_est_ksp_type cg
>
> ideally started on a cube number of processes for a cubical process grid.
> Using 125 processes and 10,000 iterations I get the output in
> "log_view_125_new.txt", which shows the same imbalance for me.
>
> Michael
>
>
> Am 02.06.2018 um 13:40 schrieb Mark Adams:
>
>
>
> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> Hi, Michael,
>> You can add -log_sync besides -log_view, which adds barriers to certain
>> events but measures barrier time separately from the events. I find this
>> option makes it easier to interpret log_view output.
>>
>
> That is great (good to know).
>
> This should give us a better idea if your large VecScatter costs are from
> slow communication or if it is catching some sort of load imbalance.
>
>
>>
>> --Junchao Zhang
>>
>> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
>> Michael.Becker at physik.uni-giessen.de> wrote:
>>
>>> Barry: On its way. Could take a couple days again.
>>>
>>> Junchao: I unfortunately don't have access to a cluster with a faster
>>> network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking fat-tree
>>> network, which I realize causes parallel slowdown if the nodes are not
>>> connected to the same switch. Each node has 24 processors (2x12/socket) and
>>> four NUMA domains (two for each socket).
>>> The ranks are usually not distributed perfectly evenly, i.e. for 125
>>> processes, of the six required nodes, five would use 21 cores and one would use 20.
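The near-even split described above can be reproduced mechanically: 125 ranks on 24-core nodes need six nodes, filled as evenly as possible. A small sketch (the actual placement policy belongs to the scheduler; this only mirrors the numbers given above):

```python
# Spread `ranks` across the minimum number of `cores`-wide nodes as evenly
# as possible; mirrors the 5 x 21 + 1 x 20 layout described for 125 ranks.
def spread(ranks, cores=24):
    nodes = -(-ranks // cores)            # ceil division: minimum node count
    base, extra = divmod(ranks, nodes)    # `extra` nodes get one rank more
    return [base + 1] * extra + [base] * (nodes - extra)

layout = spread(125)
print(layout)            # [21, 21, 21, 21, 21, 20]
print(sum(layout))       # 125
```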
>>> Would using another CPU type make a difference communication-wise? I
>>> could switch to faster ones (on the same network), but I always assumed
>>> this would only improve performance of the stuff that is unrelated to
>>> communication.
>>>
>>> Michael
>>>
>>>
>>>
>>> The log files have something like "Average time for zero size
>>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very
>>> slow network. A typical machine should give less than 1/10 of the latency
>>> you have. An easy way to check is just to run the code on a machine with
>>> a faster network and see what happens.
>>>
>>> Also, how many cores and NUMA domains does a compute node have? I could
>>> not figure out how you distributed the 125 MPI ranks evenly.
>>>
>>> --Junchao Zhang
>>>
>>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>>
>>>> Hello again,
>>>>
>>>> here are the updated log_view files for 125 and 1000 processors. I ran
>>>> both problems twice, the first time with all processors per node allocated
>>>> ("-1.txt"), the second with only half on twice the number of nodes
>>>> ("-2.txt").
>>>>
>>>> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> <Michael.Becker at physik.uni-giessen.de> wrote:
>>>>
>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>>>>
>>>> Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(). Could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors in KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would make a measurable difference.
>>>>
>>>>
>>>> I mean this, right in the log_view output:
>>>>
>>>> Memory usage is given in bytes:
>>>>
>>>> Object Type Creations Destructions Memory Descendants' Mem.
>>>> Reports information only for process 0.
>>>>
>>>> --- Event Stage 0: Main Stage
>>>>
>>>> ...
>>>>
>>>> --- Event Stage 1: First Solve
>>>>
>>>> ...
>>>>
>>>> --- Event Stage 2: Remaining Solves
>>>>
>>>> Vector 23904 23904 1295501184 0.
>>>>
>>>> I logged the exact number of KSP iterations over the 999 timesteps and
>>>> it's exactly 23904/6 = 3984.
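Taking the observation of six work vectors per iteration at face value, the stage-2 summary line quoted above ("Vector 23904 23904 1295501184") is internally consistent, and also gives the average vector size:

```python
# Cross-check of the "Remaining Solves" object summary quoted above:
#   Vector  23904  23904  1295501184
# Six work vectors per KSP iteration implies 23904 / 6 KSP iterations total.
creations, total_bytes = 23904, 1295501184

ksp_iterations = creations // 6
bytes_per_vector = total_bytes // creations

print(ksp_iterations)      # 3984 KSP iterations
print(bytes_per_vector)    # 54196 bytes per vector on average
# The average is far below 30^3 doubles (216000 bytes) because most of the
# vectors live on the coarser multigrid levels, which are much smaller.
```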
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> Am 24.05.2018 um 19:50 schrieb Smith, Barry F.:
>>>>
>>>> Please send the log file for 1000 with cg as the solver.
>>>>
>>>> You should make a bar chart of each event for the two cases to see which ones are taking more time and which are taking less (we cannot tell from the two logs you sent us, since they are for different solvers).
>>>>
>>>>
>>>>
>>>>
>>>> On May 24, 2018, at 12:24 AM, Michael Becker <Michael.Becker at physik.uni-giessen.de> <Michael.Becker at physik.uni-giessen.de> wrote:
>>>>
>>>> I noticed that for every individual KSP iteration, six vector objects are created and destroyed (with CG, more with e.g. GMRES).
>>>>
>>>> Hmm, it is certainly not intended that vectors be created and destroyed within each KSPSolve(). Could you please point us to the code that makes you think they are being created and destroyed? We create all the work vectors in KSPSetUp() and destroy them in KSPReset(), not during the solve. Not that this would make a measurable difference.
>>>>
>>>>
>>>>
>>>>
>>>> This seems kind of wasteful; is it supposed to be like this? Is this even the reason for my problems? Apart from that, everything seems quite normal to me (but I'm not the expert here).
>>>>
>>>>
>>>> Thanks in advance.
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>> <log_view_125procs.txt><log_view_1000procs.txt>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
-------------- next part --------------
using 125 of 125 processes
30^3 unknowns per processor
total system size: 150^3
mesh size: 0.0001
initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations
Time in solve(): 97.977 s
Time in KSPSolve(): 97.7361 s (99.7541%)
Number of KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0373 with 125 processors, by jczhang Mon Jun 4 23:24:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.987e+02 1.00000 1.987e+02
Objects: 4.249e+04 1.00002 4.249e+04
Flop: 3.698e+10 1.15842 3.501e+10 4.377e+12
Flop/sec: 1.862e+08 1.15841 1.763e+08 2.203e+10
MPI Messages: 1.816e+06 3.38531 1.236e+06 1.545e+08
MPI Message Lengths: 2.275e+09 2.20338 1.423e+03 2.198e+11
MPI Reductions: 3.759e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.1684e-02 0.0% 0.0000e+00 0.0% 1.200e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0060e+02 50.6% 5.6491e+09 0.1% 4.212e+05 0.3% 3.421e+03 0.7% 5.660e+02 1.5%
2: Remaining Solves: 9.7993e+01 49.3% 4.3710e+12 99.9% 1.541e+08 99.7% 1.417e+03 99.3% 3.700e+04 98.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 7.2002e-05 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 7.2002e-03 2.1 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 2.9978e+0114.5 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 4 0 0 0 0 7 0 2 5 0 0
KSPSetUp 9 1.0 4.5559e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0060e+02 1.0 4.82e+07 1.2 4.2e+05 3.4e+03 5.7e+02 51 0 0 1 2 100100100100100 56
VecTDot 14 1.0 2.5394e-02 2.2 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 2 3721
VecNorm 9 1.0 6.1355e-03 9.9 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 2 9901
VecScale 42 1.0 4.3249e-04 3.8 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 21120
VecCopy 1 1.0 1.5402e-04 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 178 1.0 2.1267e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 5.2488e-0311.6 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 18004
VecAYPX 49 1.0 1.5860e-03 2.7 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 50301
VecAssemblyBegin 2 1.0 2.8849e-05 7.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 2.6941e-05 9.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 178 1.0 6.5460e-03 4.1 0.00e+00 0.0 1.5e+05 1.4e+03 0.0e+00 0 0 0 0 0 0 0 37 15 0 0
VecScatterEnd 178 1.0 6.8015e-02 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 50 1.0 3.9924e-02 2.2 1.05e+07 1.1 5.1e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 7 0 31204
MatMultAdd 42 1.0 3.1942e-02 5.7 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 8625
MatMultTranspose 42 1.0 1.7802e-02 2.1 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 15476
MatSolve 7 0.0 9.3460e-05 0.0 8.40e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 9
MatSOR 84 1.0 8.0777e-02 2.0 1.90e+07 1.2 4.7e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 5 2 27852
MatLUFactorSym 1 1.0 2.8300e-0418.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 7.9155e-0519.5 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 4
MatResidual 42 1.0 3.4999e-02 2.5 7.97e+06 1.2 4.7e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 5 0 26653
MatAssemblyBegin 94 1.0 2.9981e+0114.4 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 4 0 0 0 0 7 0 2 5 0 0
MatAssemblyEnd 94 1.0 1.1404e-01 1.1 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02 0 0 0 0 1 0 0 15 1 41 0
MatGetRow 3100250 1.2 4.7874e+01 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 42 0 0 0 0 0
MatGetRowIJ 1 0.0 1.3828e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.5563e-01 2.0 0.00e+00 0.0 5.5e+04 1.8e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 4 1.0 8.4031e-02 1.1 0.00e+00 0.0 2.8e+03 2.8e+02 6.4e+01 0 0 0 0 0 0 0 1 0 11 0
MatGetOrdering 1 0.0 1.0204e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 3.1606e-02 1.2 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 8.2941e-03 1.0 0.00e+00 0.0 5.4e+04 6.0e+02 3.4e+01 0 0 0 0 0 0 0 13 2 6 0
MatZeroEntries 6 1.0 1.7359e-03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 2.1538e-01 1.0 1.13e+07 1.3 6.4e+04 2.7e+03 9.2e+01 0 0 0 0 0 0 23 15 12 16 5910
MatPtAPSymbolic 6 1.0 1.4594e-01 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01 0 0 0 0 0 0 0 8 6 7 0
MatPtAPNumeric 6 1.0 6.9003e-02 1.0 1.13e+07 1.3 2.9e+04 2.6e+03 4.8e+01 0 0 0 0 0 0 23 7 5 8 18448
MatGetLocalMat 6 1.0 2.7068e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 5.0828e-03 1.5 0.00e+00 0.0 2.0e+04 3.6e+03 0.0e+00 0 0 0 0 0 0 0 5 5 0 0
SFSetGraph 12 1.0 1.0014e-04 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 9.7704e-03 1.2 0.00e+00 0.0 2.6e+04 6.3e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 46 1.0 1.5848e-03 2.7 0.00e+00 0.0 5.5e+04 7.0e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 46 1.0 3.4416e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0010e+02 1.0 0.00e+00 0.0 2.0e+05 5.3e+03 2.9e+02 50 0 0 0 1 100 0 47 73 51 0
GAMG: partLevel 6 1.0 3.0237e-01 1.0 1.13e+07 1.3 6.6e+04 2.6e+03 1.9e+02 0 0 0 0 1 0 23 16 12 34 4210
repartition 2 1.0 6.8307e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 7.5531e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 7.2515e-02 1.1 0.00e+00 0.0 1.4e+03 5.4e+02 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
Move P 2 1.0 1.6035e-02 1.3 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
PCSetUp 2 1.0 1.0041e+02 1.0 1.13e+07 1.3 2.7e+05 4.6e+03 5.1e+02 51 0 0 1 1 100 23 63 85 91 13
PCSetUpOnBlocks 7 1.0 5.0783e-04 3.8 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
PCApply 7 1.0 1.2734e-01 1.1 3.18e+07 1.2 1.5e+05 1.2e+03 1.4e+01 0 0 0 0 0 0 66 35 13 2 29321
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 9.7831e+01 1.0 3.69e+10 1.2 1.5e+08 1.4e+03 3.7e+04 49100100 99 98 100100100100100 44679
VecTDot 14000 1.0 8.3188e+00 5.1 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04 2 2 0 0 37 3 2 0 0 38 11360
VecNorm 9000 1.0 1.4898e+00 2.0 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03 0 1 0 0 24 1 1 0 0 24 40778
VecScale 42000 1.0 3.7866e-01 3.2 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 24123
VecCopy 1000 1.0 8.0102e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 147000 1.0 2.0034e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecAXPY 14000 1.0 9.5575e-01 2.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 98875
VecAYPX 49000 1.0 1.6010e+00 3.0 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 49830
VecScatterBegin 176000 1.0 5.7110e+00 4.0 0.00e+00 0.0 1.5e+08 1.4e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 176000 1.0 4.8250e+01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 17 0 0 0 0 0
MatMult 50000 1.0 3.2538e+01 1.9 1.05e+10 1.1 5.1e+07 2.1e+03 0.0e+00 11 28 33 49 0 22 29 33 49 0 38287
MatMultAdd 42000 1.0 1.9521e+01 3.4 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00 5 6 18 9 0 9 6 18 9 0 14113
MatMultTranspose 42000 1.0 1.5577e+01 2.1 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00 5 6 18 9 0 10 6 18 9 0 17687
MatSolve 7000 0.0 1.0978e-01 0.0 8.40e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8
MatSOR 84000 1.0 5.0067e+01 2.1 1.90e+10 1.2 4.7e+07 1.6e+03 1.4e+04 23 51 30 33 37 46 51 30 33 38 44834
MatResidual 42000 1.0 2.8185e+01 2.1 7.97e+09 1.2 4.7e+07 1.6e+03 0.0e+00 9 21 30 33 0 18 21 30 33 0 33097
PCSetUpOnBlocks 7000 1.0 1.3226e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7000 1.0 8.8024e+01 1.1 3.18e+10 1.2 1.5e+08 1.2e+03 1.4e+04 44 85 97 84 37 89 85 97 84 38 42358
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2371888 0.
Matrix 0 72 14160468 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 12 133928 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 158 110 3181312 0.
Matrix 140 68 21757144 0.
Matrix Coarsen 6 6 3816 0.
Index Set 110 100 543716 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 31 18 22752 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 42000 42000 2276680000 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 0.00109181
Average time for zero size MPI_Send(): 6.45638e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 03:36:36 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib 
-Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Scaling-loss.png
Type: image/png
Size: 96894 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0004.png>
-------------- next part --------------
using 1000 of 1000 processes
30^3 unknowns per processor
total system size: 300^3
mesh size: 0.0001
initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations
Time in solve(): 127 s
Time in KSPSolve(): 126.753 s (99.8054%)
Number of KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0385 with 1000 processors, by jczhang Mon Jun 4 15:33:32 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.339e+02 1.00002 2.339e+02
Objects: 4.854e+04 1.00002 4.854e+04
Flop: 4.220e+10 1.15865 4.106e+10 4.106e+13
Flop/sec: 1.805e+08 1.15865 1.756e+08 1.756e+11
MPI Messages: 2.436e+06 3.97680 1.683e+06 1.683e+09
MPI Message Lengths: 2.592e+09 2.20360 1.364e+03 2.296e+12
MPI Reductions: 4.266e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3538e-01 0.1% 0.0000e+00 0.0% 1.080e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0660e+02 45.6% 5.1626e+10 0.1% 4.348e+06 0.3% 3.241e+03 0.6% 6.340e+02 1.5%
2: Remaining Solves: 1.2702e+02 54.3% 4.1013e+13 99.9% 1.679e+09 99.7% 1.359e+03 99.4% 4.200e+04 98.5%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 5.6801e-03212.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 1.6171e-02 5.3 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 1.9437e+01 7.1 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
KSPSetUp 9 1.0 6.0689e-03 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0660e+02 1.0 5.33e+07 1.2 4.3e+06 3.2e+03 6.3e+02 46 0 0 1 1 100 100 100 100 100 484
VecTDot 16 1.0 1.4536e-02 2.0 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 2 0 0 3 59437
VecNorm 10 1.0 2.3881e-02 1.9 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 1 0 0 2 22612
VecScale 48 1.0 4.1533e-03 31.5 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23084
VecCopy 1 1.0 4.5199e-03 119.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 208 1.0 2.6751e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 8.8606e-03 8.4 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 97510
VecAYPX 56 1.0 3.1478e-03 2.8 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 234365
VecAssemblyBegin 3 1.0 9.2983e-05 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 8.8930e-05 12.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 208 1.0 1.5633e-02 5.7 0.00e+00 0.0 1.7e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 0 39 16 0 0
VecScatterEnd 208 1.0 5.7859e-02 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 57 1.0 4.2757e-02 1.9 1.19e+07 1.1 5.6e+05 2.0e+03 0.0e+00 0 0 0 0 0 0 23 13 8 0 272044
MatMultAdd 48 1.0 2.7018e-02 3.1 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 97346
MatMultTranspose 48 1.0 2.4651e-02 2.7 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 106693
MatSolve 8 0.0 4.3631e-05 0.0 1.14e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 262
MatSOR 96 1.0 6.6015e-02 1.6 2.18e+07 1.2 5.2e+05 1.5e+03 1.6e+01 0 0 0 0 0 0 41 12 5 3 320486
MatLUFactorSym 1 1.0 1.0937e-02 917.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 8.9009e-03 2333.3 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
MatResidual 48 1.0 3.4651e-02 2.0 9.11e+06 1.2 5.2e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 17 12 5 0 254403
MatAssemblyBegin 102 1.0 1.9440e+01 7.1 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
MatAssemblyEnd 102 1.0 8.1546e-02 1.3 0.00e+00 0.0 6.3e+05 2.0e+02 2.5e+02 0 0 0 0 1 0 0 14 1 39 0
MatGetRow 3100266 1.2 4.9949e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20 0 0 0 0 44 0 0 0 0 0
MatGetRowIJ 1 0.0 2.5988e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.6449e-01 2.3 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01 0 0 0 0 0 0 0 13 66 2 0
MatCreateSubMat 6 1.0 4.2039e-02 1.1 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01 0 0 0 0 0 0 0 1 0 15 0
MatGetOrdering 1 0.0 9.8944e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 5.4760e-02 1.2 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 3.3504e-02 1.2 0.00e+00 0.0 5.4e+05 5.6e+02 4.8e+01 0 0 0 0 0 0 0 12 2 8 0
MatZeroEntries 6 1.0 1.6251e-03 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.9174e-01 1.1 1.11e+07 1.3 6.3e+05 2.5e+03 9.2e+01 0 0 0 0 0 0 20 15 11 15 55134
MatPtAPSymbolic 6 1.0 1.0971e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01 0 0 0 0 0 0 0 7 6 7 0
MatPtAPNumeric 6 1.0 7.3850e-02 1.0 1.11e+07 1.3 3.1e+05 2.3e+03 4.8e+01 0 0 0 0 0 0 20 7 5 8 143149
MatGetLocalMat 6 1.0 3.0208e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 1.0909e-02 4.0 0.00e+00 0.0 1.9e+05 3.5e+03 0.0e+00 0 0 0 0 0 0 0 4 5 0 0
SFSetGraph 12 1.0 1.4520e-04 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 2.5385e-02 1.9 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 60 1.0 5.0180e-03 6.0 0.00e+00 0.0 5.6e+05 6.5e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 60 1.0 9.5491e-03 6.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0600e+02 1.0 0.00e+00 0.0 2.0e+06 5.1e+03 3.0e+02 45 0 0 0 1 99 0 46 72 47 0
GAMG: partLevel 6 1.0 2.8349e-01 1.0 1.11e+07 1.3 6.5e+05 2.4e+03 2.4e+02 0 0 0 0 1 0 20 15 11 38 37292
repartition 3 1.0 1.2683e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 3.0384e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 3.9142e-02 1.3 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 1.2856e-02 1.6 0.00e+00 0.0 1.2e+04 1.3e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 1.0634e+02 1.0 1.11e+07 1.3 2.7e+06 4.4e+03 5.8e+02 45 0 0 1 1 100 20 61 84 91 99
PCSetUpOnBlocks 8 1.0 1.2212e-02 91.8 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
PCApply 8 1.0 1.2359e-01 1.1 3.64e+07 1.2 1.6e+06 1.2e+03 1.6e+01 0 0 0 0 0 0 68 37 14 3 285064
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.2680e+02 1.0 4.21e+10 1.2 1.7e+09 1.4e+03 4.2e+04 54 100 100 99 98 100 100 100 100 100 323446
VecTDot 16000 1.0 9.8928e+00 2.2 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04 2 2 0 0 38 4 2 0 0 38 87335
VecNorm 10000 1.0 2.1671e+00 1.3 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04 1 1 0 0 23 1 1 0 0 24 249178
VecScale 48000 1.0 5.0265e-01 4.3 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 190734
VecCopy 1000 1.0 8.4914e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 168000 1.0 2.3076e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecAXPY 16000 1.0 1.1082e+00 1.4 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 779639
VecAYPX 56000 1.0 1.9399e+00 1.8 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 380298
VecScatterBegin 201000 1.0 6.9948e+00 2.9 0.00e+00 0.0 1.7e+09 1.4e+03 0.0e+00 2 0 100 99 0 5 0 100 100 0 0
VecScatterEnd 201000 1.0 4.8907e+01 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 17 0 0 0 0 0
MatMult 57000 1.0 3.3672e+01 1.5 1.19e+10 1.1 5.6e+08 2.0e+03 0.0e+00 10 28 34 49 0 19 28 34 49 0 345446
MatMultAdd 48000 1.0 2.2597e+01 2.4 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00 6 6 18 9 0 11 6 18 9 0 116391
MatMultTranspose 48000 1.0 1.8192e+01 1.9 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00 5 6 18 9 0 9 6 18 9 0 144569
MatSolve 8000 0.0 2.8324e-02 0.0 1.14e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 404
MatSOR 96000 1.0 6.2678e+01 1.5 2.17e+10 1.2 5.2e+08 1.5e+03 1.6e+04 25 51 31 33 38 47 51 31 34 38 336886
MatResidual 48000 1.0 2.9771e+01 1.7 9.11e+09 1.2 5.2e+08 1.5e+03 0.0e+00 9 21 31 33 0 16 21 31 34 0 296101
PCSetUpOnBlocks 8000 1.0 1.5038e-01 20.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 8000 1.0 1.1220e+02 1.0 3.63e+10 1.2 1.6e+09 1.2e+03 1.6e+04 48 86 97 84 38 88 86 97 85 38 313653
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2382272 0.
Matrix 0 65 14796304 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 171648 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 176 128 3546680 0.
Matrix 148 83 22941288 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 590732 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 48000 48000 2616576000 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.52111e-05
Average time for zero size MPI_Send(): 7.22694e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-04 18:36:31 on beboplogin1
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin 
-Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 125 of 125 processes
30^3 unknowns per processor
total system size: 150^3
mesh size: 0.0001
initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations
Time in solve(): 107.17 s
Time in KSPSolve(): 106.928 s (99.7738%)
Number of KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)
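The summary figures above can be cross-checked directly; a small Python sketch using only the numbers printed in this attachment (the tiny difference from the printed 99.7738% comes from the rounding of the reported times):

```python
# Figures printed above for the 125-rank run
total_ksp_iters = 7000   # "Number of KSP iterations (total)"
num_solves = 1000        # KSPSolve called 1000 times in the second stage
time_solve = 107.17      # "Time in solve()", seconds
time_ksp = 106.928       # "Time in KSPSolve()", seconds

iters_per_solve = total_ksp_iters / num_solves
ksp_fraction = 100.0 * time_ksp / time_solve
print(f"{iters_per_solve:.2f} iterations per solve")   # 7.00
print(f"{ksp_fraction:.2f}% of solve() spent in KSPSolve")
```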
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0247 with 125 processors, by jczhang Mon Jun 4 14:37:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.089e+02 1.00002 2.089e+02
Objects: 4.249e+04 1.00002 4.249e+04
Flop: 3.698e+10 1.15842 3.501e+10 4.377e+12
Flop/sec: 1.770e+08 1.15842 1.676e+08 2.095e+10
MPI Messages: 1.816e+06 3.38531 1.236e+06 1.545e+08
MPI Message Lengths: 2.275e+09 2.20338 1.423e+03 2.198e+11
MPI Reductions: 3.759e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.8523e-01 0.1% 0.0000e+00 0.0% 1.200e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0152e+02 48.6% 5.6491e+09 0.1% 4.212e+05 0.3% 3.421e+03 0.7% 5.660e+02 1.5%
2: Remaining Solves: 1.0719e+02 51.3% 4.3710e+12 99.9% 1.541e+08 99.7% 1.417e+03 99.3% 3.700e+04 98.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 5.0769e-03 195.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 8.1227e-03 3.7 0.00e+00 0.0 8.8e+03 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 3.0751e+01 14.9 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 4 0 0 0 0 8 0 2 5 0 0
KSPSetUp 9 1.0 5.4052e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0153e+02 1.0 4.82e+07 1.2 4.2e+05 3.4e+03 5.7e+02 49 0 0 1 2 100 100 100 100 100 56
VecTDot 14 1.0 1.1638e-02 2.3 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 2 8120
VecNormBarrier 9 1.0 9.0270e-03 33.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 9 1.0 1.3406e-02 1.2 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 2 4532
VecScale 42 1.0 4.1047e-02 154.1 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 223
VecCopy 1 1.0 4.3828e-03 109.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 178 1.0 2.3446e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 5.2881e-03 10.4 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 17870
VecAYPX 49 1.0 1.6847e-03 2.4 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 47354
VecAssemblyBegin 2 1.0 4.7922e-05 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 3.6955e-05 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBarrier 178 1.0 1.0980e-01 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 178 1.0 1.5828e-02 4.6 0.00e+00 0.0 1.5e+05 1.4e+03 0.0e+00 0 0 0 0 0 0 0 37 15 0 0
VecScatterEnd 178 1.0 2.3170e-02 7.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 50 1.0 5.3129e-02 1.8 1.05e+07 1.1 5.1e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 7 0 23448
MatMultAdd 42 1.0 6.8014e-02 6.5 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 4051
MatMultTranspose 42 1.0 1.5433e-02 1.3 2.40e+06 1.3 2.8e+04 6.7e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 17852
MatSolve 7 0.0 8.2970e-05 0.0 8.40e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 10
MatSOR 84 1.0 1.0138e-01 3.2 1.90e+07 1.2 4.7e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 5 2 22193
MatLUFactorSym 1 1.0 9.9010e-03 769.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 7.3540e-03 1542.2 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatResidual 42 1.0 3.8404e-02 1.8 7.97e+06 1.2 4.7e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 5 0 24290
MatAssemblyBegin 94 1.0 3.0754e+01 14.9 0.00e+00 0.0 7.1e+03 1.0e+04 0.0e+00 4 0 0 0 0 8 0 2 5 0 0
MatAssemblyEnd 94 1.0 5.9230e-02 1.3 0.00e+00 0.0 6.3e+04 2.1e+02 2.3e+02 0 0 0 0 1 0 0 15 1 41 0
MatGetRow 3100250 1.2 4.8146e+01 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20 0 0 0 0 42 0 0 0 0 0
MatGetRowIJ 1 0.0 2.9087e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.5237e-01 2.1 0.00e+00 0.0 5.5e+04 1.8e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 4 1.0 2.0799e-02 1.3 0.00e+00 0.0 2.8e+03 2.8e+02 6.4e+01 0 0 0 0 0 0 0 1 0 11 0
MatGetOrdering 1 0.0 7.9701e-03 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 3.3583e-02 1.3 0.00e+00 0.0 2.7e+04 1.0e+03 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 2.2035e-02 1.2 0.00e+00 0.0 5.4e+04 6.0e+02 3.4e+01 0 0 0 0 0 0 0 13 2 6 0
MatZeroEntries 6 1.0 1.8528e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.4971e-01 1.1 1.13e+07 1.3 6.4e+04 2.7e+03 9.2e+01 0 0 0 0 0 0 23 15 12 16 8503
MatPtAPSymbolic 6 1.0 8.6746e-02 1.0 0.00e+00 0.0 3.4e+04 2.7e+03 4.2e+01 0 0 0 0 0 0 0 8 6 7 0
MatPtAPNumeric 6 1.0 5.5958e-02 1.0 1.13e+07 1.3 2.9e+04 2.6e+03 4.8e+01 0 0 0 0 0 0 23 7 5 8 22748
MatGetLocalMat 6 1.0 2.8403e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 9.4802e-03 3.1 0.00e+00 0.0 2.0e+04 3.6e+03 0.0e+00 0 0 0 0 0 0 0 5 5 0 0
SFSetGraph 12 1.0 1.0824e-04 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 1.5241e-02 2.2 0.00e+00 0.0 2.6e+04 6.3e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 46 1.0 3.5763e-03 6.3 0.00e+00 0.0 5.5e+04 7.0e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 46 1.0 6.2499e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0099e+02 1.0 0.00e+00 0.0 2.0e+05 5.3e+03 2.9e+02 48 0 0 0 1 99 0 47 73 51 0
GAMG: partLevel 6 1.0 2.0149e-01 1.1 1.13e+07 1.3 6.6e+04 2.6e+03 1.9e+02 0 0 0 0 1 0 23 16 12 34 6317
repartition 2 1.0 1.9701e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 1.9929e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 2.1171e-02 1.7 0.00e+00 0.0 1.4e+03 5.4e+02 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
Move P 2 1.0 8.9321e-03 1.9 0.00e+00 0.0 1.4e+03 1.3e+01 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
PCSetUp 2 1.0 1.0124e+02 1.0 1.13e+07 1.3 2.7e+05 4.6e+03 5.1e+02 48 0 0 1 1 100 23 63 85 91 13
PCSetUpOnBlocks 7 1.0 1.1790e-02 4.5 3.14e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7 1.0 1.6586e-01 1.0 3.18e+07 1.2 1.5e+05 1.2e+03 1.4e+01 0 0 0 0 0 0 66 35 13 2 22511
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.0703e+02 1.0 3.69e+10 1.2 1.5e+08 1.4e+03 3.7e+04 51 100 100 99 98 100 100 100 100 100 40840
VecTDot 14000 1.0 7.6976e+00 7.3 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04 1 2 0 0 37 2 2 0 0 38 12276
VecNormBarrier 9000 1.0 9.7604e-01 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 9000 1.0 6.1436e-01 1.2 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03 0 1 0 0 24 1 1 0 0 24 98883
VecScale 42000 1.0 4.6654e-01 2.9 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 19579
VecCopy 1000 1.0 8.0786e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 147000 1.0 2.1127e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecAXPY 14000 1.0 9.5190e-01 2.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 99276
VecAYPX 49000 1.0 1.6238e+00 2.5 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 49131
VecScatterBarrier 176000 1.0 4.0948e+01 6.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 7 0 0 0 0 14 0 0 0 0 0
VecScatterBegin 176000 1.0 7.2618e+00 4.0 0.00e+00 0.0 1.5e+08 1.4e+03 0.0e+00 2 0 100 99 0 4 0 100 100 0 0
VecScatterEnd 176000 1.0 1.8905e+01 6.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 7 0 0 0 0 0
MatMult 50000 1.0 3.4401e+01 1.7 1.05e+10 1.1 5.1e+07 2.1e+03 0.0e+00 11 28 33 49 0 22 29 33 49 0 36213
MatMultAdd 42000 1.0 1.8117e+01 1.8 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00 6 6 18 9 0 11 6 18 9 0 15208
MatMultTranspose 42000 1.0 1.5244e+01 1.3 2.40e+09 1.3 2.8e+07 6.7e+02 0.0e+00 6 6 18 9 0 12 6 18 9 0 18073
MatSolve 7000 0.0 7.3662e-02 0.0 8.40e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 11
MatSOR 84000 1.0 5.1378e+01 2.0 1.90e+10 1.2 4.7e+07 1.6e+03 1.4e+04 22 51 30 33 37 43 51 30 33 38 43690
MatResidual 42000 1.0 2.9679e+01 1.7 7.97e+09 1.2 4.7e+07 1.6e+03 0.0e+00 10 21 30 33 0 19 21 30 33 0 31430
PCSetUpOnBlocks 7000 1.0 1.3489e-01 3.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7000 1.0 9.7196e+01 1.1 3.18e+10 1.2 1.5e+08 1.2e+03 1.4e+04 46 85 97 84 37 90 85 97 84 38 38361
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2371888 0.
Matrix 0 72 14160468 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 12 133928 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 158 110 3181312 0.
Matrix 140 68 21757144 0.
Matrix Coarsen 6 6 3816 0.
Index Set 110 100 543716 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 31 18 22752 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 42000 42000 2276680000 0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 7.58171e-06
Average time for zero size MPI_Send(): 6.96945e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-04 18:36:31 on beboplogin1
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin 
-Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 1000 of 1000 processes
30^3 unknowns per processor
total system size: 300^3
mesh size: 0.0001
initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations
Time in solve(): 150.306 s
Time in KSPSolve(): 150.062 s (99.838%)
Number of KSP iterations (total): 8000
Number of solves (total): 1000 (avg. KSP iterations per solve: 8.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0289 with 1000 processors, by jczhang Mon Jun 4 14:49:04 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.578e+02 1.00003 2.578e+02
Objects: 4.854e+04 1.00002 4.854e+04
Flop: 4.220e+10 1.15865 4.106e+10 4.106e+13
Flop/sec: 1.637e+08 1.15867 1.593e+08 1.593e+11
MPI Messages: 2.436e+06 3.97680 1.683e+06 1.683e+09
MPI Message Lengths: 2.592e+09 2.20360 1.364e+03 2.296e+12
MPI Reductions: 4.266e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.3517e-01 0.1% 0.0000e+00 0.0% 1.080e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0727e+02 41.6% 5.1626e+10 0.1% 4.348e+06 0.3% 3.241e+03 0.6% 6.340e+02 1.5%
2: Remaining Solves: 1.5032e+02 58.3% 4.1013e+13 99.9% 1.679e+09 99.7% 1.359e+03 99.4% 4.200e+04 98.5%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10^-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 5.7883e-03214.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 1.3386e-02 4.1 0.00e+00 0.0 8.9e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 2.0080e+01 7.8 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
KSPSetUp 9 1.0 6.3870e-03 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0726e+02 1.0 5.33e+07 1.2 4.3e+06 3.2e+03 6.3e+02 42 0 0 1 1 100100100100100 481
VecTDot 16 1.0 1.1937e-02 1.9 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 2 0 0 3 72376
VecNormBarrier 10 1.0 2.8527e-0228.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNorm 10 1.0 2.5727e-02 1.0 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 1 0 0 2 20990
VecScale 48 1.0 5.2900e-0313.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 18123
VecCopy 1 1.0 5.3449e-03136.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 208 1.0 2.9714e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 2.4686e-0224.0 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 34999
VecAYPX 56 1.0 2.0964e-03 1.8 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 351907
VecAssemblyBegin 3 1.0 8.3923e-0512.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 6.8665e-0511.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBarrie 208 1.0 8.8101e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 208 1.0 1.9391e-02 3.3 0.00e+00 0.0 1.7e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 0 39 16 0 0
VecScatterEnd 208 1.0 2.5424e-02 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 57 1.0 5.9834e-02 1.5 1.19e+07 1.1 5.6e+05 2.0e+03 0.0e+00 0 0 0 0 0 0 23 13 8 0 194402
MatMultAdd 48 1.0 3.5768e-02 2.1 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 73531
MatMultTranspose 48 1.0 2.1412e-02 1.4 2.75e+06 1.3 3.0e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 122831
MatSolve 8 0.0 7.7009e-05 0.0 1.14e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 149
MatSOR 96 1.0 8.2143e-02 1.6 2.18e+07 1.2 5.2e+05 1.5e+03 1.6e+01 0 0 0 0 0 0 41 12 5 3 257563
MatLUFactorSym 1 1.0 9.9230e-03832.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 7.5920e-031990.2 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2
MatResidual 48 1.0 4.2435e-02 1.6 9.11e+06 1.2 5.2e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 17 12 5 0 207738
MatAssemblyBegin 102 1.0 2.0083e+01 7.8 0.00e+00 0.0 6.5e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
MatAssemblyEnd 102 1.0 7.7547e-02 1.3 0.00e+00 0.0 6.3e+05 2.0e+02 2.5e+02 0 0 0 0 1 0 0 14 1 39 0
MatGetRow 3100266 1.2 5.0587e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 44 0 0 0 0 0
MatGetRowIJ 1 0.0 1.3113e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.7499e-01 2.4 0.00e+00 0.0 5.7e+05 1.6e+04 1.2e+01 0 0 0 0 0 0 0 13 66 2 0
MatCreateSubMat 6 1.0 4.6081e-02 1.1 0.00e+00 0.0 2.2e+04 3.3e+02 9.4e+01 0 0 0 0 0 0 0 1 0 15 0
MatGetOrdering 1 0.0 2.5415e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 5.4010e-02 1.2 0.00e+00 0.0 2.6e+05 9.9e+02 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 3.1411e-02 1.2 0.00e+00 0.0 5.4e+05 5.6e+02 4.8e+01 0 0 0 0 0 0 0 12 2 8 0
MatZeroEntries 6 1.0 1.7152e-03 4.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.9647e-01 1.0 1.11e+07 1.3 6.3e+05 2.5e+03 9.2e+01 0 0 0 0 0 0 20 15 11 15 53808
MatPtAPSymbolic 6 1.0 1.1147e-01 1.0 0.00e+00 0.0 3.2e+05 2.7e+03 4.2e+01 0 0 0 0 0 0 0 7 6 7 0
MatPtAPNumeric 6 1.0 7.6733e-02 1.0 1.11e+07 1.3 3.1e+05 2.3e+03 4.8e+01 0 0 0 0 0 0 20 7 5 8 137771
MatGetLocalMat 6 1.0 2.9888e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 9.0182e-03 3.4 0.00e+00 0.0 1.9e+05 3.5e+03 0.0e+00 0 0 0 0 0 0 0 4 5 0 0
SFSetGraph 12 1.0 1.3924e-04 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 2.1541e-02 2.5 0.00e+00 0.0 2.7e+05 5.8e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 60 1.0 4.6461e-03 5.5 0.00e+00 0.0 5.6e+05 6.5e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 60 1.0 6.3133e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0658e+02 1.0 0.00e+00 0.0 2.0e+06 5.1e+03 3.0e+02 41 0 0 0 1 99 0 46 72 47 0
GAMG: partLevel 6 1.0 2.9647e-01 1.0 1.11e+07 1.3 6.5e+05 2.4e+03 2.4e+02 0 0 0 0 1 0 20 15 11 38 35658
repartition 3 1.0 1.2493e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 3.0791e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 4.2872e-02 1.3 0.00e+00 0.0 9.5e+03 7.4e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 1.3327e-02 1.6 0.00e+00 0.0 1.2e+04 1.3e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 1.0693e+02 1.0 1.11e+07 1.3 2.7e+06 4.4e+03 5.8e+02 41 0 0 1 1 100 20 61 84 91 99
PCSetUpOnBlocks 8 1.0 1.2192e-02 3.8 1.28e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
PCApply 8 1.0 1.6499e-01 1.1 3.64e+07 1.2 1.6e+06 1.2e+03 1.6e+01 0 0 0 0 0 0 68 37 14 3 213545
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.5010e+02 1.0 4.21e+10 1.2 1.7e+09 1.4e+03 4.2e+04 58100100 99 98 100100100100100 273235
VecTDot 16000 1.0 8.7654e+00 2.8 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04 2 2 0 0 38 3 2 0 0 38 98568
VecNormBarrier 10000 1.0 1.4725e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecNorm 10000 1.0 1.1930e+00 1.2 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04 0 1 0 0 23 1 1 0 0 24 452622
VecScale 48000 1.0 6.3976e-01 2.4 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 149856
VecCopy 1000 1.0 8.1365e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 168000 1.0 2.6112e+00 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecAXPY 16000 1.0 1.0987e+00 1.4 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 786401
VecAYPX 56000 1.0 2.0142e+00 1.7 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 366271
VecScatterBarrie 201000 1.0 5.5632e+01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 20 0 0 0 0 0
VecScatterBegin 201000 1.0 9.0613e+00 2.7 0.00e+00 0.0 1.7e+09 1.4e+03 0.0e+00 3 0100 99 0 5 0100100 0 0
VecScatterEnd 201000 1.0 1.7975e+01 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 6 0 0 0 0 0
MatMult 57000 1.0 4.3542e+01 1.4 1.19e+10 1.1 5.6e+08 2.0e+03 0.0e+00 13 28 34 49 0 23 28 34 49 0 267140
MatMultAdd 48000 1.0 2.1388e+01 1.4 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00 6 6 18 9 0 11 6 18 9 0 122969
MatMultTranspose 48000 1.0 2.3754e+01 1.3 2.75e+09 1.3 3.0e+08 6.6e+02 0.0e+00 8 6 18 9 0 13 6 18 9 0 110722
MatSolve 8000 0.0 8.0224e-02 0.0 1.14e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 143
MatSOR 96000 1.0 6.6546e+01 1.5 2.17e+10 1.2 5.2e+08 1.5e+03 1.6e+04 24 51 31 33 38 42 51 31 34 38 317303
MatResidual 48000 1.0 3.8039e+01 1.4 9.11e+09 1.2 5.2e+08 1.5e+03 0.0e+00 11 21 31 33 0 19 21 31 34 0 231744
PCSetUpOnBlocks 8000 1.0 1.6057e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 8000 1.0 1.3489e+02 1.0 3.63e+10 1.2 1.6e+09 1.2e+03 1.6e+04 52 86 97 84 38 89 86 97 85 38 260888
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2382272 0.
Matrix 0 65 14796304 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 171648 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 176 128 3546680 0.
Matrix 148 83 22941288 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 590732 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 48000 48000 2616576000 0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.30177e-05
Average time for zero size MPI_Send(): 7.15208e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-openmp=1 --download-sowing --download-ptscotch=1 --download-fblaslapack=1 --download-scalapack=1 --download-strumpack=1 --download-superlu_dist=1 --download-metis=1 --download-parmetis=1 --download-mumps=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-04 18:36:31 on beboplogin1
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mpi-2017.3-dfphq6kavje2olnichisvjjndtridrok/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L/usr/lib/gcc/x86_64-redhat-linux/4.8.5 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin 
-Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lstrumpack -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MatSOR_SeqAIJ.png
Type: image/png
Size: 253560 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAPI_TOT_CYC.png
Type: image/png
Size: 81097 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAPI_DP_OPS.png
Type: image/png
Size: 80924 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180606/fda3c121/attachment-0007.png>
More information about the petsc-dev mailing list