[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems
Junchao Zhang
jczhang at mcs.anl.gov
Sun Jun 10 23:46:39 CDT 2018
I used an LCRC machine named Bebop and tested on its Intel Broadwell nodes.
Each node has 2 CPUs and 36 cores in total. I collected data using either 36
cores per node or 18 cores per node. As you can see, 18 cores/node gave much
better performance, which is reasonable since routines like MatSOR, MatMult,
and MatMultAdd are all memory-bandwidth bound; with half the ranks per node
(and hence twice as many nodes for the same rank count), each rank gets
roughly twice the memory bandwidth.
The code uses a DMDA 3D grid with a 7-point stencil and marks nodes (vertices)
at the surface, or one layer in from the surface, as boundary nodes. Boundary
nodes have only a 1 on the diagonal of their matrix row, while interior nodes
have 7 nonzeros in their row. Processors on the boundary of the processor grid
therefore have fewer nonzeros, which is one source of load imbalance. Will
this load imbalance get worse on the coarser grids of the MG hierarchy?
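For reference, here is a minimal sketch of that kind of setup (illustrative
only, not the benchmarked code; the function name AssemblePoissonLike and the
entry values are made up):

#include <petscdmda.h>

/* Assemble a 7-point-stencil matrix on an n x n x n DMDA grid; nodes within
   the two outermost layers are treated as boundary nodes and get only a 1 on
   the diagonal, all other nodes get the full 7-point stencil. */
PetscErrorCode AssemblePoissonLike(PetscInt n, Mat *Aout)
{
  DM             da;
  Mat            A;
  PetscInt       i,j,k,xs,ys,zs,xm,ym,zm;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = DMDACreate3d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,DM_BOUNDARY_NONE,
                      DMDA_STENCIL_STAR,n,n,n,PETSC_DECIDE,PETSC_DECIDE,PETSC_DECIDE,
                      1,1,NULL,NULL,NULL,&da);CHKERRQ(ierr);
  ierr = DMSetUp(da);CHKERRQ(ierr);
  ierr = DMCreateMatrix(da,&A);CHKERRQ(ierr);
  ierr = DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);CHKERRQ(ierr);
  for (k=zs; k<zs+zm; k++) {
    for (j=ys; j<ys+ym; j++) {
      for (i=xs; i<xs+xm; i++) {
        MatStencil  row = {0}, col[7];
        PetscScalar v[7];
        PetscInt    nc = 0;
        row.i = i; row.j = j; row.k = k;
        if (i < 2 || j < 2 || k < 2 || i > n-3 || j > n-3 || k > n-3) {
          v[0] = 1.0;   /* boundary row: only a 1 on the diagonal */
          ierr = MatSetValuesStencil(A,1,&row,1,&row,v,INSERT_VALUES);CHKERRQ(ierr);
        } else {
          col[nc] = row;                  v[nc++] =  6.0;  /* diagonal        */
          col[nc] = row; col[nc].i = i-1; v[nc++] = -1.0;  /* west neighbor   */
          col[nc] = row; col[nc].i = i+1; v[nc++] = -1.0;  /* east neighbor   */
          col[nc] = row; col[nc].j = j-1; v[nc++] = -1.0;  /* south neighbor  */
          col[nc] = row; col[nc].j = j+1; v[nc++] = -1.0;  /* north neighbor  */
          col[nc] = row; col[nc].k = k-1; v[nc++] = -1.0;  /* bottom neighbor */
          col[nc] = row; col[nc].k = k+1; v[nc++] = -1.0;  /* top neighbor    */
          ierr = MatSetValuesStencil(A,1,&row,nc,col,v,INSERT_VALUES);CHKERRQ(ierr);
        }
      }
    }
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = DMDestroy(&da);CHKERRQ(ierr);
  *Aout = A;
  PetscFunctionReturn(0);
}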
I attach a trace-view figure that shows the activity of each rank along the
time axis during one KSPSolve. White means waiting in MPI; you can see that
white occupies a large fraction of the time.
I don't have a good explanation for why processors wait longer at large scale
(1728 cores), since the communication pattern is still a 7-point stencil on a
cubic processor grid.
--Junchao Zhang
On Sat, Jun 9, 2018 at 11:32 AM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
> Junchao,
>
> Thanks, the load balance of matrix entries is remarkably similar for
> the two runs, so a worse workload imbalance in SOR for the larger case
> cannot be what explains why the SOR takes more time.
>
> Here is my guess (and I know of no way to confirm it). In the smaller
> case there is less overlap of different processes on the same node running
> SOR at the same time than in the larger case; hence the larger case is
> slower because more SOR processes are fighting over the same memory
> bandwidth at the same time. Ahh, here is something you can try: let's
> undersubscribe the memory bandwidth needs. Run on, say, 16 processes per
> node with 8 nodes and 16 processes per node with 64 nodes and send the two
> -log_view output files. I assume this is an LCRC machine and NOT a KNL
> system?
>
> Thanks
>
>
> Barry
>
>
> > On Jun 9, 2018, at 8:29 AM, Mark Adams <mfadams at lbl.gov> wrote:
> >
> > -pc_gamg_type classical
> >
> > FYI, we only support smoothed aggregation "agg" (the default). (This
> thread started by saying you were using GAMG.)
> >
> > It is not clear how much this will make a difference for you, but you
> don't want to use classical because we do not support it. It is meant as a
> reference implementation for developers.
> >
> > First, how did you get the idea to use classical? If the documentation
> > led you to believe this was a good thing to do then we need to fix that!
> >
> > Anyway, here is a generic input for GAMG:
> >
> > -pc_type gamg
> > -pc_gamg_type agg
> > -pc_gamg_agg_nsmooths 1
> > -pc_gamg_coarse_eq_limit 1000
> > -pc_gamg_reuse_interpolation true
> > -pc_gamg_square_graph 1
> > -pc_gamg_threshold 0.05
> > -pc_gamg_threshold_scale .0
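For completeness, a minimal sketch of how the same options could be pushed
into the options database from code, before KSPSetFromOptions() or
PCSetFromOptions() is called (illustrative only; the helper name
SetGAMGOptions is made up, and putting the options on the command line or in
a .petscrc file works just as well):

#include <petscsys.h>

static PetscErrorCode SetGAMGOptions(void)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* same values as the generic GAMG input listed above */
  ierr = PetscOptionsSetValue(NULL,"-pc_type","gamg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_type","agg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_agg_nsmooths","1");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_coarse_eq_limit","1000");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_reuse_interpolation","true");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_square_graph","1");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_threshold","0.05");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL,"-pc_gamg_threshold_scale","0.0");CHKERRQ(ierr);
  PetscFunctionReturn(0);
}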
> >
> >
> >
> >
> > On Thu, Jun 7, 2018 at 6:52 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > OK, I had thought that the space was a typo. BTW, this option does not
> > show up in -h.
> > I changed the number of ranks to use all cores on each node, to avoid
> > misleading ratios in -log_view. Since one node has 36 cores, I ran with
> > 6^3 = 216 ranks and 12^3 = 1728 ranks. I also found that the call counts of
> > MatSOR etc. in the two tests were different, so they are not strictly weak
> > scaling tests. I tried adding -ksp_max_it 6 -pc_mg_levels 6, but still
> > could not make the two runs have the same MatSOR count. Anyway, I attached
> > the load balance output.
> >
> > I find that PCApply_MG calls PCMGMCycle_Private, which is recursive and
> > indirectly calls MatSOR_MPIAIJ. I believe the following code in
> > MatSOR_MPIAIJ effectively synchronizes {MatSOR, MatMultAdd}_SeqAIJ across
> > processors through a VecScatter at each MG level. If SOR and MatMultAdd are
> > imbalanced, the cost accumulates across MG levels and shows up as a large
> > VecScatter cost.
> > while (its--) {
> >   VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
> >   VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
> >
> >   /* update rhs: bb1 = bb - B*x */
> >   VecScale(mat->lvec,-1.0);
> >   (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);
> >
> >   /* local sweep */
> >   (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
> > }
> >
> >
> >
> > --Junchao Zhang
> >
> > On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> >
> >
> > > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczhang at mcs.anl.gov>
> wrote:
> > >
> > > Searched but could not find this option, -mat_view::load_balance
> >
> > There is a space between the view and the colons, i.e. -mat_view
> > ::load_balance; load_balance is a particular viewer format that causes the
> > printing of load-balance information about the number of nonzeros in the
> > matrix.
> >
> > Barry
> >
> > >
> > > --Junchao Zhang
> > >
> > > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > > So the only surprise in the results is the SOR. It is embarrassingly
> parallel and normally one would not see a jump.
> > >
> > > The load balance for the SOR time, 1.5, is better at 1000 processes than
> > > the 2.1 at 125 processes, not worse, so this number doesn't easily
> > > explain it.
> > >
> > > Could you run the 125 and 1000 with -mat_view ::load_balance and see
> what you get out?
> > >
> > > Thanks
> > >
> > > Barry
> > >
> > > Notice that the MatSOR time jumps a lot, by about 5 secs, when -log_sync
> > > is on. My only guess is that MatSOR is sharing memory bandwidth (or some
> > > other resource? cores?) with the VecScatter, and for some reason this is
> > > worse for 1000 cores, but I don't know why.
> > >
> > > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > > >
> > > > Hi, PETSc developers,
> > > > I tested Michael Becker's code. The code calls the same KSPSolve 1000
> > > > times in the second stage and needs a cubic number of processors to run.
> > > > I ran with 125 ranks and 1000 ranks, with and without the -log_sync
> > > > option. I attach the log_view output files and a scaling-loss Excel file.
> > > > I profiled the code with 125 processors. It looks like {MatSOR, MatMult,
> > > > MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c took
> > > > ~50% of the time; the other half was spent waiting in MPI. MatSOR_SeqAIJ
> > > > took 30%, mostly in PetscSparseDenseMinusDot().
> > > > I tested it on a 36 cores/node machine and found that 32 ranks/node gave
> > > > better performance (about 10%) than 36 ranks/node in the 125-rank test.
> > > > I guess this is because the processors in the former case had more
> > > > balanced memory bandwidth. I collected PAPI_DP_OPS (double-precision
> > > > operations) and PAPI_TOT_CYC (total cycles) for the 125-rank case (see
> > > > the attached files). It looks like ranks at the two ends have fewer
> > > > DP_OPS and TOT_CYC.
> > > > Does anyone familiar with the algorithm have a quick explanation?
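For anyone who wants to reproduce that counter collection, a minimal per-rank
sketch using the PAPI low-level API (illustrative only; error checking is
omitted and the helper names are made up):

#include <papi.h>
#include <stdio.h>

static int EventSet = PAPI_NULL;

/* call once per rank before the region of interest */
static void counters_start(void)
{
  PAPI_library_init(PAPI_VER_CURRENT);
  PAPI_create_eventset(&EventSet);
  PAPI_add_event(EventSet, PAPI_DP_OPS);   /* double-precision operations */
  PAPI_add_event(EventSet, PAPI_TOT_CYC);  /* total cycles */
  PAPI_start(EventSet);
}

/* call after the region of interest; one line of output per rank */
static void counters_stop_and_report(int rank)
{
  long long values[2];
  PAPI_stop(EventSet, values);
  printf("rank %d: DP_OPS=%lld TOT_CYC=%lld\n", rank, values[0], values[1]);
}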
> > > >
> > > > --Junchao Zhang
> > > >
> > > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > > Hello again,
> > > >
> > > > this took me longer than I anticipated, but here we go.
> > > > I did reruns of the cases where only half the processes per node
> were used (without -log_sync):
> > > >
> > > >                    125 procs,1st      125 procs,2nd      1000 procs,1st     1000 procs,2nd
> > > >                    Max       Ratio    Max       Ratio    Max       Ratio    Max       Ratio
> > > > KSPSolve           1.203E+02  1.0     1.210E+02  1.0     1.399E+02  1.1     1.365E+02  1.0
> > > > VecTDot            6.376E+00  3.7     6.551E+00  4.0     7.885E+00  2.9     7.175E+00  3.4
> > > > VecNorm            4.579E+00  7.1     5.803E+00 10.2     8.534E+00  6.9     6.026E+00  4.9
> > > > VecScale           1.070E-01  2.1     1.129E-01  2.2     1.301E-01  2.5     1.270E-01  2.4
> > > > VecCopy            1.123E-01  1.3     1.149E-01  1.3     1.301E-01  1.6     1.359E-01  1.6
> > > > VecSet             7.063E-01  1.7     6.968E-01  1.7     7.432E-01  1.8     7.425E-01  1.8
> > > > VecAXPY            1.166E+00  1.4     1.167E+00  1.4     1.221E+00  1.5     1.279E+00  1.6
> > > > VecAYPX            1.317E+00  1.6     1.290E+00  1.6     1.536E+00  1.9     1.499E+00  2.0
> > > > VecScatterBegin    6.142E+00  3.2     5.974E+00  2.8     6.448E+00  3.0     6.472E+00  2.9
> > > > VecScatterEnd      3.606E+01  4.2     3.551E+01  4.0     5.244E+01  2.7     4.995E+01  2.7
> > > > MatMult            3.561E+01  1.6     3.403E+01  1.5     3.435E+01  1.4     3.332E+01  1.4
> > > > MatMultAdd         1.124E+01  2.0     1.130E+01  2.1     2.093E+01  2.9     1.995E+01  2.7
> > > > MatMultTranspose   1.372E+01  2.5     1.388E+01  2.6     1.477E+01  2.2     1.381E+01  2.1
> > > > MatSolve           1.949E-02  0.0     1.653E-02  0.0     4.789E-02  0.0     4.466E-02  0.0
> > > > MatSOR             6.610E+01  1.3     6.673E+01  1.3     7.111E+01  1.3     7.105E+01  1.3
> > > > MatResidual        2.647E+01  1.7     2.667E+01  1.7     2.446E+01  1.4     2.467E+01  1.5
> > > > PCSetUpOnBlocks    5.266E-03  1.4     5.295E-03  1.4     5.427E-03  1.5     5.289E-03  1.4
> > > > PCApply            1.031E+02  1.0     1.035E+02  1.0     1.180E+02  1.0     1.164E+02  1.0
> > > >
> > > > I also slimmed down my code and basically wrote a simple weak scaling
> > > > test (source files attached) so you can profile it yourself. I
> > > > appreciate the offer, Junchao, thank you.
> > > > You can adjust the system size per processor at runtime via
> > > > "-nodes_per_proc 30" and the number of repeated calls to the function
> > > > containing KSPSolve() via "-iterations 1000". The physical problem is
> > > > simply calculating the electric potential from a homogeneous charge
> > > > distribution, done multiple times to accumulate time in KSPSolve().
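For reference, the continuous problem behind this test is presumably the
Poisson equation for the potential,

    -\nabla^2 \phi = \rho / \varepsilon_0 ,

and on a uniform grid with spacing h its 7-point finite-difference
discretization (a sketch, consistent with the stencil described earlier in
the thread) is

    \frac{1}{h^2} \left( 6\,\phi_{i,j,k} - \phi_{i-1,j,k} - \phi_{i+1,j,k}
        - \phi_{i,j-1,k} - \phi_{i,j+1,k} - \phi_{i,j,k-1} - \phi_{i,j,k+1} \right)
        = \frac{\rho_{i,j,k}}{\varepsilon_0} ,

which is where the 7 nonzeros per interior row come from.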
> > > > A job would be started using something like
> > > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 \
> > > > -iterations 1000 \
> > > > -ksp_rtol 1E-6 \
> > > > -log_view -log_sync \
> > > > -pc_type gamg -pc_gamg_type classical \
> > > > -ksp_type cg \
> > > > -ksp_norm_type unpreconditioned \
> > > > -mg_levels_ksp_type richardson \
> > > > -mg_levels_ksp_norm_type none \
> > > > -mg_levels_pc_type sor \
> > > > -mg_levels_ksp_max_it 1 \
> > > > -mg_levels_pc_sor_its 1 \
> > > > -mg_levels_esteig_ksp_type cg \
> > > > -mg_levels_esteig_ksp_max_it 10 \
> > > > -gamg_est_ksp_type cg
> > > > This would ideally be started on a cube number of processes, for a
> > > > cubical process grid.
> > > > Using 125 processes and 10,000 iterations I get the output in
> > > > "log_view_125_new.txt", which shows the same imbalance for me.
> > > > Michael
> > > >
> > > >
> > > > On 02.06.2018 at 13:40, Mark Adams wrote:
> > > >>
> > > >>
> > > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > > >> Hi,Michael,
> > > >> You can add -log_sync in addition to -log_view; it adds barriers to
> > > >> certain events but measures the barrier time separately from the
> > > >> events. I find this option makes it easier to interpret -log_view
> > > >> output.
> > > >>
> > > >> That is great (good to know).
> > > >>
> > > >> This should give us a better idea whether your large VecScatter costs
> > > >> are from slow communication or whether it is catching some sort of load
> > > >> imbalance.
> > > >>
> > > >>
> > > >> --Junchao Zhang
> > > >>
> > > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > >> Barry: On its way. Could take a couple days again.
> > > >>
> > > >> Junchao: I unfortunately don't have access to a cluster with a faster
> > > >> network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking
> > > >> fat-tree network, which I realize causes parallel slowdown if the nodes
> > > >> are not connected to the same switch. Each node has 24 cores (2 sockets
> > > >> x 12 cores) and four NUMA domains (two per socket).
> > > >> The ranks are usually not distributed perfectly evenly; e.g. for 125
> > > >> processes, of the six required nodes, five would use 21 cores and one
> > > >> would use 20.
> > > >> Would using another CPU type make a difference communication-wise?
> I could switch to faster ones (on the same network), but I always assumed
> this would only improve performance of the stuff that is unrelated to
> communication.
> > > >>
> > > >> Michael
> > > >>
> > > >>
> > > >>
> > > >>> The log files have something like "Average time for zero size
> > > >>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a
> > > >>> very slow network. A typical machine should give less than 1/10 of the
> > > >>> latency you see. An easy thing to try is just running the code on a
> > > >>> machine with a faster network and seeing what happens.
> > > >>>
> > > >>> Also, how many cores & numa domains does a compute node have? I
> could not figure out how you distributed the 125 MPI ranks evenly.
> > > >>>
> > > >>> --Junchao Zhang
> > > >>>
> > > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > >>> Hello again,
> > > >>>
> > > >>> here are the updated log_view files for 125 and 1000 processors. I
> ran both problems twice, the first time with all processors per node
> allocated ("-1.txt"), the second with only half on twice the number of
> nodes ("-2.txt").
> > > >>>
> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > > >>>>>
> > > >>>> Hmm, it is certainly not intended that vectors be created and
> > > >>>> destroyed within each KSPSolve(); could you please point us to the
> > > >>>> code that makes you think they are being created and destroyed? We
> > > >>>> create all the work vectors at KSPSetUp() and destroy them in
> > > >>>> KSPReset(), not during the solve. Not that this would be a measurable
> > > >>>> difference.
> > > >>>>
> > > >>>
> > > >>> I mean this, right in the log_view output:
> > > >>>
> > > >>>> Memory usage is given in bytes:
> > > >>>>
> > > >>>> Object Type Creations Destructions Memory Descendants' Mem.
> > > >>>> Reports information only for process 0.
> > > >>>>
> > > >>>> --- Event Stage 0: Main Stage
> > > >>>>
> > > >>>> ...
> > > >>>>
> > > >>>> --- Event Stage 1: First Solve
> > > >>>>
> > > >>>> ...
> > > >>>>
> > > >>>> --- Event Stage 2: Remaining Solves
> > > >>>>
> > > >>>> Vector 23904 23904 1295501184 0.
> > > >>> I logged the exact number of KSP iterations over the 999 timesteps,
> > > >>> and it's exactly 23904/6 = 3984.
> > > >>> Michael
> > > >>>
> > > >>>
> > > >>> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
> > > >>>>
> > > >>>> Please send the log file for 1000 with cg as the solver.
> > > >>>>
> > > >>>> You should make a bar chart of each event for the two cases to
> see which ones are taking more time and which are taking less (we cannot
> tell with the two logs you sent us since they are for different solvers.)
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > > >>>>>
> > > >>>> Hmm, it is certainly not intended that vectors be created and
> > > >>>> destroyed within each KSPSolve(); could you please point us to the
> > > >>>> code that makes you think they are being created and destroyed? We
> > > >>>> create all the work vectors at KSPSetUp() and destroy them in
> > > >>>> KSPReset(), not during the solve. Not that this would be a measurable
> > > >>>> difference.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> This seems kind of wasteful; is it supposed to be like this? Is this
> > > >>>>> even the reason for my problems? Apart from that, everything seems
> > > >>>>> quite normal to me (but I'm not the expert here).
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks in advance.
> > > >>>>>
> > > >>>>> Michael
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
> > > >>>>>
> > > >>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
> > >
> > >
> >
> >
> >
>
>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations
Time in solve(): 65.0281 s
Time in KSPSolve(): 64.8664 s (99.7513%)
Number of KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0491 with 216 processors, by jczhang Sat Jun 9 16:31:13 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.171e+02 1.00000 1.171e+02
Objects: 4.253e+04 1.00002 4.253e+04
Flop: 3.698e+10 1.15842 3.534e+10 7.632e+12
Flop/sec: 3.157e+08 1.15842 3.017e+08 6.516e+10
MPI Messages: 1.858e+06 3.50879 1.262e+06 2.725e+08
MPI Message Lengths: 2.275e+09 2.20338 1.459e+03 3.975e+11
MPI Reductions: 3.764e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.7386e-02 0.0% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 5.2061e+01 44.4% 9.8678e+09 0.1% 7.475e+05 0.3% 3.517e+03 0.7% 6.130e+02 1.6%
2: Remaining Solves: 6.5040e+01 55.5% 7.6225e+12 99.9% 2.718e+08 99.7% 1.453e+03 99.3% 3.700e+04 98.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 9.1314e-05 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 3.7413e-03 1.6 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 4.7839e+00 3.4 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
KSPSetUp 9 1.0 1.2829e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 5.2060e+01 1.0 4.82e+07 1.2 7.5e+05 3.5e+03 6.1e+02 44 0 0 1 2 100100100100100 190
VecTDot 14 1.0 1.8892e-03 1.6 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 2 86434
VecNorm 9 1.0 1.1718e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 1 89583
VecScale 42 1.0 2.2149e-04 2.6 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 74579
VecCopy 1 1.0 1.2589e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 187 1.0 1.3061e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 5.4979e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 297013
VecAYPX 49 1.0 8.3947e-04 1.3 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 164549
VecAssemblyBegin 3 1.0 5.1260e-0512.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 2.4080e-05 8.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 183 1.0 3.7851e-03 2.1 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 0 36 15 0 0
VecScatterEnd 183 1.0 1.2801e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 50 1.0 1.2769e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 7 0 170012
MatMultAdd 42 1.0 6.6104e-03 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 73068
MatMultTranspose 42 1.0 7.5936e-03 1.4 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 63607
MatSolve 7 0.0 4.7922e-05 0.0 1.62e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 34
MatSOR 84 1.0 2.6659e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 5 2 147225
MatLUFactorSym 1 1.0 1.1301e-04 8.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 8.5831e-0527.7 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 10
MatResidual 42 1.0 1.0384e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 5 0 156947
MatAssemblyBegin 102 1.0 4.7864e+00 3.4 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
MatAssemblyEnd 102 1.0 4.3458e-02 1.1 0.00e+00 0.0 1.1e+05 2.2e+02 2.5e+02 0 0 0 0 1 0 0 15 1 40 0
MatGetRow 3100265 1.2 2.4304e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 20 0 0 0 0 44 0 0 0 0 0
MatGetRowIJ 1 0.0 9.0599e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.8371e-01 2.4 0.00e+00 0.0 1.0e+05 1.8e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 6 1.0 7.5428e-03 1.2 0.00e+00 0.0 3.2e+03 4.3e+02 9.4e+01 0 0 0 0 0 0 0 0 0 15 0
MatGetOrdering 1 0.0 6.1035e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 3.0780e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 8.8811e-03 1.2 0.00e+00 0.0 9.2e+04 6.3e+02 3.2e+01 0 0 0 0 0 0 0 12 2 5 0
MatZeroEntries 6 1.0 9.8872e-04 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.2688e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 9.2e+01 0 0 0 0 0 0 23 15 12 15 17626
MatPtAPSymbolic 6 1.0 8.0804e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 4.2e+01 0 0 0 0 0 0 0 8 6 7 0
MatPtAPNumeric 6 1.0 4.6607e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.8e+01 0 0 0 0 0 0 23 7 5 8 47984
MatGetLocalMat 6 1.0 2.1553e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 4.4849e-03 1.7 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00 0 0 0 0 0 0 0 5 5 0 0
SFSetGraph 12 1.0 6.7472e-05 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 6.0279e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 44 1.0 1.1101e-03 2.0 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 44 1.0 1.5342e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 5.1834e+01 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.8e+02 44 0 0 0 1 100 0 48 73 46 0
GAMG: partLevel 6 1.0 1.3778e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 2.4e+02 0 0 0 0 1 0 23 16 12 40 16232
repartition 3 1.0 8.8692e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 8.7905e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 5.3742e-03 1.4 0.00e+00 0.0 1.5e+03 9.0e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 4.1795e-03 1.5 0.00e+00 0.0 1.7e+03 1.7e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 5.1975e+01 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 5.6e+02 44 0 0 1 1 100 23 63 85 91 43
PCSetUpOnBlocks 7 1.0 3.1543e-04 2.6 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3
PCApply 7 1.0 5.0538e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01 0 0 0 0 0 0 66 35 13 2 129023
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 6.4867e+01 1.0 3.69e+10 1.2 2.7e+08 1.5e+03 3.7e+04 55100100 99 98 100100100100100 117510
VecTDot 14000 1.0 4.1821e+00 1.2 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04 3 2 0 0 37 6 2 0 0 38 39046
VecNorm 9000 1.0 1.8765e+00 1.1 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03 2 1 0 0 24 3 1 0 0 24 55943
VecScale 42000 1.0 1.6005e-01 1.9 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 103207
VecCopy 1000 1.0 5.0528e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 147000 1.0 1.0271e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 14000 1.0 5.3448e-01 1.1 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 305521
VecAYPX 49000 1.0 7.6484e-01 1.4 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 180606
VecScatterBegin 176000 1.0 3.5663e+00 2.2 0.00e+00 0.0 2.7e+08 1.5e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 176000 1.0 1.6369e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 11 0 0 0 0 21 0 0 0 0 0
MatMult 50000 1.0 1.2707e+01 1.2 1.05e+10 1.1 9.2e+07 2.1e+03 0.0e+00 10 28 34 49 0 18 28 34 49 0 170833
MatMultAdd 42000 1.0 9.3107e+00 1.8 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00 7 6 18 9 0 12 6 18 9 0 51877
MatMultTranspose 42000 1.0 8.5215e+00 1.5 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00 6 6 18 9 0 10 6 18 9 0 56681
MatSolve 7000 0.0 5.1229e-02 0.0 1.62e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 32
MatSOR 84000 1.0 2.9626e+01 1.1 1.90e+10 1.2 8.3e+07 1.6e+03 1.4e+04 24 51 31 33 37 44 51 31 33 38 132181
MatResidual 42000 1.0 1.1003e+01 1.2 7.97e+09 1.2 8.3e+07 1.6e+03 0.0e+00 8 21 31 33 0 15 21 31 33 0 148118
PCSetUpOnBlocks 7000 1.0 5.6956e-02 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7000 1.0 5.5983e+01 1.0 3.18e+10 1.2 2.6e+08 1.3e+03 1.4e+04 48 85 97 84 37 86 85 97 84 38 116316
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2374752 0.
Matrix 0 65 14268388 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 142504 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 170 122 3205936 0.
Matrix 148 83 22245544 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 559424 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 42000 42000 2279704000 0.
========================================================================================================================
Average time to get PetscTime(): 8.10623e-07
Average time for MPI_Barrier(): 8.63075e-06
Average time for zero size MPI_Send(): 6.39315e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.png
Type: image/png
Size: 123397 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180610/2e23a692/attachment-0002.png>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
initsolve: 7 iterations
solve 1: 7 iterations
solve 2: 7 iterations
solve 3: 7 iterations
solve 4: 7 iterations
solve 5: 7 iterations
solve 6: 7 iterations
solve 7: 7 iterations
solve 8: 7 iterations
solve 9: 7 iterations
solve 10: 7 iterations
solve 20: 7 iterations
solve 30: 7 iterations
solve 40: 7 iterations
solve 50: 7 iterations
solve 60: 7 iterations
solve 70: 7 iterations
solve 80: 7 iterations
solve 90: 7 iterations
solve 100: 7 iterations
solve 200: 7 iterations
solve 300: 7 iterations
solve 400: 7 iterations
solve 500: 7 iterations
solve 600: 7 iterations
solve 700: 7 iterations
solve 800: 7 iterations
solve 900: 7 iterations
solve 1000: 7 iterations
Time in solve(): 107.655 s
Time in KSPSolve(): 107.398 s (99.7609%)
Number of KSP iterations (total): 7000
Number of solve iterations (total): 1000 (ratio: 7.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdwd-0001 with 216 processors, by jczhang Sat Jun 9 16:36:21 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.108e+02 1.00001 2.108e+02
Objects: 4.253e+04 1.00002 4.253e+04
Flop: 3.698e+10 1.15842 3.534e+10 7.632e+12
Flop/sec: 1.754e+08 1.15842 1.676e+08 3.621e+10
MPI Messages: 1.858e+06 3.50879 1.262e+06 2.725e+08
MPI Message Lengths: 2.275e+09 2.20338 1.459e+03 3.975e+11
MPI Reductions: 3.764e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 9.0703e-02 0.0% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0304e+02 48.9% 9.8678e+09 0.1% 7.475e+05 0.3% 3.517e+03 0.7% 6.130e+02 1.6%
2: Remaining Solves: 1.0767e+02 51.1% 7.6225e+12 99.9% 2.718e+08 99.7% 1.453e+03 99.3% 3.700e+04 98.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.4281e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 3.5601e-03 1.6 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 9.1126e+00 3.7 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
KSPSetUp 9 1.0 1.7250e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0304e+02 1.0 4.82e+07 1.2 7.5e+05 3.5e+03 6.1e+02 49 0 0 1 2 100100100100100 96
VecTDot 14 1.0 3.1445e-03 2.3 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 2 51930
VecNorm 9 1.0 1.2560e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 1 83580
VecScale 42 1.0 4.3011e-04 3.2 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 38406
VecCopy 1 1.0 1.2112e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 187 1.0 2.3854e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 9.7251e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 167912
VecAYPX 49 1.0 1.6963e-03 1.5 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 81430
VecAssemblyBegin 3 1.0 8.1062e-0511.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 6.2227e-0516.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 183 1.0 6.0692e-03 1.9 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 0 36 15 0 0
VecScatterEnd 183 1.0 2.1380e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 50 1.0 2.2952e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 7 0 94582
MatMultAdd 42 1.0 1.1678e-02 1.7 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 41362
MatMultTranspose 42 1.0 1.1627e-02 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 41542
MatSolve 7 0.0 1.3185e-04 0.0 1.62e+03 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 12
MatSOR 84 1.0 5.1574e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 5 2 76102
MatLUFactorSym 1 1.0 7.2956e-05 4.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.1492e-0424.1 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 7
MatResidual 42 1.0 1.9120e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 5 0 85240
MatAssemblyBegin 102 1.0 9.1159e+00 3.7 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 5 0 2 5 0 0
MatAssemblyEnd 102 1.0 5.0428e-02 1.1 0.00e+00 0.0 1.1e+05 2.2e+02 2.5e+02 0 0 0 0 1 0 0 15 1 40 0
MatGetRow 3100265 1.2 4.8632e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 22 0 0 0 0 45 0 0 0 0 0
MatGetRowIJ 1 0.0 1.0014e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.8569e-01 2.3 0.00e+00 0.0 1.0e+05 1.8e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 6 1.0 1.2963e-02 1.7 0.00e+00 0.0 3.2e+03 4.3e+02 9.4e+01 0 0 0 0 0 0 0 0 0 15 0
MatGetOrdering 1 0.0 1.0300e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 3.1326e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 1.5702e-02 1.6 0.00e+00 0.0 9.2e+04 6.3e+02 3.2e+01 0 0 0 0 0 0 0 12 2 5 0
MatZeroEntries 6 1.0 1.6329e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.3658e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 9.2e+01 0 0 0 0 0 0 23 15 12 15 16374
MatPtAPSymbolic 6 1.0 8.5818e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 4.2e+01 0 0 0 0 0 0 0 8 6 7 0
MatPtAPNumeric 6 1.0 5.0997e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.8e+01 0 0 0 0 0 0 23 7 5 8 43853
MatGetLocalMat 6 1.0 2.9812e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 5.5532e-03 1.8 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00 0 0 0 0 0 0 0 5 5 0 0
SFSetGraph 12 1.0 1.1659e-04 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 6.0501e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 44 1.0 1.5197e-03 1.8 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 44 1.0 2.7242e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0273e+02 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.8e+02 49 0 0 0 1 100 0 48 73 46 0
GAMG: partLevel 6 1.0 1.5403e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 2.4e+02 0 0 0 0 1 0 23 16 12 40 14519
repartition 3 1.0 1.0488e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 1.2918e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 1.0398e-02 2.1 0.00e+00 0.0 1.5e+03 9.0e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 8.6331e-03 2.7 0.00e+00 0.0 1.7e+03 1.7e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 1.0290e+02 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 5.6e+02 49 0 0 1 1 100 23 63 85 91 22
PCSetUpOnBlocks 7 1.0 5.5504e-04 2.8 8.32e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1
PCApply 7 1.0 9.0100e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01 0 0 0 0 0 0 66 35 13 2 72371
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.0742e+02 1.0 3.69e+10 1.2 2.7e+08 1.5e+03 3.7e+04 51100100 99 98 100100100100100 70961
VecTDot 14000 1.0 5.2095e+00 1.4 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04 2 2 0 0 37 4 2 0 0 38 31345
VecNorm 9000 1.0 1.9601e+00 1.1 4.86e+08 1.0 0.0e+00 0.0e+00 9.0e+03 1 1 0 0 24 2 1 0 0 24 53556
VecScale 42000 1.0 3.1761e-01 2.3 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 52009
VecCopy 1000 1.0 8.1215e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 147000 1.0 1.9388e+00 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecAXPY 14000 1.0 9.6349e-01 1.2 7.56e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 169484
VecAYPX 49000 1.0 1.4718e+00 1.3 6.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 93851
VecScatterBegin 176000 1.0 5.5100e+00 2.0 0.00e+00 0.0 2.7e+08 1.5e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 176000 1.0 2.6342e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 18 0 0 0 0 0
MatMult 50000 1.0 2.2700e+01 1.2 1.05e+10 1.1 9.2e+07 2.1e+03 0.0e+00 10 28 34 49 0 19 28 34 49 0 95632
MatMultAdd 42000 1.0 1.4646e+01 1.7 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00 5 6 18 9 0 10 6 18 9 0 32979
MatMultTranspose 42000 1.0 1.3086e+01 1.6 2.40e+09 1.3 4.8e+07 7.1e+02 0.0e+00 5 6 18 9 0 9 6 18 9 0 36911
MatSolve 7000 0.0 8.7209e-02 0.0 1.62e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 19
MatSOR 84000 1.0 5.3951e+01 1.1 1.90e+10 1.2 8.3e+07 1.6e+03 1.4e+04 24 51 31 33 37 47 51 31 33 38 72584
MatResidual 42000 1.0 1.9184e+01 1.2 7.97e+09 1.2 8.3e+07 1.6e+03 0.0e+00 8 21 31 33 0 16 21 31 33 0 84954
PCSetUpOnBlocks 7000 1.0 1.1972e-01 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7000 1.0 9.4526e+01 1.0 3.18e+10 1.2 2.6e+08 1.3e+03 1.4e+04 45 85 97 84 37 87 85 97 84 38 68888
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2374752 0.
Matrix 0 65 14268388 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 142504 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 170 122 3205936 0.
Matrix 148 83 22245544 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 559424 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 42000 42000 2279704000 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.1158e-05
Average time for zero size MPI_Send(): 6.42626e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 18 which doesn't match the requested tasks 96 with the number of requested nodes 96. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations
Time in solve(): 114.7 s
Time in KSPSolve(): 114.54 s (99.8607%)
Number of KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0129 with 1728 processors, by jczhang Sat Jun 9 16:39:40 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.698e+02 1.00003 1.698e+02
Objects: 4.854e+04 1.00002 4.854e+04
Flop: 4.220e+10 1.15865 4.125e+10 7.129e+13
Flop/sec: 2.485e+08 1.15864 2.429e+08 4.197e+11
MPI Messages: 2.548e+06 4.16034 1.730e+06 2.989e+09
MPI Message Lengths: 2.592e+09 2.20360 1.356e+03 4.053e+12
MPI Reductions: 4.266e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 5.2859e-02 0.0% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 5.5079e+01 32.4% 8.9686e+10 0.1% 7.756e+06 0.3% 3.230e+03 0.6% 6.410e+02 1.5%
2: Remaining Solves: 1.1471e+02 67.5% 7.1197e+13 99.9% 2.981e+09 99.7% 1.351e+03 99.4% 4.200e+04 98.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 7.0095e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 1.4518e-02 4.5 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 6.1632e+00 3.4 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 5 0 1 5 0 0
KSPSetUp 9 1.0 8.9891e-03 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 5.5078e+01 1.0 5.33e+07 1.2 7.8e+06 3.2e+03 6.4e+02 32 0 0 1 2 100100100100100 1628
VecTDot 16 1.0 4.4720e-03 2.3 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 2 0 0 2 333846
VecNorm 10 1.0 9.3472e-0310.8 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 1 0 0 2 99829
VecScale 48 1.0 1.4431e-0316.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 117314
VecCopy 1 1.0 7.4291e-0419.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 208 1.0 5.8553e-03 7.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 6.4611e-04 1.3 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 2310724
VecAYPX 56 1.0 1.0061e-03 1.5 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 1268367
VecAssemblyBegin 3 1.0 4.9114e-0517.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 3.6001e-0518.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 208 1.0 1.1714e-02 5.9 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 0 38 16 0 0
VecScatterEnd 208 1.0 5.5308e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 57 1.0 3.4521e-02 2.8 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00 0 0 0 0 0 0 23 13 8 0 584742
MatMultAdd 48 1.0 4.0031e-02 6.8 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 114344
MatMultTranspose 48 1.0 3.1601e-02 4.8 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 144847
MatSolve 8 0.0 5.8174e-05 0.0 2.90e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 498
MatSOR 96 1.0 6.4496e-02 1.5 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01 0 0 0 0 0 0 41 12 5 2 569549
MatLUFactorSym 1 1.0 3.5360e-03296.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 3.3751e-031179.7 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 15
MatResidual 48 1.0 3.2003e-02 3.4 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 17 12 5 0 478619
MatAssemblyBegin 102 1.0 6.1656e+00 3.4 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 5 0 1 5 0 0
MatAssemblyEnd 102 1.0 8.5799e-02 1.1 0.00e+00 0.0 1.1e+06 2.0e+02 2.5e+02 0 0 0 0 1 0 0 14 1 39 0
MatGetRow 3100266 1.2 2.5176e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 44 0 0 0 0 0
MatGetRowIJ 1 0.0 1.2875e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 2.0034e-01 2.2 0.00e+00 0.0 1.0e+06 1.6e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 6 1.0 4.2367e-02 1.0 0.00e+00 0.0 3.7e+04 3.4e+02 9.4e+01 0 0 0 0 0 0 0 0 0 15 0
MatGetOrdering 1 0.0 6.2943e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 6.1879e-02 1.2 0.00e+00 0.0 4.6e+05 9.9e+02 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 2.6180e-02 1.1 0.00e+00 0.0 9.7e+05 5.5e+02 5.4e+01 0 0 0 0 0 0 0 12 2 8 0
MatZeroEntries 6 1.0 1.7581e-03 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 2.1420e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 9.3e+01 0 0 0 0 0 0 21 15 11 15 85980
MatPtAPSymbolic 6 1.0 1.2393e-01 1.1 0.00e+00 0.0 5.8e+05 2.7e+03 4.2e+01 0 0 0 0 0 0 0 7 6 7 0
MatPtAPNumeric 6 1.0 9.3141e-02 1.1 1.11e+07 1.3 5.5e+05 2.3e+03 4.8e+01 0 0 0 0 0 0 21 7 5 7 197734
MatGetLocalMat 6 1.0 2.1052e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 6.8512e-03 3.0 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00 0 0 0 0 0 0 0 4 5 0 0
SFSetGraph 12 1.0 7.7248e-05 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 1.7301e-02 2.7 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 66 1.0 3.3102e-03 4.7 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 66 1.0 1.0333e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 5.4556e+01 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 3.1e+02 32 0 0 0 1 99 0 46 73 48 0
GAMG: partLevel 6 1.0 3.1220e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 2.4e+02 0 0 0 0 1 1 21 15 11 38 58992
repartition 3 1.0 9.0489e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 3.5654e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 3.5768e-02 1.0 0.00e+00 0.0 1.6e+04 7.9e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 1.2372e-02 1.2 0.00e+00 0.0 2.2e+04 1.3e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 5.4924e+01 1.0 1.11e+07 1.3 4.8e+06 4.4e+03 5.8e+02 32 0 0 1 1 100 21 61 84 91 335
PCSetUpOnBlocks 8 1.0 3.7484e-0337.5 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 14
PCApply 8 1.0 1.1110e-01 1.0 3.64e+07 1.2 2.9e+06 1.2e+03 1.6e+01 0 0 0 0 0 0 68 37 14 2 550891
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.1454e+02 1.0 4.21e+10 1.2 3.0e+09 1.4e+03 4.2e+04 67100100 99 98 100100100100100 621610
VecTDot 16000 1.0 1.2048e+01 1.1 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04 7 2 0 0 38 10 2 0 0 38 123923
VecNorm 10000 1.0 4.1293e+00 1.0 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04 2 1 0 0 23 4 1 0 0 24 225976
VecScale 48000 1.0 7.3612e-01 9.5 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 229992
VecCopy 1000 1.0 6.3032e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 168000 1.0 1.5064e+00 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 16000 1.0 6.2166e-01 1.2 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2401639
VecAYPX 56000 1.0 1.6279e+00 2.6 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 783928
VecScatterBegin 201000 1.0 4.6709e+00 2.6 0.00e+00 0.0 3.0e+09 1.4e+03 0.0e+00 2 0100 99 0 3 0100100 0 0
VecScatterEnd 201000 1.0 3.9701e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 31 0 0 0 0 0
MatMult 57000 1.0 1.9548e+01 1.5 1.19e+10 1.1 1.0e+09 2.0e+03 0.0e+00 9 28 34 49 0 13 28 34 49 0 1032633
MatMultAdd 48000 1.0 2.8641e+01 3.6 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 14 6 18 9 0 20 6 18 9 0 159815
MatMultTranspose 48000 1.0 1.9239e+01 2.9 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 5 6 18 9 0 8 6 18 9 0 237923
MatSolve 8000 0.0 6.0686e-02 0.0 2.90e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 477
MatSOR 96000 1.0 5.0867e+01 1.2 2.17e+10 1.2 9.2e+08 1.5e+03 1.6e+04 28 51 31 33 38 41 51 31 34 38 720735
MatResidual 48000 1.0 1.7601e+01 1.7 9.11e+09 1.2 9.2e+08 1.5e+03 0.0e+00 8 21 31 33 0 11 22 31 34 0 870258
PCSetUpOnBlocks 8000 1.0 6.8316e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 8000 1.0 9.5367e+01 1.0 3.63e+10 1.2 2.9e+09 1.2e+03 1.6e+04 56 86 97 84 38 83 86 97 85 38 641033
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2388736 0.
Matrix 0 65 15239012 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 197872 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 176 128 3559744 0.
Matrix 148 83 23608464 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 619344 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 48000 48000 2625920000 0.
========================================================================================================================
Average time to get PetscTime(): 8.82149e-07
Average time for MPI_Barrier(): 1.65939e-05
Average time for zero size MPI_Send(): 6.39467e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
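For reference, these options correspond to a launch along the following lines. This is a reconstruction from the log (the executable name and process count come from the header above); the exact srun flags and node layout are not recorded here:

  srun -n 1728 ./wstest \
    -nodes_per_proc 30 -mesh_size 1E-4 -iterations 1000 \
    -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1E-6 \
    -pc_type gamg -pc_gamg_type classical -gamg_est_ksp_type cg \
    -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 1 \
    -mg_levels_ksp_norm_type none -mg_levels_pc_type sor -mg_levels_pc_sor_its 1 \
    -mg_levels_esteig_ksp_type cg -mg_levels_esteig_ksp_max_it 10 \
    -log_view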
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part (second attached -log_view output) --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
initsolve: 8 iterations
solve 1: 8 iterations
solve 2: 8 iterations
solve 3: 8 iterations
solve 4: 8 iterations
solve 5: 8 iterations
solve 6: 8 iterations
solve 7: 8 iterations
solve 8: 8 iterations
solve 9: 8 iterations
solve 10: 8 iterations
solve 20: 8 iterations
solve 30: 8 iterations
solve 40: 8 iterations
solve 50: 8 iterations
solve 60: 8 iterations
solve 70: 8 iterations
solve 80: 8 iterations
solve 90: 8 iterations
solve 100: 8 iterations
solve 200: 8 iterations
solve 300: 8 iterations
solve 400: 8 iterations
solve 500: 8 iterations
solve 600: 8 iterations
solve 700: 8 iterations
solve 800: 8 iterations
solve 900: 8 iterations
solve 1000: 8 iterations
Time in solve(): 161.945 s
Time in KSPSolve(): 161.693 s (99.8445%)
Number of KSP iterations (total): 8000
Number of solve iterations (total): 1000 (ratio: 8.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0065 with 1728 processors, by jczhang Sat Jun 9 16:42:23 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.700e+02 1.00002 2.699e+02
Objects: 4.854e+04 1.00002 4.854e+04
Flop: 4.220e+10 1.15865 4.125e+10 7.129e+13
Flop/sec: 1.563e+08 1.15865 1.528e+08 2.641e+11
MPI Messages: 2.548e+06 4.16034 1.730e+06 2.989e+09
MPI Message Lengths: 2.592e+09 2.20360 1.356e+03 4.053e+12
MPI Reductions: 4.266e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.2022e-01 0.0% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.0787e+02 40.0% 8.9686e+10 0.1% 7.756e+06 0.3% 3.230e+03 0.6% 6.410e+02 1.5%
2: Remaining Solves: 1.6196e+02 60.0% 7.1197e+13 99.9% 2.981e+09 99.7% 1.351e+03 99.4% 4.200e+04 98.4%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.8120e-04 6.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 12 1.0 5.7149e-03 1.6 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 30 1.0 1.1775e+01 4.0 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 5 0 1 5 0 0
KSPSetUp 9 1.0 2.5797e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0786e+02 1.0 5.33e+07 1.2 7.8e+06 3.2e+03 6.4e+02 40 0 0 1 2 100100100100100 831
VecTDot 16 1.0 6.9978e-03 2.0 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 2 0 0 2 213347
VecNorm 10 1.0 3.2461e-03 2.9 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 1 0 0 2 287462
VecScale 48 1.0 5.1904e-04 4.7 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 326185
VecCopy 1 1.0 1.3995e-04 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 208 1.0 3.8517e-03 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 1.2281e-03 1.4 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 1215698
VecAYPX 56 1.0 1.9097e-03 1.7 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 668228
VecAssemblyBegin 3 1.0 4.3488e-04140.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 6.6996e-0535.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 208 1.0 7.2007e-03 2.1 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 0 38 16 0 0
VecScatterEnd 208 1.0 4.7876e-02 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 57 1.0 3.8546e-02 1.8 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00 0 0 0 0 0 0 23 13 8 0 523683
MatMultAdd 48 1.0 2.8487e-02 2.9 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 160680
MatMultTranspose 48 1.0 2.5251e-02 2.7 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 181274
MatSolve 8 0.0 5.1498e-05 0.0 2.90e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 562
MatSOR 96 1.0 6.6756e-02 1.2 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01 0 0 0 0 0 0 41 12 5 2 550259
MatLUFactorSym 1 1.0 1.5211e-0412.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.2302e-0443.0 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 412
MatResidual 48 1.0 3.3680e-02 2.0 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 17 12 5 0 454784
MatAssemblyBegin 102 1.0 1.1779e+01 4.0 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 5 0 1 5 0 0
MatAssemblyEnd 102 1.0 6.4854e-02 1.1 0.00e+00 0.0 1.1e+06 2.0e+02 2.5e+02 0 0 0 0 1 0 0 14 1 39 0
MatGetRow 3100266 1.2 5.0476e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 18 0 0 0 0 45 0 0 0 0 0
MatGetRowIJ 1 0.0 1.2159e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 6 1.0 1.8868e-01 2.4 0.00e+00 0.0 1.0e+06 1.6e+04 1.2e+01 0 0 0 0 0 0 0 13 67 2 0
MatCreateSubMat 6 1.0 3.1791e-02 1.2 0.00e+00 0.0 3.7e+04 3.4e+02 9.4e+01 0 0 0 0 0 0 0 0 0 15 0
MatGetOrdering 1 0.0 5.8889e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 6 1.0 7.8569e-02 1.2 0.00e+00 0.0 4.6e+05 9.9e+02 1.2e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 6 1.0 1.7872e-02 1.5 0.00e+00 0.0 9.7e+05 5.5e+02 5.4e+01 0 0 0 0 0 0 0 12 2 8 0
MatZeroEntries 6 1.0 1.6198e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatPtAP 6 1.0 1.8265e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 9.3e+01 0 0 0 0 0 0 21 15 11 15 100831
MatPtAPSymbolic 6 1.0 1.1047e-01 1.0 0.00e+00 0.0 5.8e+05 2.7e+03 4.2e+01 0 0 0 0 0 0 0 7 6 7 0
MatPtAPNumeric 6 1.0 7.1542e-02 1.0 1.11e+07 1.3 5.5e+05 2.3e+03 4.8e+01 0 0 0 0 0 0 21 7 5 7 257431
MatGetLocalMat 6 1.0 2.9891e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 6 1.0 4.4601e-03 1.6 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00 0 0 0 0 0 0 0 4 5 0 0
SFSetGraph 12 1.0 1.3447e-04 7.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 12 1.0 8.3802e-03 1.2 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 66 1.0 2.1842e-03 2.1 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00 0 0 0 0 0 0 0 13 3 0 0
SFBcastEnd 66 1.0 3.4332e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 6 1.0 1.0743e+02 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 3.1e+02 40 0 0 0 1 100 0 46 73 48 0
GAMG: partLevel 6 1.0 2.3124e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 2.4e+02 0 0 0 0 1 0 21 15 11 38 79645
repartition 3 1.0 2.2879e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 3 0
Invert-Sort 3 1.0 1.0046e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Move A 3 1.0 2.2692e-02 1.4 0.00e+00 0.0 1.6e+04 7.9e+02 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
Move P 3 1.0 1.7002e-02 1.5 0.00e+00 0.0 2.2e+04 1.3e+01 5.0e+01 0 0 0 0 0 0 0 0 0 8 0
PCSetUp 2 1.0 1.0768e+02 1.0 1.11e+07 1.3 4.8e+06 4.4e+03 5.8e+02 40 0 0 1 1 100 21 61 84 91 171
PCSetUpOnBlocks 8 1.0 5.9175e-04 6.5 5.07e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 86
PCApply 8 1.0 1.2410e-01 1.0 3.64e+07 1.2 2.9e+06 1.2e+03 1.6e+01 0 0 0 0 0 0 68 37 14 2 493181
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.6171e+02 1.0 4.21e+10 1.2 3.0e+09 1.4e+03 4.2e+04 60100100 99 98 100100100100100 440278
VecTDot 16000 1.0 1.4152e+01 1.3 8.64e+08 1.0 0.0e+00 0.0e+00 1.6e+04 5 2 0 0 38 8 2 0 0 38 105498
VecNorm 10000 1.0 5.2611e+00 1.2 5.40e+08 1.0 0.0e+00 0.0e+00 1.0e+04 2 1 0 0 23 3 1 0 0 24 177362
VecScale 48000 1.0 3.4804e-01 3.2 1.08e+08 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 486441
VecCopy 1000 1.0 8.5613e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 168000 1.0 2.5884e+00 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 16000 1.0 1.0942e+00 1.2 8.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 1364458
VecAYPX 56000 1.0 2.1396e+00 1.8 7.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 596431
VecScatterBegin 201000 1.0 7.2699e+00 2.3 0.00e+00 0.0 3.0e+09 1.4e+03 0.0e+00 2 0100 99 0 3 0100100 0 0
VecScatterEnd 201000 1.0 5.0307e+01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 24 0 0 0 0 0
MatMult 57000 1.0 3.0696e+01 1.4 1.19e+10 1.1 1.0e+09 2.0e+03 0.0e+00 9 28 34 49 0 15 28 34 49 0 657590
MatMultAdd 48000 1.0 3.1671e+01 2.8 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 10 6 18 9 0 16 6 18 9 0 144528
MatMultTranspose 48000 1.0 2.4791e+01 2.5 2.75e+09 1.3 5.3e+08 6.6e+02 0.0e+00 5 6 18 9 0 8 6 18 9 0 184633
MatSolve 8000 0.0 5.5241e-02 0.0 2.90e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 524
MatSOR 96000 1.0 7.5631e+01 1.1 2.17e+10 1.2 9.2e+08 1.5e+03 1.6e+04 26 51 31 33 38 44 51 31 34 38 484742
MatResidual 48000 1.0 2.6641e+01 1.5 9.11e+09 1.2 9.2e+08 1.5e+03 0.0e+00 8 21 31 33 0 13 22 31 34 0 574955
PCSetUpOnBlocks 8000 1.0 1.2669e-0116.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 8000 1.0 1.3677e+02 1.0 3.63e+10 1.2 2.9e+09 1.2e+03 1.6e+04 50 86 97 84 38 84 86 97 85 38 446993
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 9 11424 0.
DMKSP interface 1 1 656 0.
Vector 4 52 2388736 0.
Matrix 0 65 15239012 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 18 197872 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 14 233696 0.
Preconditioner 1 9 9676 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 8 0 0 0.
Vector 176 128 3559744 0.
Matrix 148 83 23608464 0.
Matrix Coarsen 6 6 3816 0.
Index Set 128 112 619344 0.
Star Forest Graph 12 12 10368 0.
Vec Scatter 34 21 26544 0.
Preconditioner 8 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 48000 48000 2625920000 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 0.000298405
Average time for zero size MPI_Send(): 7.23203e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
[Attached image: trace.png, image/png, 317483 bytes --
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20180610/2e23a692/attachment-0003.png]