[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems
Junchao Zhang
jczhang at mcs.anl.gov
Tue Jun 12 11:32:28 CDT 2018
Mark,
I tried "-pc_gamg_type agg ..." options you mentioned, and also -ksp_type
cg + PETSc's default PC bjacobi. In the latter case, to reduce execution
time I called KSPSolve 100 times instead of 1000, and also used -ksp_max_it
100. In the 36x48=1728 ranks case, I also did a test with -log_sync. From
there you can see a lot of time is spent in VecNormBarrier, which implies
load imbalance. Note the VecScatterBarrier time is misleading, since it barriers
ALL ranks, while in reality VecScatter only syncs within a small neighborhood.
Barry suggested trying periodic boundary conditions so that the nonzeros
are perfectly balanced across processes. I will try that and see what
happens.
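
For reference, the block Jacobi baseline mentioned above was run with options
along these lines (just a sketch; the executable name and the -nodes_per_proc
value are taken from Michael's test driver attached later in this thread):

  mpirun -n 1728 ./wstest -nodes_per_proc 30 -iterations 100 \
      -ksp_type cg -pc_type bjacobi -ksp_max_it 100 \
      -log_view -log_sync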
--Junchao Zhang
On Mon, Jun 11, 2018 at 8:09 AM, Mark Adams <mfadams at lbl.gov> wrote:
>
>
> On Mon, Jun 11, 2018 at 12:46 AM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
>
>> I used an LCRC machine named Bebop. I tested on its Intel Broadwell
>> nodes. Each node has 2 CPUs and 36 cores in total. I collected data using
>> 36 cores per node or 18 cores per node. As you can see, 18 cores/node
>> gave much better performance, which is reasonable since routines like MatSOR,
>> MatMult, and MatMultAdd are all memory-bandwidth bound.
>>
>> The code uses a DMDA 3D grid with a 7-point stencil, and defines
>> nodes (vertices) on the surface, or one layer inside it, as boundary nodes.
>> Boundary nodes only have a diagonal entry in their matrix row, while
>> interior nodes have 7 nonzeros in their row, so processes on the boundary
>> of the processor grid have fewer nonzeros. This is one source of load
>> imbalance. Will the load imbalance get worse on the coarser grids of the
>> MG hierarchy? (A rough sketch of this setup is appended at the end of this
>> message.)
>>
>
> Yes.
>
> You can use a simple Jacobi solver to see the basic performance of your
> operator and machine. Do you see as much time spent in Vec Scatters?
> VecAXPY? etc.
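> For example (just a sketch of what I mean, not something you must use
> verbatim), swapping the multigrid preconditioner for plain point Jacobi
> isolates the operator and the machine:
>
>   -ksp_type cg -pc_type jacobi -ksp_max_it 100 -log_view -log_sync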
>
>
>>
>> I attach a trace-view figure that shows the activity of each rank along the
>> time axis in one KSPSolve. White means MPI wait. You can see that white
>> takes up a large share of the time.
>>
>> I don't have a good explanation for why processes wait longer at large
>> scale (1728 cores), since the communication pattern is still a 7-point
>> stencil on a cubic processor grid.
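>>
>> As a rough sketch of the matrix setup described above (not the actual code;
>> the stencil values and the exact two-layer boundary test are my assumptions,
>> and da/A are assumed to be the DMDA and the matrix from DMCreateMatrix):
>>
>>   PetscInt    i,j,k,xs,ys,zs,xm,ym,zm,M,N,P;
>>   MatStencil  row,col[7];
>>   PetscScalar v[7];
>>
>>   DMDAGetInfo(da,NULL,&M,&N,&P,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
>>   DMDAGetCorners(da,&xs,&ys,&zs,&xm,&ym,&zm);
>>   for (k=zs; k<zs+zm; k++) {
>>     for (j=ys; j<ys+ym; j++) {
>>       for (i=xs; i<xs+xm; i++) {
>>         row.i = i; row.j = j; row.k = k;
>>         if (i<2 || j<2 || k<2 || i>M-3 || j>N-3 || k>P-3) {
>>           /* boundary node (surface or one layer inside): diagonal only */
>>           v[0] = 1.0;
>>           MatSetValuesStencil(A,1,&row,1,&row,v,INSERT_VALUES);
>>         } else {
>>           /* interior node: 7-point stencil, 7 nonzeros in the row */
>>           col[0] = row;                 v[0] =  6.0;
>>           col[1] = row; col[1].i = i-1; v[1] = -1.0;
>>           col[2] = row; col[2].i = i+1; v[2] = -1.0;
>>           col[3] = row; col[3].j = j-1; v[3] = -1.0;
>>           col[4] = row; col[4].j = j+1; v[4] = -1.0;
>>           col[5] = row; col[5].k = k-1; v[5] = -1.0;
>>           col[6] = row; col[6].k = k+1; v[6] = -1.0;
>>           MatSetValuesStencil(A,1,&row,7,col,v,INSERT_VALUES);
>>         }
>>       }
>>     }
>>   }
>>   MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>>   MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);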
>>
>> --Junchao Zhang
>>
>> On Sat, Jun 9, 2018 at 11:32 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
>> wrote:
>>
>>>
>>> Junchao,
>>>
>>>     Thanks, the load balance of matrix entries is remarkably similar
>>> for the two runs, so worse work-load imbalance in SOR for the larger case
>>> can't be what explains why the SOR takes more time.
>>>
>>>     Here is my guess (and I know of no way to confirm it). In the smaller
>>> case the overlap of different processes on the same node running SOR at the
>>> same time is lower than in the larger case, hence the larger case is slower
>>> because more SOR processes are fighting over the same memory
>>> bandwidth at the same time than in the smaller case. Ahh, here is
>>> something you can try: let's undersubscribe the memory bandwidth needs; run
>>> on, say, 16 processes per node with 8 nodes and 16 processes per node with 64
>>> nodes and send the two -log_view output files. I assume this is an LCRC
>>> machine and NOT a KNL system?
>>>
>>> Thanks
>>>
>>>
>>> Barry
>>>
>>>
>>> > On Jun 9, 2018, at 8:29 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>> >
>>> > -pc_gamg_type classical
>>> >
>>> > FYI, we only support smoothed aggregation "agg" (the default). (This
>>> thread started by saying you were using GAMG.)
>>> >
>>> > It is not clear how much this will make a difference for you, but you
>>> don't want to use classical because we do not support it. It is meant as a
>>> reference implementation for developers.
>>> >
>>> > First, how did you get the idea to use classical? If the documentation
>>> led you to believe this was a good thing to do then we need to fix that!
>>> >
>>> > Anyway, here is a generic input for GAMG:
>>> >
>>> > -pc_type gamg
>>> > -pc_gamg_type agg
>>> > -pc_gamg_agg_nsmooths 1
>>> > -pc_gamg_coarse_eq_limit 1000
>>> > -pc_gamg_reuse_interpolation true
>>> > -pc_gamg_square_graph 1
>>> > -pc_gamg_threshold 0.05
>>> > -pc_gamg_threshold_scale .0
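>>> >
>>> > These can be given on the command line as-is, or collected one per line in
>>> > a plain text file and passed with -options_file, e.g. (the application and
>>> > file names here are only placeholders):
>>> >
>>> >   mpirun -n 216 ./your_app -options_file gamg.opts ...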
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Jun 7, 2018 at 6:52 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>> wrote:
>>> > OK, I had thought that space was a typo. BTW, this option does not
>>> show up in -h.
>>> > I changed the number of ranks to use all cores on each node to avoid
>>> misleading ratios in -log_view. Since one node has 36 cores, I ran with
>>> 6^3=216 ranks and 12^3=1728 ranks. I also found the call counts of MatSOR etc.
>>> in the two tests were different, so they are not strictly weak-scaling tests.
>>> I tried to add -ksp_max_it 6 -pc_mg_levels 6, but still could not make the
>>> two have the same MatSOR count. Anyway, I attached the load-balance output.
>>> >
>>> > I find PCApply_MG calls PCMGMCycle_Private, which is recursive and
>>> indirectly calls MatSOR_MPIAIJ. I believe the following code in
>>> MatSOR_MPIAIJ effectively syncs {MatSOR, MatMultAdd}_SeqAIJ between
>>> processes through VecScatter at each MG level. If SOR and MatMultAdd are
>>> imbalanced, the cost accumulates across MG levels and shows up as a large
>>> VecScatter cost.
>>> > 1460:   while (its--) {
>>> > 1461:     VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
>>> > 1462:     VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
>>> >
>>> > 1464:     /* update rhs: bb1 = bb - B*x */
>>> > 1465:     VecScale(mat->lvec,-1.0);
>>> > 1466:     (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);
>>> >
>>> > 1468:     /* local sweep */
>>> > 1469:     (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
>>> > 1470:   }
>>> >
>>> >
>>> >
>>> > --Junchao Zhang
>>> >
>>> > On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> >
>>> > > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczhang at mcs.anl.gov>
>>> wrote:
>>> > >
>>> > > Searched but could not find this option, -mat_view::load_balance
>>> >
>>> > There is a space between the view and the ':'. load_balance is a
>>> particular viewer format that causes the printing of load-balance
>>> information about the number of nonzeros in the matrix.
>>> >
>>> > Barry
>>> >
>>> > >
>>> > > --Junchao Zhang
>>> > >
>>> > > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
>>> wrote:
>>> > > So the only surprise in the results is the SOR. It is
>>> embarrassingly parallel and normally one would not see a jump.
>>> > >
>>> > > The load balance for SOR time (1.5) is better at 1000 processes than
>>> for 125 processes (2.1), not worse, so this number doesn't easily explain
>>> it.
>>> > >
>>> > > Could you run the 125 and 1000 with -mat_view ::load_balance and
>>> see what you get out?
>>> > >
>>> > > Thanks
>>> > >
>>> > > Barry
>>> > >
>>> > > Notice that the MatSOR time jumps a lot (about 5 secs) when
>>> -log_sync is on. My only guess is that the MatSOR is sharing memory
>>> bandwidth (or some other resource? cores?) with the VecScatter, and for some
>>> reason this is worse for 1000 cores, but I don't know why.
>>> > >
>>> > > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczhang at mcs.anl.gov>
>>> wrote:
>>> > > >
>>> > > > Hi, PETSc developers,
>>> > > > I tested Michael Becker's code. The code calls the same KSPSolve
>>> 1000 times in the second stage and needs a cube number of processes to run.
>>> I ran with 125 ranks and 1000 ranks, with and without the -log_sync option. I
>>> attach the log-view output files and a scaling-loss Excel file.
>>> > > > I profiled the code with 125 processors. It looks like {MatSOR,
>>> MatMult, MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c
>>> took ~50% of the time; the other half was spent waiting in MPI.
>>> MatSOR_SeqAIJ took 30%, mostly in PetscSparseDenseMinusDot().
>>> > > > I tested it on a 36 cores/node machine. I found 32 ranks/node
>>> gave better performance (about 10%) than 36 ranks/node in the 125-rank
>>> test. I guess it is because processes in the former case had more balanced
>>> memory bandwidth. I collected PAPI_DP_OPS (double precision operations) and
>>> PAPI_TOT_CYC (total cycles) for the 125-rank case (see the attached files).
>>> It looks like ranks at the two ends have fewer DP_OPS and TOT_CYC.
>>> > > > Does anyone familiar with the algorithm have a quick explanation?
>>> > > >
>>> > > > --Junchao Zhang
>>> > > >
>>> > > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > > Hello again,
>>> > > >
>>> > > > this took me longer than I anticipated, but here we go.
>>> > > > I did reruns of the cases where only half the processes per node
>>> were used (without -log_sync):
>>> > > >
>>> > > >                       125 procs,1st        125 procs,2nd        1000 procs,1st       1000 procs,2nd
>>> > > >                       Max        Ratio     Max        Ratio     Max        Ratio     Max        Ratio
>>> > > > KSPSolve              1.203E+02  1.0       1.210E+02  1.0       1.399E+02  1.1       1.365E+02  1.0
>>> > > > VecTDot               6.376E+00  3.7       6.551E+00  4.0       7.885E+00  2.9       7.175E+00  3.4
>>> > > > VecNorm               4.579E+00  7.1       5.803E+00  10.2      8.534E+00  6.9       6.026E+00  4.9
>>> > > > VecScale              1.070E-01  2.1       1.129E-01  2.2       1.301E-01  2.5       1.270E-01  2.4
>>> > > > VecCopy               1.123E-01  1.3       1.149E-01  1.3       1.301E-01  1.6       1.359E-01  1.6
>>> > > > VecSet                7.063E-01  1.7       6.968E-01  1.7       7.432E-01  1.8       7.425E-01  1.8
>>> > > > VecAXPY               1.166E+00  1.4       1.167E+00  1.4       1.221E+00  1.5       1.279E+00  1.6
>>> > > > VecAYPX               1.317E+00  1.6       1.290E+00  1.6       1.536E+00  1.9       1.499E+00  2.0
>>> > > > VecScatterBegin       6.142E+00  3.2       5.974E+00  2.8       6.448E+00  3.0       6.472E+00  2.9
>>> > > > VecScatterEnd         3.606E+01  4.2       3.551E+01  4.0       5.244E+01  2.7       4.995E+01  2.7
>>> > > > MatMult               3.561E+01  1.6       3.403E+01  1.5       3.435E+01  1.4       3.332E+01  1.4
>>> > > > MatMultAdd            1.124E+01  2.0       1.130E+01  2.1       2.093E+01  2.9       1.995E+01  2.7
>>> > > > MatMultTranspose      1.372E+01  2.5       1.388E+01  2.6       1.477E+01  2.2       1.381E+01  2.1
>>> > > > MatSolve              1.949E-02  0.0       1.653E-02  0.0       4.789E-02  0.0       4.466E-02  0.0
>>> > > > MatSOR                6.610E+01  1.3       6.673E+01  1.3       7.111E+01  1.3       7.105E+01  1.3
>>> > > > MatResidual           2.647E+01  1.7       2.667E+01  1.7       2.446E+01  1.4       2.467E+01  1.5
>>> > > > PCSetUpOnBlocks       5.266E-03  1.4       5.295E-03  1.4       5.427E-03  1.5       5.289E-03  1.4
>>> > > > PCApply               1.031E+02  1.0       1.035E+02  1.0       1.180E+02  1.0       1.164E+02  1.0
>>> > > >
>>> > > > I also slimmed down my code and basically wrote a simple weak-scaling
>>> test (source files attached) so you can profile it yourself. I
>>> appreciate the offer, Junchao, thank you.
>>> > > > You can adjust the system size per processor at runtime via
>>> "-nodes_per_proc 30" and the number of repeated calls to the function
>>> containing KSPSolve() via "-iterations 1000". The physical problem is
>>> simply calculating the electric potential from a homogeneous charge
>>> distribution, done multiple times to accumulate time in KSPSolve().
>>> > > > A job would be started using something like
>>> > > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4 -iterations 1000 \
>>> > > >     -ksp_rtol 1E-6 \
>>> > > >     -log_view -log_sync \
>>> > > >     -pc_type gamg -pc_gamg_type classical \
>>> > > >     -ksp_type cg \
>>> > > >     -ksp_norm_type unpreconditioned \
>>> > > >     -mg_levels_ksp_type richardson \
>>> > > >     -mg_levels_ksp_norm_type none \
>>> > > >     -mg_levels_pc_type sor \
>>> > > >     -mg_levels_ksp_max_it 1 \
>>> > > >     -mg_levels_pc_sor_its 1 \
>>> > > >     -mg_levels_esteig_ksp_type cg \
>>> > > >     -mg_levels_esteig_ksp_max_it 10 \
>>> > > >     -gamg_est_ksp_type cg
>>> > > > , ideally started on a cube number of processes for a cubical
>>> process grid.
>>> > > > Using 125 processes and 10,000 iterations I get the output in
>>> "log_view_125_new.txt", which shows the same imbalance for me.
>>> > > > Michael
>>> > > >
>>> > > >
>>> > > > On 02.06.2018 at 13:40, Mark Adams wrote:
>>> > > >>
>>> > > >>
>>> > > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <
>>> jczhang at mcs.anl.gov> wrote:
>>> > > >> Hi, Michael,
>>> > > >> You can add -log_sync besides -log_view; it adds barriers to
>>> certain events but measures the barrier time separately from the events. I find
>>> this option makes it easier to interpret -log_view output.
>>> > > >>
>>> > > >> That is great (good to know).
>>> > > >>
>>> > > >> This should give us a better idea of whether your large VecScatter costs
>>> are from slow communication or whether it is catching some sort of load imbalance.
>>> > > >>
>>> > > >>
>>> > > >> --Junchao Zhang
>>> > > >>
>>> > > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > >> Barry: On its way. Could take a couple days again.
>>> > > >>
>>> > > >> Junchao: I unfortunately don't have access to a cluster with a
>>> faster network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking
>>> fat-tree network, which I realize causes parallel slowdown if the nodes are
>>> not connected to the same switch. Each node has 24 cores (2 sockets x 12 cores)
>>> and four NUMA domains (two per socket).
>>> > > >> The ranks are usually not distributed perfectly evenly, i.e. for
>>> 125 processes, of the six required nodes, five would use 21 cores and one
>>> would use 20.
>>> > > >> Would using another CPU type make a difference
>>> communication-wise? I could switch to faster ones (on the same network),
>>> but I always assumed this would only improve performance of the stuff that
>>> is unrelated to communication.
>>> > > >>
>>> > > >> Michael
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >>> The log files have something like "Average time for zero size
>>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very slow
>>> network. A typical machine should give less than 1/10 of the latency you
>>> have. An easy thing to try is just running the code on a machine with a
>>> faster network and seeing what happens.
>>> > > >>>
>>> > > >>> Also, how many cores & NUMA domains does a compute node have? I
>>> could not figure out how you distributed the 125 MPI ranks evenly.
>>> > > >>>
>>> > > >>> --Junchao Zhang
>>> > > >>>
>>> > > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de> wrote:
>>> > > >>> Hello again,
>>> > > >>>
>>> > > >>> here are the updated log_view files for 125 and 1000 processors.
>>> I ran both problems twice, the first time with all processors per node
>>> allocated ("-1.txt"), the second with only half on twice the number of
>>> nodes ("-2.txt").
>>> > > >>>
>>> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> I noticed that for every individual KSP iteration, six vector
>>> objects are created and destroyed (with CG, more with e.g. GMRES).
>>> > > >>>>>
>>> > > >>>> Hmm, it is certainly not intended that vectors be created and
>>> destroyed within each KSPSolve(); could you please point us to the code that
>>> makes you think they are being created and destroyed? We create all the
>>> work vectors in KSPSetUp() and destroy them in KSPReset(), not during the
>>> solve. Not that this would make a measurable difference.
>>> > > >>>>
>>> > > >>>
>>> > > >>> I mean this, right in the log_view output:
>>> > > >>>
>>> > > >>>> Memory usage is given in bytes:
>>> > > >>>>
>>> > > >>>> Object Type Creations Destructions Memory Descendants' Mem.
>>> > > >>>> Reports information only for process 0.
>>> > > >>>>
>>> > > >>>> --- Event Stage 0: Main Stage
>>> > > >>>>
>>> > > >>>> ...
>>> > > >>>>
>>> > > >>>> --- Event Stage 1: First Solve
>>> > > >>>>
>>> > > >>>> ...
>>> > > >>>>
>>> > > >>>> --- Event Stage 2: Remaining Solves
>>> > > >>>>
>>> > > >>>> Vector 23904 23904 1295501184 0.
>>> > > >>> I logged the exact number of KSP iterations over the 999
>>> timesteps and it's exactly 23904/6 = 3984.
>>> > > >>> Michael
>>> > > >>>
>>> > > >>>
>>> > > >>> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
>>> > > >>>>
>>> > > >>>> Please send the log file for 1000 with cg as the solver.
>>> > > >>>>
>>> > > >>>> You should make a bar chart of each event for the two cases
>>> to see which ones are taking more time and which are taking less (we cannot
>>> tell with the two logs you sent us since they are for different solvers.)
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
>>> Michael.Becker at physik.uni-giessen.de>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> I noticed that for every individual KSP iteration, six vector
>>> objects are created and destroyed (with CG, more with e.g. GMRES).
>>> > > >>>>>
>>> > > >>>> Hmm, it is certainly not intended that vectors be created and
>>> destroyed within each KSPSolve(); could you please point us to the code that
>>> makes you think they are being created and destroyed? We create all the
>>> work vectors in KSPSetUp() and destroy them in KSPReset(), not during the
>>> solve. Not that this would make a measurable difference.
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>
>>> > > >>>>> This seems kind of wasteful; is it supposed to be like this?
>>> Is this even the reason for my problems? Apart from that, everything seems
>>> quite normal to me (but I'm not the expert here).
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> Thanks in advance.
>>> > > >>>>>
>>> > > >>>>> Michael
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>>
>>> > > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
>>> > > >>>>>
>>> > > >>>
>>> > > >>>
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >
>>> > > >
>>> > > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>>
>>>
>>
>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
initsolve: 10 iterations
solve 1: 10 iterations
solve 2: 10 iterations
solve 3: 10 iterations
solve 4: 10 iterations
solve 5: 10 iterations
solve 6: 10 iterations
solve 7: 10 iterations
solve 8: 10 iterations
solve 9: 10 iterations
solve 10: 10 iterations
solve 20: 10 iterations
solve 30: 10 iterations
solve 40: 10 iterations
solve 50: 10 iterations
solve 60: 10 iterations
solve 70: 10 iterations
solve 80: 10 iterations
solve 90: 10 iterations
solve 100: 10 iterations
solve 200: 10 iterations
solve 300: 10 iterations
solve 400: 10 iterations
solve 500: 10 iterations
solve 600: 10 iterations
solve 700: 10 iterations
solve 800: 10 iterations
solve 900: 10 iterations
solve 1000: 10 iterations
Time in solve(): 109.012 s
Time in KSPSolve(): 108.771 s (99.7789%)
Number of KSP iterations (total): 10000
Number of solve iterations (total): 1000 (ratio: 10.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdwd-0003 with 216 processors, by jczhang Sun Jun 10 23:54:41 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.126e+02 1.00002 1.126e+02
Objects: 4.045e+04 1.00002 4.045e+04
Flop: 3.198e+10 1.10193 3.098e+10 6.691e+12
Flop/sec: 2.839e+08 1.10195 2.751e+08 5.941e+10
MPI Messages: 2.571e+06 4.18338 1.529e+06 3.302e+08
MPI Message Lengths: 2.182e+09 2.17075 1.164e+03 3.844e+11
MPI Reductions: 5.260e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.7742e-01 0.2% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 3.4111e+00 3.0% 1.0911e+10 0.2% 7.694e+05 0.2% 1.496e+03 0.3% 5.760e+02 1.1%
2: Remaining Solves: 1.0903e+02 96.8% 6.6798e+12 99.8% 3.294e+08 99.8% 1.163e+03 99.7% 5.200e+04 98.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 7.1764e-05 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 4 1.0 2.3761e-03 2.9 0.00e+00 0.0 5.1e+03 4.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0
BuildTwoSidedF 40 1.0 7.3972e-02 2.2 0.00e+00 0.0 2.2e+04 6.4e+03 0.0e+00 0 0 0 0 0 2 0 3 12 0 0
KSPSetUp 11 1.0 1.9271e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 0 0 0 2 0
KSPSolve 1 1.0 3.4107e+00 1.0 5.31e+07 1.2 7.7e+05 1.5e+03 5.8e+02 3 0 0 0 1 100100100100100 3199
VecTDot 104 1.0 1.0672e-02 1.7 2.32e+06 1.0 0.0e+00 0.0e+00 1.0e+02 0 0 0 0 0 0 5 0 0 18 46901
VecNorm 12 1.0 1.6296e-03 1.8 6.48e+05 1.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 1 0 0 2 85891
VecScale 40 1.0 3.9005e-04 4.7 9.01e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 40510
VecCopy 9 1.0 3.4094e-04 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 197 1.0 2.0678e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 100 1.0 3.4885e-03 1.2 2.26e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 139836
VecAYPX 86 1.0 2.7862e-03 1.3 1.34e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 3 0 0 0 103723
VecAssemblyBegin 14 1.0 2.4695e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 14 1.0 2.8300e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 44 1.0 1.2159e-03 1.3 3.26e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 57569
VecScatterBegin 230 1.0 7.8228e-03 2.3 0.00e+00 0.0 4.6e+05 1.4e+03 0.0e+00 0 0 0 0 0 0 0 59 55 0 0
VecScatterEnd 230 1.0 2.2854e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 4 1.0 1.1837e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 91 1.0 3.2917e-02 1.2 1.52e+07 1.1 1.8e+05 1.9e+03 0.0e+00 0 0 0 0 0 1 29 24 29 0 96787
MatMultAdd 40 1.0 1.4560e-02 2.1 2.38e+06 1.3 7.4e+04 3.0e+02 0.0e+00 0 0 0 0 0 0 4 10 2 0 32696
MatMultTranspose 40 1.0 1.3394e-02 1.7 2.38e+06 1.3 7.4e+04 3.0e+02 0.0e+00 0 0 0 0 0 0 4 10 2 0 35544
MatSolve 10 0.0 6.1989e-05 0.0 1.22e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 198
MatSOR 80 1.0 4.1121e-02 1.1 1.40e+07 1.1 8.5e+04 1.5e+03 2.0e+01 0 0 0 0 0 1 27 11 11 3 71623
MatLUFactorSym 1 1.0 7.9155e-05 6.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 7.7009e-0519.0 1.01e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 131
MatConvert 4 1.0 4.0369e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 12 1.0 2.5949e-03 1.7 8.36e+05 1.2 8.5e+03 1.5e+03 0.0e+00 0 0 0 0 0 0 2 1 1 0 66198
MatResidual 40 1.0 1.5780e-02 1.5 5.98e+06 1.1 8.5e+04 1.5e+03 0.0e+00 0 0 0 0 0 0 11 11 11 0 78689
MatAssemblyBegin 83 1.0 7.5437e-02 2.0 0.00e+00 0.0 2.2e+04 6.4e+03 0.0e+00 0 0 0 0 0 2 0 3 12 0 0
MatAssemblyEnd 83 1.0 7.3102e-02 1.3 0.00e+00 0.0 1.0e+05 2.2e+02 1.8e+02 0 0 0 0 0 2 0 13 2 31 0
MatGetRow 88851 1.0 1.4340e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 42 0 0 0 0 0
MatGetRowIJ 1 0.0 1.8120e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 4 1.0 1.1639e-02 1.9 0.00e+00 0.0 4.5e+03 5.5e+02 6.4e+01 0 0 0 0 0 0 0 1 0 11 0
MatGetOrdering 1 0.0 9.7036e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 4 1.0 9.6412e-03 1.1 0.00e+00 0.0 9.4e+04 8.0e+02 2.5e+01 0 0 0 0 0 0 0 12 6 4 0
MatZeroEntries 4 1.0 4.4084e-04 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 4 1.0 9.6810e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 28 0 0 0 0 0
MatMatMult 4 1.0 4.4704e-02 1.0 5.98e+05 1.3 5.4e+04 7.0e+02 4.9e+01 0 0 0 0 0 1 1 7 3 9 2667
MatMatMultSym 4 1.0 3.8034e-02 1.0 0.00e+00 0.0 4.6e+04 5.6e+02 4.8e+01 0 0 0 0 0 1 0 6 2 8 0
MatMatMultNum 4 1.0 6.2041e-03 1.0 5.98e+05 1.3 8.5e+03 1.5e+03 0.0e+00 0 0 0 0 0 0 1 1 1 0 19219
MatPtAP 4 1.0 1.0061e-01 1.0 8.06e+06 1.5 1.2e+05 2.5e+03 6.1e+01 0 0 0 0 0 3 14 15 26 11 14973
MatPtAPSymbolic 4 1.0 6.1238e-02 1.0 0.00e+00 0.0 5.6e+04 2.8e+03 2.8e+01 0 0 0 0 0 2 0 7 14 5 0
MatPtAPNumeric 4 1.0 3.9100e-02 1.0 8.06e+06 1.5 6.1e+04 2.2e+03 3.2e+01 0 0 0 0 0 1 14 8 12 6 38528
MatTrnMatMult 1 1.0 8.1200e-02 1.0 2.72e+06 1.3 1.0e+04 9.2e+03 1.6e+01 0 0 0 0 0 2 5 1 8 3 6689
MatTrnMatMultSym 1 1.0 5.5003e-02 1.0 0.00e+00 0.0 9.0e+03 4.6e+03 1.6e+01 0 0 0 0 0 2 0 1 4 3 0
MatTrnMatMultNum 1 1.0 2.6250e-02 1.0 2.72e+06 1.3 1.1e+03 4.8e+04 0.0e+00 0 0 0 0 0 1 5 0 5 0 20693
MatGetLocalMat 14 1.0 7.8259e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 12 1.0 5.9967e-03 1.5 0.00e+00 0.0 5.9e+04 2.7e+03 0.0e+00 0 0 0 0 0 0 0 8 14 0 0
SFSetGraph 4 1.0 3.9816e-0510.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 4 1.0 2.9175e-03 1.8 0.00e+00 0.0 1.5e+04 6.6e+02 0.0e+00 0 0 0 0 0 0 0 2 1 0 0
SFBcastBegin 33 1.0 1.0576e-03 2.5 0.00e+00 0.0 7.8e+04 8.2e+02 0.0e+00 0 0 0 0 0 0 0 10 6 0 0
SFBcastEnd 33 1.0 1.6055e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCGAMGGraph_AGG 4 1.0 1.9680e+00 1.0 5.98e+05 1.1 2.5e+04 7.4e+02 4.8e+01 2 0 0 0 0 58 1 3 2 8 63
PCGAMGCoarse_AGG 4 1.0 1.0160e-01 1.0 2.72e+06 1.3 1.2e+05 2.0e+03 4.5e+01 0 0 0 0 0 3 5 15 21 8 5346
PCGAMGProl_AGG 4 1.0 2.3153e-02 1.1 0.00e+00 0.0 3.5e+04 1.3e+03 6.4e+01 0 0 0 0 0 1 0 5 4 11 0
PCGAMGPOpt_AGG 4 1.0 1.0441e+00 1.0 9.81e+06 1.1 1.4e+05 1.2e+03 1.6e+02 1 0 0 0 0 31 19 18 14 29 1965
GAMG: createProl 4 1.0 3.1363e+00 1.0 1.31e+07 1.1 3.2e+05 1.5e+03 3.2e+02 3 0 0 0 1 92 25 41 40 56 867
Graph 8 1.0 1.9667e+00 1.0 5.98e+05 1.1 2.5e+04 7.4e+02 4.8e+01 2 0 0 0 0 58 1 3 2 8 63
MIS/Agg 4 1.0 9.7492e-03 1.1 0.00e+00 0.0 9.4e+04 8.0e+02 2.5e+01 0 0 0 0 0 0 0 12 6 4 0
SA: col data 4 1.0 5.0085e-03 1.1 0.00e+00 0.0 2.1e+04 2.0e+03 1.6e+01 0 0 0 0 0 0 0 3 4 3 0
SA: frmProl0 4 1.0 1.5262e-02 1.0 0.00e+00 0.0 1.4e+04 3.7e+02 3.2e+01 0 0 0 0 0 0 0 2 0 6 0
SA: smooth 4 1.0 1.0149e+00 1.0 8.36e+05 1.3 5.4e+04 7.0e+02 5.7e+01 1 0 0 0 0 30 2 7 3 10 164
GAMG: partLevel 4 1.0 1.1628e-01 1.0 8.06e+06 1.5 1.2e+05 2.4e+03 1.6e+02 0 0 0 0 0 3 14 16 26 28 12956
repartition 2 1.0 8.7881e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 1.3301e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 9.6662e-03 2.4 0.00e+00 0.0 1.9e+03 1.3e+03 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
Move P 2 1.0 8.2741e-03 3.1 0.00e+00 0.0 2.6e+03 2.1e+01 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
PCSetUp 2 1.0 3.2645e+00 1.0 2.12e+07 1.3 4.4e+05 1.7e+03 5.1e+02 3 0 0 0 1 96 39 57 66 88 1294
PCSetUpOnBlocks 10 1.0 4.4870e-04 3.5 1.01e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 23
PCApply 10 1.0 7.8002e-02 1.0 2.48e+07 1.1 3.2e+05 9.4e+02 2.0e+01 0 0 0 0 0 2 47 41 26 3 65884
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.0878e+02 1.0 3.19e+10 1.1 3.3e+08 1.2e+03 5.2e+04 97100100100 99 100100100100100 61407
VecTDot 20000 1.0 7.9760e+00 1.4 1.08e+09 1.0 0.0e+00 0.0e+00 2.0e+04 6 3 0 0 38 6 3 0 0 38 29247
VecNorm 12000 1.0 2.9418e+00 1.1 6.48e+08 1.0 0.0e+00 0.0e+00 1.2e+04 3 2 0 0 23 3 2 0 0 23 47579
VecScale 40000 1.0 2.1746e-01 1.7 9.01e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 72661
VecCopy 1000 1.0 8.4139e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 150000 1.0 1.4097e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 20000 1.0 1.3505e+00 1.1 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 172735
VecAYPX 50000 1.0 1.6715e+00 1.3 8.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 104366
VecScatterBegin 171000 1.0 5.6701e+00 2.4 0.00e+00 0.0 3.3e+08 1.2e+03 0.0e+00 4 0100100 0 4 0100100 0 0
VecScatterEnd 171000 1.0 2.8652e+01 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 21 0 0 0 0 0
MatMult 51000 1.0 2.2225e+01 1.2 9.55e+09 1.1 9.7e+07 2.2e+03 0.0e+00 17 30 29 55 0 18 30 29 55 0 90340
MatMultAdd 40000 1.0 1.9086e+01 2.1 2.38e+09 1.3 7.4e+07 3.0e+02 0.0e+00 13 7 22 6 0 13 7 22 6 0 24944
MatMultTranspose 40000 1.0 1.5344e+01 1.8 2.38e+09 1.3 7.4e+07 3.0e+02 0.0e+00 10 7 22 6 0 11 7 22 6 0 31027
MatSolve 10000 0.0 5.1090e-02 0.0 1.22e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 240
MatSOR 80000 1.0 4.7799e+01 1.2 1.40e+10 1.1 8.5e+07 1.5e+03 2.0e+04 40 44 26 33 38 41 44 26 33 38 61484
MatResidual 40000 1.0 1.6702e+01 1.4 5.98e+09 1.1 8.5e+07 1.5e+03 0.0e+00 13 19 26 33 0 13 19 26 33 0 74349
PCSetUpOnBlocks 10000 1.0 1.4676e-01 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10000 1.0 9.0367e+01 1.0 2.47e+10 1.1 3.2e+08 9.4e+02 2.0e+04 80 77 96 77 38 82 77 96 78 38 56799
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 7 8816 0.
DMKSP interface 1 1 656 0.
Vector 4 38 1915568 0.
Matrix 0 50 9208336 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 12 137820 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 10 228640 0.
Preconditioner 1 7 7448 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 10 4 6400 0.
Vector 171 137 6010696 0.
Matrix 122 72 21592712 0.
Matrix Coarsen 4 4 2544 0.
Index Set 76 66 118624 0.
Star Forest Graph 4 4 3456 0.
Vec Scatter 28 19 24048 0.
Preconditioner 10 4 3424 0.
PetscRandom 8 8 5168 0.
--- Event Stage 2: Remaining Solves
Vector 40000 40000 2398240000 0.
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 8.96454e-06
Average time for zero size MPI_Send(): 1.3921e-05
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.05
-pc_gamg_threshold_scale .0
-pc_gamg_type agg
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
initsolve: 12 iterations
solve 1: 12 iterations
solve 2: 12 iterations
solve 3: 12 iterations
solve 4: 12 iterations
solve 5: 12 iterations
solve 6: 12 iterations
solve 7: 12 iterations
solve 8: 12 iterations
solve 9: 12 iterations
solve 10: 12 iterations
solve 20: 12 iterations
solve 30: 12 iterations
solve 40: 12 iterations
solve 50: 12 iterations
solve 60: 12 iterations
solve 70: 12 iterations
solve 80: 12 iterations
solve 90: 12 iterations
solve 100: 12 iterations
solve 200: 12 iterations
solve 300: 12 iterations
solve 400: 12 iterations
solve 500: 12 iterations
solve 600: 12 iterations
solve 700: 12 iterations
solve 800: 12 iterations
solve 900: 12 iterations
solve 1000: 12 iterations
Time in solve(): 183.41 s
Time in KSPSolve(): 183.154 s (99.8607%)
Number of KSP iterations (total): 12000
Number of solve iterations (total): 1000 (ratio: 12.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0065 with 1728 processors, by jczhang Sun Jun 10 23:56:56 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.873e+02 1.00001 1.873e+02
Objects: 4.846e+04 1.00002 4.846e+04
Flop: 3.894e+10 1.11735 3.767e+10 6.509e+13
Flop/sec: 2.080e+08 1.11735 2.012e+08 3.476e+11
MPI Messages: 4.708e+06 6.19009 2.297e+06 3.969e+09
MPI Message Lengths: 2.889e+09 2.40120 1.037e+03 4.115e+12
MPI Reductions: 6.264e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.1424e-01 0.1% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 3.7160e+00 2.0% 1.0002e+11 0.2% 8.597e+06 0.2% 1.305e+03 0.3% 6.140e+02 1.0%
2: Remaining Solves: 1.8343e+02 98.0% 6.4988e+13 99.8% 3.961e+09 99.8% 1.036e+03 99.7% 6.200e+04 99.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.5020e-04 5.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 4 1.0 2.8999e-03 2.2 0.00e+00 0.0 5.1e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 1 0 0 0
BuildTwoSidedF 40 1.0 1.2657e-01 2.3 0.00e+00 0.0 2.1e+05 6.2e+03 0.0e+00 0 0 0 0 0 2 0 2 12 0 0
KSPSetUp 11 1.0 5.3284e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 0 0 0 2 0
KSPSolve 1 1.0 3.7157e+00 1.0 6.06e+07 1.2 8.6e+06 1.3e+03 6.1e+02 2 0 0 0 1 100100100100100 26919
VecTDot 108 1.0 2.7509e-02 1.4 2.54e+06 1.0 0.0e+00 0.0e+00 1.1e+02 0 0 0 0 0 1 4 0 0 18 159329
VecNorm 14 1.0 2.2557e-03 1.7 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 1 0 0 2 579146
VecScale 48 1.0 4.7922e-04 3.6 1.24e+05 2.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 355623
VecCopy 9 1.0 4.0150e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 227 1.0 2.5015e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 104 1.0 5.3051e-03 1.7 2.48e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 4 0 0 0 806970
VecAYPX 96 1.0 3.2096e-03 1.4 1.51e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 3 0 0 0 811392
VecAssemblyBegin 14 1.0 3.1312e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 14 1.0 3.3569e-04 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 44 1.0 1.2739e-03 1.3 3.26e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 440721
VecScatterBegin 264 1.0 1.2353e-02 2.9 0.00e+00 0.0 5.2e+06 1.2e+03 0.0e+00 0 0 0 0 0 0 0 61 57 0 0
VecScatterEnd 264 1.0 6.0780e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecSetRandom 4 1.0 1.1327e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 101 1.0 4.6448e-02 1.5 1.76e+07 1.1 2.0e+06 1.7e+03 0.0e+00 0 0 0 0 0 1 29 23 30 0 625296
MatMultAdd 48 1.0 5.0853e-02 4.2 2.86e+06 1.3 9.1e+05 2.7e+02 0.0e+00 0 0 0 0 0 1 5 11 2 0 93269
MatMultTranspose 48 1.0 3.3805e-02 3.4 2.86e+06 1.3 9.1e+05 2.7e+02 0.0e+00 0 0 0 0 0 0 5 11 2 0 140302
MatSolve 12 0.0 6.8688e-04 0.0 8.45e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1230
MatSOR 96 1.0 5.9859e-02 1.2 1.72e+07 1.1 1.0e+06 1.4e+03 2.4e+01 0 0 0 0 0 1 29 12 12 4 478874
MatLUFactorSym 1 1.0 3.2248e-03294.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 2.8222e-03986.4 4.22e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1494
MatConvert 4 1.0 1.2332e-02 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatScale 12 1.0 3.8207e-03 2.6 8.60e+05 1.2 8.4e+04 1.4e+03 0.0e+00 0 0 0 0 0 0 1 1 1 0 368774
MatResidual 48 1.0 2.7284e-02 2.0 7.46e+06 1.2 1.0e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 12 12 12 0 445843
MatAssemblyBegin 83 1.0 1.2762e-01 2.1 0.00e+00 0.0 2.1e+05 6.2e+03 0.0e+00 0 0 0 0 0 2 0 2 12 0 0
MatAssemblyEnd 83 1.0 1.2948e-01 1.6 0.00e+00 0.0 1.1e+06 1.8e+02 1.8e+02 0 0 0 0 0 3 0 13 2 29 0
MatGetRow 89034 1.0 1.4585e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 38 0 0 0 0 0
MatGetRowIJ 1 0.0 1.0991e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMat 4 1.0 2.8479e-02 1.3 0.00e+00 0.0 5.9e+04 4.9e+02 6.4e+01 0 0 0 0 0 1 0 1 0 10 0
MatGetOrdering 1 0.0 2.7084e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 4 1.0 2.0675e-02 1.1 0.00e+00 0.0 1.1e+06 6.2e+02 5.1e+01 0 0 0 0 0 1 0 13 6 8 0
MatZeroEntries 4 1.0 3.2139e-04 3.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 4 1.0 9.9720e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 27 0 0 0 0 0
MatMatMult 4 1.0 8.0693e-02 1.1 6.22e+05 1.3 5.4e+05 6.4e+02 5.0e+01 0 0 0 0 0 2 1 6 3 8 12315
MatMatMultSym 4 1.0 6.5204e-02 1.0 0.00e+00 0.0 4.6e+05 5.1e+02 4.8e+01 0 0 0 0 0 2 0 5 2 8 0
MatMatMultNum 4 1.0 9.7268e-03 1.0 6.22e+05 1.3 8.4e+04 1.3e+03 0.0e+00 0 0 0 0 0 0 1 1 1 0 102165
MatPtAP 4 1.0 1.7622e-01 1.0 8.31e+06 1.6 1.2e+06 2.3e+03 6.2e+01 0 0 0 0 0 5 13 14 25 10 72429
MatPtAPSymbolic 4 1.0 9.2023e-02 1.0 0.00e+00 0.0 5.6e+05 2.7e+03 2.8e+01 0 0 0 0 0 2 0 7 14 5 0
MatPtAPNumeric 4 1.0 8.2069e-02 1.0 8.31e+06 1.6 6.7e+05 1.9e+03 3.2e+01 0 0 0 0 0 2 13 8 12 5 155523
MatTrnMatMult 1 1.0 9.1041e-02 1.0 2.72e+06 1.3 9.2e+04 9.1e+03 1.6e+01 0 0 0 0 0 2 5 1 7 3 49668
MatTrnMatMultSym 1 1.0 6.3481e-02 1.0 0.00e+00 0.0 8.2e+04 4.5e+03 1.6e+01 0 0 0 0 0 2 0 1 3 3 0
MatTrnMatMultNum 1 1.0 2.7639e-02 1.0 2.72e+06 1.3 9.5e+03 4.9e+04 0.0e+00 0 0 0 0 0 1 5 0 4 0 163601
MatGetLocalMat 14 1.0 9.3658e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 12 1.0 9.1021e-03 2.3 0.00e+00 0.0 5.9e+05 2.6e+03 0.0e+00 0 0 0 0 0 0 0 7 14 0 0
SFSetGraph 4 1.0 7.6056e-0526.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 4 1.0 3.6201e-03 1.7 0.00e+00 0.0 1.5e+05 6.0e+02 0.0e+00 0 0 0 0 0 0 0 2 1 0 0
SFBcastBegin 59 1.0 2.5663e-03 3.4 0.00e+00 0.0 9.8e+05 6.2e+02 0.0e+00 0 0 0 0 0 0 0 11 5 0 0
SFBcastEnd 59 1.0 7.1716e-03 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCGAMGGraph_AGG 4 1.0 2.0067e+00 1.0 6.22e+05 1.2 2.5e+05 6.7e+02 4.8e+01 1 0 0 0 0 54 1 3 2 8 505
PCGAMGCoarse_AGG 4 1.0 1.2379e-01 1.0 2.72e+06 1.3 1.3e+06 1.6e+03 7.1e+01 0 0 0 0 0 3 5 16 19 12 36527
PCGAMGProl_AGG 4 1.0 2.7567e-02 1.1 0.00e+00 0.0 3.3e+05 1.2e+03 6.4e+01 0 0 0 0 0 1 0 4 4 10 0
PCGAMGPOpt_AGG 4 1.0 1.1260e+00 1.0 1.01e+07 1.1 1.4e+06 1.1e+03 1.7e+02 1 0 0 0 0 30 17 16 13 27 14814
GAMG: createProl 4 1.0 3.2832e+00 1.0 1.34e+07 1.2 3.3e+06 1.3e+03 3.5e+02 2 0 0 0 1 88 22 39 38 57 6767
Graph 8 1.0 2.0053e+00 1.0 6.22e+05 1.2 2.5e+05 6.7e+02 4.8e+01 1 0 0 0 0 54 1 3 2 8 506
MIS/Agg 4 1.0 2.0879e-02 1.1 0.00e+00 0.0 1.1e+06 6.2e+02 5.1e+01 0 0 0 0 0 1 0 13 6 8 0
SA: col data 4 1.0 5.9071e-03 1.1 0.00e+00 0.0 2.0e+05 1.8e+03 1.6e+01 0 0 0 0 0 0 0 2 3 3 0
SA: frmProl0 4 1.0 1.8157e-02 1.0 0.00e+00 0.0 1.3e+05 3.7e+02 3.2e+01 0 0 0 0 0 0 0 2 0 5 0
SA: smooth 4 1.0 1.0804e+00 1.0 8.60e+05 1.3 5.4e+05 6.4e+02 5.8e+01 1 0 0 0 0 29 1 6 3 9 1286
GAMG: partLevel 4 1.0 2.1361e-01 1.0 8.31e+06 1.6 1.3e+06 2.2e+03 1.6e+02 0 0 0 0 0 6 13 15 25 27 59752
repartition 2 1.0 1.5340e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 4.6380e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 2.1691e-02 1.4 0.00e+00 0.0 2.4e+04 1.2e+03 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
Move P 2 1.0 1.3658e-02 1.8 0.00e+00 0.0 3.5e+04 1.7e+01 3.4e+01 0 0 0 0 0 0 0 0 0 6 0
PCSetUp 2 1.0 3.5135e+00 1.0 2.17e+07 1.3 4.6e+06 1.5e+03 5.4e+02 2 0 0 0 1 94 35 54 63 87 9957
PCSetUpOnBlocks 12 1.0 6.3727e-0342.4 4.22e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 662
PCApply 12 1.0 1.3277e-01 1.0 3.14e+07 1.2 3.8e+06 8.4e+02 2.4e+01 0 0 0 0 0 4 50 45 29 4 379007
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.8317e+02 1.0 3.89e+10 1.1 4.0e+09 1.0e+03 6.2e+04 98100100100 99 100100100100100 354794
VecTDot 24000 1.0 2.3516e+01 1.2 1.30e+09 1.0 0.0e+00 0.0e+00 2.4e+04 11 3 0 0 38 12 3 0 0 39 95232
VecNorm 14000 1.0 6.8848e+00 1.1 7.56e+08 1.0 0.0e+00 0.0e+00 1.4e+04 4 2 0 0 22 4 2 0 0 23 189748
VecScale 48000 1.0 2.9046e-01 2.5 1.24e+08 2.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 586732
VecCopy 1000 1.0 8.8122e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 180000 1.0 1.8199e+00 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 24000 1.0 1.6535e+00 1.2 1.30e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 1354423
VecAYPX 60000 1.0 1.9924e+00 1.3 9.77e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 846000
VecScatterBegin 205000 1.0 9.1513e+00 3.1 0.00e+00 0.0 4.0e+09 1.0e+03 0.0e+00 3 0100100 0 3 0100100 0 0
VecScatterEnd 205000 1.0 5.6576e+01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 26 0 0 0 0 27 0 0 0 0 0
MatMult 61000 1.0 3.2314e+01 1.5 1.17e+10 1.1 1.1e+09 2.0e+03 0.0e+00 13 30 29 55 0 14 30 29 55 0 600882
MatMultAdd 48000 1.0 4.4993e+01 2.9 2.86e+09 1.3 9.1e+08 2.7e+02 0.0e+00 20 7 23 6 0 20 7 23 6 0 105415
MatMultTranspose 48000 1.0 3.4573e+01 3.1 2.86e+09 1.3 9.1e+08 2.7e+02 0.0e+00 8 7 23 6 0 8 7 23 6 0 137187
MatSolve 12000 0.0 7.3331e-01 0.0 8.45e+08 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1152
MatSOR 96000 1.0 7.1605e+01 1.2 1.72e+10 1.1 1.0e+09 1.4e+03 2.4e+04 36 44 25 33 38 37 44 25 33 39 399609
MatResidual 48000 1.0 2.5073e+01 1.7 7.46e+09 1.2 1.0e+09 1.4e+03 0.0e+00 10 19 25 33 0 10 19 25 33 0 485156
PCSetUpOnBlocks 12000 1.0 1.7560e-0115.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 12000 1.0 1.4437e+02 1.0 3.04e+10 1.2 3.8e+09 8.4e+02 2.4e+04 77 77 97 78 38 78 77 97 78 39 348175
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 7 8816 0.
DMKSP interface 1 1 656 0.
Vector 4 38 1919384 0.
Matrix 0 50 10310768 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 12 206536 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 10 228640 0.
Preconditioner 1 7 7448 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 10 4 6400 0.
Vector 179 145 6496648 0.
Matrix 122 72 21618064 0.
Matrix Coarsen 4 4 2544 0.
Index Set 74 64 185744 0.
Star Forest Graph 4 4 3456 0.
Vec Scatter 28 19 24056 0.
Preconditioner 10 4 3424 0.
PetscRandom 8 8 5168 0.
--- Event Stage 2: Remaining Solves
Vector 48000 48000 2878368000 0.
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.45912e-05
Average time for zero size MPI_Send(): 6.50518e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold 0.05
-pc_gamg_threshold_scale .0
-pc_gamg_type agg
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
initsolve: 163 iterations
solve 1: 163 iterations
solve 2: 163 iterations
solve 3: 163 iterations
solve 4: 163 iterations
solve 5: 163 iterations
solve 6: 163 iterations
solve 7: 163 iterations
solve 8: 163 iterations
solve 9: 163 iterations
solve 10: 163 iterations
solve 20: 163 iterations
solve 30: 163 iterations
solve 40: 163 iterations
solve 50: 163 iterations
solve 60: 163 iterations
solve 70: 163 iterations
solve 80: 163 iterations
solve 90: 163 iterations
solve 100: 163 iterations
solve 200: 163 iterations
solve 300: 163 iterations
solve 400: 163 iterations
solve 500: 163 iterations
solve 600: 163 iterations
solve 700: 163 iterations
solve 800: 163 iterations
solve 900: 163 iterations
solve 1000: 163 iterations
Time in solve(): 423.567 s
Time in KSPSolve(): 423.327 s (99.9431%)
Number of KSP iterations (total): 163000
Number of solve iterations (total): 1000 (ratio: 163.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0437 with 216 processors, by jczhang Tue Jun 12 02:09:30 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 4.243e+02 1.00000 4.243e+02
Objects: 3.500e+01 1.00000 3.500e+01
Flop: 1.661e+11 1.00537 1.658e+11 3.581e+13
Flop/sec: 3.914e+08 1.00537 3.907e+08 8.439e+10
MPI Messages: 9.850e+05 2.00000 8.208e+05 1.773e+08
MPI Message Lengths: 7.092e+09 2.00000 7.200e+03 1.277e+12
MPI Reductions: 4.915e+05 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0660e-01 0.0% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 6.4530e-01 0.2% 3.5887e+10 0.1% 1.793e+05 0.1% 7.135e+03 0.1% 5.070e+02 0.1%
2: Remaining Solves: 4.2358e+02 99.8% 3.5773e+13 99.9% 1.771e+08 99.9% 7.200e+03 99.9% 4.910e+05 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 6.1989e-05 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSidedF 2 1.0 6.4178e-03 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
KSPSetUp 2 1.0 9.5639e-0316.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 1 0 0 0 1 0
KSPSolve 1 1.0 6.5393e-01 1.0 1.66e+08 1.0 1.8e+05 7.1e+03 5.1e+02 0 0 0 0 0 100100100100100 54879
VecTDot 326 1.0 2.6734e-01 1.3 1.76e+07 1.0 0.0e+00 0.0e+00 3.3e+02 0 0 0 0 0 39 11 0 0 64 14223
VecNorm 165 1.0 4.5129e-02 1.5 8.91e+06 1.0 0.0e+00 0.0e+00 1.6e+02 0 0 0 0 0 6 5 0 0 33 42646
VecCopy 1 1.0 9.1076e-05 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 164 1.0 4.1964e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecAXPY 326 1.0 2.9594e-02 1.8 1.76e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 11 0 0 0 128488
VecAYPX 163 1.0 2.5827e-02 5.8 8.78e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 5 0 0 0 73387
VecScatterBegin 164 1.0 1.1089e-02 2.3 0.00e+00 0.0 1.8e+05 7.2e+03 0.0e+00 0 0 0 0 0 1 0 99100 0 0
VecScatterEnd 164 1.0 4.3868e-02 8.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatMult 164 1.0 1.7538e-01 1.4 5.76e+07 1.0 1.8e+05 7.2e+03 0.0e+00 0 0 0 0 0 22 34 99100 0 70534
MatSolve 163 1.0 1.2920e-01 1.2 5.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 18 33 0 0 0 92709
MatLUFactorNum 1 1.0 6.4151e-03 3.9 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 17867
MatILUFactorSym 1 1.0 2.3391e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 1.4147e-0217.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatAssemblyEnd 2 1.0 4.0359e-03 1.1 0.00e+00 0.0 2.2e+03 1.8e+03 8.0e+00 0 0 0 0 0 1 0 1 0 2 0
MatGetRowIJ 1 1.0 5.1022e-0553.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.6393e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 2 1.0 1.3018e-02 3.1 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 8805
PCSetUpOnBlocks 1 1.0 8.4250e-03 2.6 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 13605
PCApply 163 1.0 1.3862e-01 1.2 5.55e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 19 33 0 0 0 86409
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 4.2333e+02 1.0 1.66e+11 1.0 1.8e+08 7.2e+03 4.9e+05100100100100100 100100100100100 84502
VecTDot 326000 1.0 1.0805e+02 1.3 1.76e+10 1.0 0.0e+00 0.0e+00 3.3e+05 22 11 0 0 66 22 11 0 0 66 35191
VecNorm 165000 1.0 3.5069e+01 1.2 8.91e+09 1.0 0.0e+00 0.0e+00 1.6e+05 8 5 0 0 34 8 5 0 0 34 54879
VecCopy 1000 1.0 9.4171e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 163000 1.0 4.2720e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 326000 1.0 2.1878e+01 1.3 1.76e+10 1.0 0.0e+00 0.0e+00 0.0e+00 4 11 0 0 0 4 11 0 0 0 173800
VecAYPX 163000 1.0 1.0099e+01 2.1 8.78e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 5 0 0 0 2 5 0 0 0 187673
VecScatterBegin 164000 1.0 1.1169e+01 2.2 0.00e+00 0.0 1.8e+08 7.2e+03 0.0e+00 2 0100100 0 2 0100100 0 0
VecScatterEnd 164000 1.0 1.9032e+01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMult 164000 1.0 1.4498e+02 1.2 5.76e+10 1.0 1.8e+08 7.2e+03 0.0e+00 32 35100100 0 32 35100100 0 85325
MatSolve 163000 1.0 1.2030e+02 1.1 5.55e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 33 0 0 0 27 33 0 0 0 99564
PCSetUpOnBlocks 1000 1.0 2.3875e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 163000 1.0 1.3030e+02 1.1 5.55e+10 1.0 0.0e+00 0.0e+00 0.0e+00 29 33 0 0 0 29 33 0 0 0 91925
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 2 2424 0.
DMKSP interface 1 1 656 0.
Vector 4 10 1117680 0.
Matrix 0 4 5974172 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 5 446760 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 2 218528 0.
Preconditioner 1 2 1912 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 1 0 0 0.
Vector 7 1 1656 0.
Matrix 4 0 0 0.
Index Set 5 2 12384 0.
Vec Scatter 1 0 0 0.
Preconditioner 1 0 0 0.
--- Event Stage 2: Remaining Solves
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 2.01702e-05
Average time for zero size MPI_Send(): 5.63926e-06
#PETSc Option Table entries:
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations
Time in solve(): 29.0941 s
Time in KSPSolve(): 29.0545 s (99.8639%)
Number of KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdwd-0033 with 216 processors, by jczhang Tue Jun 12 09:57:11 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.961e+01 1.00008 2.961e+01
Objects: 3.500e+01 1.00000 3.500e+01
Flop: 1.034e+10 1.00537 1.032e+10 2.229e+12
Flop/sec: 3.491e+08 1.00544 3.485e+08 7.528e+10
MPI Messages: 6.123e+04 2.00000 5.102e+04 1.102e+07
MPI Message Lengths: 4.407e+08 2.00000 7.198e+03 7.933e+10
MPI Reductions: 3.064e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 8.8708e-02 0.3% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.1%
1: First Solve: 4.2526e-01 1.4% 2.2182e+10 1.0% 1.112e+05 1.0% 7.095e+03 1.0% 3.190e+02 1.0%
2: Remaining Solves: 2.9096e+01 98.3% 2.2067e+12 99.0% 1.091e+07 99.0% 7.200e+03 99.0% 3.030e+04 98.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.9312e-04 6.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSidedF 2 1.0 1.1070e-0245.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
KSPSetUp 2 1.0 7.4315e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 2 0
KSPSolve 1 1.0 4.2496e-01 1.0 1.03e+08 1.0 1.1e+05 7.1e+03 3.2e+02 1 1 1 1 1 100100100100100 52198
VecTDot 201 1.0 1.3848e-01 1.3 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02 0 0 0 0 1 29 11 0 0 63 16930
VecNorm 102 1.0 4.7182e-02 1.5 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02 0 0 0 0 0 10 5 0 0 32 25216
VecCopy 1 1.0 7.5817e-05 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 102 1.0 2.6131e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
VecAXPY 200 1.0 1.2963e-02 1.3 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 11 0 0 0 179961
VecAYPX 100 1.0 5.0817e-03 1.6 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 5 0 0 0 228384
VecScatterBegin 101 1.0 6.9609e-03 2.7 0.00e+00 0.0 1.1e+05 7.2e+03 0.0e+00 0 0 1 1 0 1 0 98100 0 0
VecScatterEnd 101 1.0 2.8235e-02 7.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatMult 101 1.0 1.0444e-01 1.3 3.55e+07 1.0 1.1e+05 7.2e+03 0.0e+00 0 0 1 1 0 20 34 98100 0 72945
MatSolve 101 1.0 8.1540e-02 1.2 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 17 33 0 0 0 91020
MatLUFactorNum 1 1.0 5.0550e-03 3.1 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 22675
MatILUFactorSym 1 1.0 2.3129e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 1.1154e-0236.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 0 0 0 0 0
MatAssemblyEnd 2 1.0 4.6539e-03 1.1 0.00e+00 0.0 2.2e+03 1.8e+03 8.0e+00 0 0 0 0 0 1 0 2 0 3 0
MatGetRowIJ 1 1.0 8.1062e-06 8.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.6512e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 2 1.0 1.2755e-02 3.3 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 2 1 0 0 0 8986
PCSetUpOnBlocks 1 1.0 7.0622e-03 2.2 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 16230
PCApply 101 1.0 9.0092e-02 1.2 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 18 33 0 0 0 82380
--- Event Stage 2: Remaining Solves
KSPSolve 100 1.0 2.9067e+01 1.0 1.02e+10 1.0 1.1e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 75919
VecTDot 20100 1.0 9.0870e+00 1.3 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 27 11 0 0 66 28 11 0 0 66 25800
VecNorm 10200 1.0 2.8064e+00 1.2 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04 9 5 0 0 33 9 5 0 0 34 42393
VecCopy 100 1.0 1.1604e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 10100 1.0 2.5848e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 20000 1.0 1.3038e+00 1.3 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 4 10 0 0 0 4 11 0 0 0 178922
VecAYPX 10000 1.0 5.1897e-01 1.6 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 223631
VecScatterBegin 10100 1.0 7.0715e-01 2.5 0.00e+00 0.0 1.1e+07 7.2e+03 0.0e+00 2 0 99 99 0 2 0100100 0 0
VecScatterEnd 10100 1.0 1.2401e+00 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMult 10100 1.0 9.1593e+00 1.2 3.55e+09 1.0 1.1e+07 7.2e+03 0.0e+00 28 34 99 99 0 29 35100100 0 83174
MatSolve 10100 1.0 7.4668e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 24 33 0 0 0 24 34 0 0 0 99398
PCSetUpOnBlocks 100 1.0 2.2914e-0312.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10100 1.0 8.0915e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 26 33 0 0 0 26 34 0 0 0 91724
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 2 2424 0.
DMKSP interface 1 1 656 0.
Vector 4 10 1117680 0.
Matrix 0 4 5974172 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 5 446760 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 2 218528 0.
Preconditioner 1 2 1912 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 1 0 0 0.
Vector 7 1 1656 0.
Matrix 4 0 0 0.
Index Set 5 2 12384 0.
Vec Scatter 1 0 0 0.
Preconditioner 1 0 0 0.
--- Event Stage 2: Remaining Solves
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.07765e-05
Average time for zero size MPI_Send(): 5.81035e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations
Time in solve(): 92.6599 s
Time in KSPSolve(): 92.6245 s (99.9617%)
Number of KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0255 with 1728 processors, by jczhang Tue Jun 12 10:32:54 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 9.433e+01 1.00008 9.433e+01
Objects: 3.500e+01 1.00000 3.500e+01
Flop: 1.034e+10 1.00537 1.033e+10 1.785e+13
Flop/sec: 1.096e+08 1.00543 1.095e+08 1.892e+11
MPI Messages: 6.123e+04 2.00000 5.613e+04 9.699e+07
MPI Message Lengths: 4.407e+08 2.00000 7.198e+03 6.981e+11
MPI Reductions: 3.064e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.6240e-01 0.2% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.1%
1: First Solve: 1.5073e+00 1.6% 1.7764e+11 1.0% 9.789e+05 1.0% 7.095e+03 1.0% 3.190e+02 1.0%
2: Remaining Solves: 9.2661e+01 98.2% 1.7670e+13 99.0% 9.599e+07 99.0% 7.200e+03 99.0% 3.030e+04 98.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.4305e-04 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSidedF 2 1.0 1.8184e-02 8.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
KSPSetUp 2 1.0 9.9397e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 2 0
KSPSolve 1 1.0 1.5071e+00 1.0 1.03e+08 1.0 9.8e+05 7.1e+03 3.2e+02 2 1 1 1 1 100100100100100 117873
VecTDot 201 1.0 6.6377e-01 1.1 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 1 43 11 0 0 63 28256
VecNormBarrier 102 1.0 1.8365e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 11 0 0 0 0 0
VecNorm 102 1.0 1.8855e-01 1.2 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02 0 0 0 0 0 12 5 0 0 32 50480
VecCopy 1 1.0 8.1062e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 102 1.0 2.6536e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 200 1.0 2.1202e-02 2.2 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 11 0 0 0 880215
VecAYPX 100 1.0 1.2952e-02 4.4 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 5 0 0 0 716824
VecScatterBarrie 101 1.0 2.4745e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 16 0 0 0 0 0
VecScatterBegin 101 1.0 1.7003e-02 5.3 0.00e+00 0.0 9.6e+05 7.2e+03 0.0e+00 0 0 1 1 0 0 0 98100 0 0
VecScatterEnd 101 1.0 2.4191e-02 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatMult 101 1.0 3.4479e-01 1.1 3.55e+07 1.0 9.6e+05 7.2e+03 0.0e+00 0 0 1 1 0 21 34 98100 0 177214
MatSolve 101 1.0 8.3888e-02 1.3 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 33 0 0 0 707780
MatLUFactorNum 1 1.0 7.0050e-03 4.3 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 135051
MatILUFactorSym 1 1.0 2.3711e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 1.8245e-02 7.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.4878e-02 1.3 0.00e+00 0.0 1.9e+04 1.8e+03 8.0e+00 0 0 0 0 0 2 0 2 0 3 0
MatGetRowIJ 1 1.0 3.0041e-0531.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 3.9482e-04 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 2 1.0 1.4109e-02 3.7 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 67051
PCSetUpOnBlocks 1 1.0 8.9779e-03 2.9 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 105373
PCApply 101 1.0 9.6632e-02 1.3 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 33 0 0 0 614442
--- Event Stage 2: Remaining Solves
KSPSolve 100 1.0 9.2634e+01 1.0 1.02e+10 1.0 9.6e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 190747
VecTDot 20100 1.0 4.2454e+01 1.1 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 44 11 0 0 66 45 11 0 0 66 44178
VecNormBarrier 10200 1.0 1.1839e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
VecNorm 10200 1.0 8.3957e+00 1.0 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04 9 5 0 0 33 9 5 0 0 34 113365
VecCopy 100 1.0 1.9274e-02 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 10100 1.0 2.6170e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 20000 1.0 1.3767e+00 1.5 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 11 0 0 0 1355593
VecAYPX 10000 1.0 7.4878e-01 2.5 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 1239961
VecScatterBarrie 10100 1.0 1.2403e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 13 0 0 0 0 0
VecScatterBegin 10100 1.0 8.7978e-01 2.8 0.00e+00 0.0 9.6e+07 7.2e+03 0.0e+00 1 0 99 99 0 1 0100100 0 0
VecScatterEnd 10100 1.0 1.5403e+00 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatMult 10100 1.0 2.1855e+01 1.1 3.55e+09 1.0 9.6e+07 7.2e+03 0.0e+00 22 34 99 99 0 23 35100100 0 279576
MatSolve 10100 1.0 7.6917e+00 1.2 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 33 0 0 0 8 34 0 0 0 771924
PCSetUpOnBlocks 100 1.0 5.4588e-02218.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10100 1.0 8.3117e+00 1.1 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 33 0 0 0 8 34 0 0 0 714350
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 2 2424 0.
DMKSP interface 1 1 656 0.
Vector 4 10 1117680 0.
Matrix 0 4 5974172 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 5 446760 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 2 218528 0.
Preconditioner 1 2 1912 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 1 0 0 0.
Vector 7 1 1656 0.
Matrix 4 0 0 0.
Index Set 5 2 12384 0.
Vec Scatter 1 0 0 0.
Preconditioner 1 0 0 0.
--- Event Stage 2: Remaining Solves
========================================================================================================================
Average time to get PetscTime(): 5.96046e-07
Average time for MPI_Barrier(): 1.4782e-05
Average time for zero size MPI_Send(): 5.14062e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_sync
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
initsolve: 100 iterations
solve 1: 100 iterations
solve 2: 100 iterations
solve 3: 100 iterations
solve 4: 100 iterations
solve 5: 100 iterations
solve 6: 100 iterations
solve 7: 100 iterations
solve 8: 100 iterations
solve 9: 100 iterations
solve 10: 100 iterations
solve 20: 100 iterations
solve 30: 100 iterations
solve 40: 100 iterations
solve 50: 100 iterations
solve 60: 100 iterations
solve 70: 100 iterations
solve 80: 100 iterations
solve 90: 100 iterations
solve 100: 100 iterations
Time in solve(): 87.3816 s
Time in KSPSolve(): 87.3319 s (99.9432%)
Number of KSP iterations (total): 10000
Number of solve iterations (total): 100 (ratio: 100.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0521 with 1728 processors, by jczhang Tue Jun 12 09:55:23 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 8.886e+01 1.00020 8.886e+01
Objects: 3.500e+01 1.00000 3.500e+01
Flop: 1.034e+10 1.00537 1.033e+10 1.785e+13
Flop/sec: 1.164e+08 1.00556 1.162e+08 2.009e+11
MPI Messages: 6.123e+04 2.00000 5.613e+04 9.699e+07
MPI Message Lengths: 4.407e+08 2.00000 7.198e+03 6.981e+11
MPI Reductions: 3.064e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.4006e-01 0.2% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.1%
1: First Solve: 1.3361e+00 1.5% 1.7764e+11 1.0% 9.789e+05 1.0% 7.095e+03 1.0% 3.190e+02 1.0%
2: Remaining Solves: 8.7382e+01 98.3% 1.7670e+13 99.0% 9.599e+07 99.0% 7.200e+03 99.0% 3.030e+04 98.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.4186e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSidedF 2 1.0 2.6349e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
KSPSetUp 2 1.0 8.3871e-0311.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 1 0 0 0 2 0
KSPSolve 1 1.0 1.3359e+00 1.0 1.03e+08 1.0 9.8e+05 7.1e+03 3.2e+02 2 1 1 1 1 100100100100100 132978
VecTDot 201 1.0 8.2208e-01 1.1 1.09e+07 1.0 0.0e+00 0.0e+00 2.0e+02 1 0 0 0 1 60 11 0 0 63 22814
VecNorm 102 1.0 2.4688e-01 1.1 5.51e+06 1.0 0.0e+00 0.0e+00 1.0e+02 0 0 0 0 0 17 5 0 0 32 38552
VecCopy 1 1.0 1.8883e-04 5.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 102 1.0 2.9655e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 200 1.0 2.1235e-02 2.3 1.08e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 11 0 0 0 878832
VecAYPX 100 1.0 2.1644e-02 7.5 5.37e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 5 0 0 0 428973
VecScatterBegin 101 1.0 1.7088e-02 6.0 0.00e+00 0.0 9.6e+05 7.2e+03 0.0e+00 0 0 1 1 0 0 0 98100 0 0
VecScatterEnd 101 1.0 3.3919e-02 7.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatMult 101 1.0 1.1378e-01 1.4 3.55e+07 1.0 9.6e+05 7.2e+03 0.0e+00 0 0 1 1 0 7 34 98100 0 537005
MatSolve 101 1.0 9.4168e-02 1.5 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 33 0 0 0 630513
MatLUFactorNum 1 1.0 7.2019e-03 4.5 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 131358
MatILUFactorSym 1 1.0 2.3780e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.6405e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.9236e-02 1.2 0.00e+00 0.0 1.9e+04 1.8e+03 8.0e+00 0 0 0 0 0 2 0 2 0 3 0
MatGetRowIJ 1 1.0 2.0981e-0522.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.5988e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 2 1.0 1.3383e-02 3.5 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 1 0 0 0 70690
PCSetUpOnBlocks 1 1.0 8.7321e-03 2.8 5.65e+05 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 108340
PCApply 101 1.0 1.0021e-01 1.4 3.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 6 33 0 0 0 592485
--- Event Stage 2: Remaining Solves
KSPSolve 100 1.0 8.7357e+01 1.0 1.02e+10 1.0 9.6e+07 7.2e+03 3.0e+04 98 99 99 99 99 100100100100100 202269
VecTDot 20100 1.0 5.4682e+01 1.1 1.09e+09 1.0 0.0e+00 0.0e+00 2.0e+04 60 11 0 0 66 61 11 0 0 66 34299
VecNorm 10200 1.0 1.7258e+01 1.2 5.51e+08 1.0 0.0e+00 0.0e+00 1.0e+04 17 5 0 0 33 17 5 0 0 34 55150
VecCopy 100 1.0 1.1919e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 10100 1.0 4.0643e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 20000 1.0 1.4651e+00 1.6 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 1 10 0 0 0 1 11 0 0 0 1273800
VecAYPX 10000 1.0 5.7149e-01 1.9 5.37e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 5 0 0 0 0 5 0 0 0 1624624
VecScatterBegin 10100 1.0 8.3204e-01 2.8 0.00e+00 0.0 9.6e+07 7.2e+03 0.0e+00 1 0 99 99 0 1 0100100 0 0
VecScatterEnd 10100 1.0 2.8371e+00 5.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatMult 10100 1.0 1.0399e+01 1.3 3.55e+09 1.0 9.6e+07 7.2e+03 0.0e+00 10 34 99 99 0 10 35100100 0 587563
MatSolve 10100 1.0 7.7816e+00 1.2 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 33 0 0 0 8 34 0 0 0 763013
PCSetUpOnBlocks 100 1.0 2.7120e-0312.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10100 1.0 9.1625e+00 1.3 3.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 9 33 0 0 0 9 34 0 0 0 648014
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 2 2424 0.
DMKSP interface 1 1 656 0.
Vector 4 10 1117680 0.
Matrix 0 4 5974172 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 5 446760 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 2 218528 0.
Preconditioner 1 2 1912 0.
Viewer 1 0 0 0.
--- Event Stage 1: First Solve
Krylov Solver 1 0 0 0.
Vector 7 1 1656 0.
Matrix 4 0 0 0.
Index Set 5 2 12384 0.
Vec Scatter 1 0 0 0.
Preconditioner 1 0 0 0.
--- Event Stage 2: Remaining Solves
========================================================================================================================
Average time to get PetscTime(): 5.00679e-07
Average time for MPI_Barrier(): 1.92165e-05
Average time for zero size MPI_Send(): 7.10013e-06
#PETSc Option Table entries:
-iterations 100
-ksp_max_it 100
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-nodes_per_proc 30
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
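(For reference: the '#PETSc Option Table entries' sections in the logs above suggest the runs can be reproduced roughly as follows. This is only a sketch; the srun invocation and the node layout are assumptions on my part, while the ./wstest options are copied verbatim from the option tables. The first attachment differs in that it used -iterations 1000 and did not set -ksp_max_it.)

  # 216-rank case (layout assumed: 6 Broadwell nodes at 36 ranks/node)
  srun -N 6 -n 216 ./wstest -nodes_per_proc 30 -mesh_size 1E-4 \
       -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1E-6 \
       -ksp_max_it 100 -iterations 100 -log_view

  # 1728-rank case (layout assumed: 48 nodes at 36 ranks/node); adding
  # -log_sync produces the extra VecNormBarrier/VecScatterBarrier events
  # seen in the third attachment
  srun -N 48 -n 1728 ./wstest -nodes_per_proc 30 -mesh_size 1E-4 \
       -ksp_type cg -ksp_norm_type unpreconditioned -ksp_rtol 1E-6 \
       -ksp_max_it 100 -iterations 100 -log_view -log_sync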