[petsc-dev] [petsc-users] Poor weak scaling when solving successive linear systems
Junchao Zhang
jczhang at mcs.anl.gov
Thu Jun 7 17:52:57 CDT 2018
OK, I had thought that space was a typo. By the way, this option does not show
up in -h.
I changed the number of ranks to use all cores on each node, to avoid
misleading ratios in -log_view. Since one node has 36 cores, I ran with
6^3 = 216 ranks and 12^3 = 1728 ranks. I also found that the call counts of
MatSOR etc. in the two tests were different, so they are not strict weak
scaling tests. I tried adding -ksp_max_it 6 -pc_mg_levels 6, but still could
not make the two runs have the same MatSOR count. Anyway, I attached the load
balance output.
I find that PCApply_MG calls PCMGMCycle_Private, which is recursive and
indirectly calls MatSOR_MPIAIJ. I believe the following code in
MatSOR_MPIAIJ (around line 1460 of mpiaij.c) effectively synchronizes
{MatSOR, MatMultAdd}_SeqAIJ across processes through a VecScatter at each MG
level. If SOR and MatMultAdd are imbalanced, the cost accumulates across the
MG levels and shows up as a large VecScatter cost.
  while (its--) {
    VecScatterBegin(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);
    VecScatterEnd(mat->Mvctx,xx,mat->lvec,INSERT_VALUES,SCATTER_FORWARD);

    /* update rhs: bb1 = bb - B*x */
    VecScale(mat->lvec,-1.0);
    (*mat->B->ops->multadd)(mat->B,mat->lvec,bb,bb1);

    /* local sweep */
    (*mat->A->ops->sor)(mat->A,bb1,omega,SOR_SYMMETRIC_SWEEP,fshift,lits,1,xx);
  }
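To make the accumulation concrete, here is a toy sketch (plain MPI, not PETSc
code; usleep() stands in for imbalanced local smoothing work). Every "level"
ends in an exchange that all ranks must reach, so the slowest rank at each
level adds to everyone else's wait, and those waits pile up across the levels:

  /* toy model: imbalanced per-level work followed by a level-wide sync */
  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
    int    rank, nlevels = 6;
    double twork = 0.0, twait = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int lev = 0; lev < nlevels; lev++) {
      double t0 = MPI_Wtime();
      usleep(1000 * (1 + rank % 4));   /* stand-in for imbalanced MatSOR/MatMultAdd work */
      double t1 = MPI_Wtime();
      MPI_Barrier(MPI_COMM_WORLD);     /* stand-in for the VecScatter all ranks must reach */
      double t2 = MPI_Wtime();
      twork += t1 - t0;
      twait += t2 - t1;
    }
    printf("rank %d: work %.3f s, accumulated wait %.3f s\n", rank, twork, twait);
    MPI_Finalize();
    return 0;
  }

Ranks that finish their local work early accumulate the difference as wait
time at every level, which in the real code is logged under VecScatter rather
than under MatSOR.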
--Junchao Zhang
On Thu, Jun 7, 2018 at 3:11 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>
> > On Jun 7, 2018, at 12:27 PM, Zhang, Junchao <jczhang at mcs.anl.gov> wrote:
> >
> > Searched but could not find this option, -mat_view::load_balance
>
> There is a space between the "-mat_view" and the "::"; load_balance is a
> particular viewer format that causes the printing of load-balance
> information about the number of nonzeros in the matrix.
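> For example, appended to the existing run line, the option would look
> something like this (illustrative):
>
>    mpirun -n 125 ./ws_test -nodes_per_proc 30 -iterations 1000 \
>           -pc_type gamg -mat_view ::load_balance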
>
> Barry
>
> >
> > --Junchao Zhang
> >
> > On Thu, Jun 7, 2018 at 10:46 AM, Smith, Barry F. <bsmith at mcs.anl.gov>
> wrote:
> > So the only surprise in the results is the SOR. It is embarrassingly
> > parallel and normally one would not see a jump.
> >
> > The load balance for SOR time, 1.5, is better at 1000 processes than the
> > 2.1 at 125 processes, not worse, so this number doesn't easily explain it.
> >
> > Could you run the 125 and 1000 with -mat_view ::load_balance and see
> what you get out?
> >
> > Thanks
> >
> > Barry
> >
> > Notice that the MatSOR time jumps a lot, about 5 seconds, when -log_sync
> > is on. My only guess is that the MatSOR is sharing memory bandwidth (or
> > some other resource? cores?) with the VecScatter, and for some reason this
> > is worse for 1000 cores, but I don't know why.
> >
> > > On Jun 6, 2018, at 9:13 PM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
> > >
> > > Hi, PETSc developers,
> > > I tested Michael Becker's code. The code calls the same KSPSolve 1000
> > > times in the second stage and needs a cubic number of processes to run. I
> > > ran with 125 ranks and 1000 ranks, with and without the -log_sync option.
> > > I attach the log_view output files and a scaling-loss Excel file.
> > > I profiled the code with 125 processors. It looks like {MatSOR, MatMult,
> > > MatMultAdd, MatMultTranspose, MatMultTransposeAdd}_SeqAIJ in aij.c took
> > > ~50% of the time; the other half was spent waiting in MPI. MatSOR_SeqAIJ
> > > took 30%, mostly in PetscSparseDenseMinusDot().
> > > I tested it on a 36-core/node machine. I found that 32 ranks/node gave
> > > better performance (about 10%) than 36 ranks/node in the 125-rank test.
> > > I guess this is because the processors in the former case had more
> > > balanced memory bandwidth. I collected PAPI_DP_OPS (double precision
> > > operations) and PAPI_TOT_CYC (total cycles) for the 125-rank case (see
> > > the attached files). It looks like ranks at the two ends have fewer
> > > DP_OPS and TOT_CYC.
> > > Does anyone familiar with the algorithm have a quick explanation?
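> > > For reference, a minimal sketch of how such counters can be read per
> > > rank with the PAPI low-level API (error checking omitted; the actual
> > > instrumentation may have been set up differently):
> > >
> > >   #include <papi.h>
> > >   #include <stdio.h>
> > >
> > >   int main(void)
> > >   {
> > >     int       EventSet = PAPI_NULL;
> > >     long long values[2];
> > >
> > >     PAPI_library_init(PAPI_VER_CURRENT);
> > >     PAPI_create_eventset(&EventSet);
> > >     PAPI_add_event(EventSet, PAPI_DP_OPS);   /* double precision operations */
> > >     PAPI_add_event(EventSet, PAPI_TOT_CYC);  /* total cycles */
> > >
> > >     PAPI_start(EventSet);
> > >     /* ... region of interest, e.g. the repeated KSPSolve() calls ... */
> > >     PAPI_stop(EventSet, values);
> > >
> > >     printf("DP_OPS = %lld, TOT_CYC = %lld\n", values[0], values[1]);
> > >     return 0;
> > >   }
> > >
> > > In an MPI run each rank executes this and writes its own pair of counts,
> > > which is the kind of per-rank data plotted in the attached files.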
> > >
> > > --Junchao Zhang
> > >
> > > On Mon, Jun 4, 2018 at 11:59 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > > Hello again,
> > >
> > > this took me longer than I anticipated, but here we go.
> > > I did reruns of the cases where only half the processes per node were
> used (without -log_sync):
> > >
> > >                    125 procs, 1st     125 procs, 2nd     1000 procs, 1st    1000 procs, 2nd
> > >                    Max        Ratio   Max        Ratio   Max        Ratio   Max        Ratio
> > > KSPSolve           1.203E+02  1.0     1.210E+02  1.0     1.399E+02  1.1     1.365E+02  1.0
> > > VecTDot            6.376E+00  3.7     6.551E+00  4.0     7.885E+00  2.9     7.175E+00  3.4
> > > VecNorm            4.579E+00  7.1     5.803E+00  10.2    8.534E+00  6.9     6.026E+00  4.9
> > > VecScale           1.070E-01  2.1     1.129E-01  2.2     1.301E-01  2.5     1.270E-01  2.4
> > > VecCopy            1.123E-01  1.3     1.149E-01  1.3     1.301E-01  1.6     1.359E-01  1.6
> > > VecSet             7.063E-01  1.7     6.968E-01  1.7     7.432E-01  1.8     7.425E-01  1.8
> > > VecAXPY            1.166E+00  1.4     1.167E+00  1.4     1.221E+00  1.5     1.279E+00  1.6
> > > VecAYPX            1.317E+00  1.6     1.290E+00  1.6     1.536E+00  1.9     1.499E+00  2.0
> > > VecScatterBegin    6.142E+00  3.2     5.974E+00  2.8     6.448E+00  3.0     6.472E+00  2.9
> > > VecScatterEnd      3.606E+01  4.2     3.551E+01  4.0     5.244E+01  2.7     4.995E+01  2.7
> > > MatMult            3.561E+01  1.6     3.403E+01  1.5     3.435E+01  1.4     3.332E+01  1.4
> > > MatMultAdd         1.124E+01  2.0     1.130E+01  2.1     2.093E+01  2.9     1.995E+01  2.7
> > > MatMultTranspose   1.372E+01  2.5     1.388E+01  2.6     1.477E+01  2.2     1.381E+01  2.1
> > > MatSolve           1.949E-02  0.0     1.653E-02  0.0     4.789E-02  0.0     4.466E-02  0.0
> > > MatSOR             6.610E+01  1.3     6.673E+01  1.3     7.111E+01  1.3     7.105E+01  1.3
> > > MatResidual        2.647E+01  1.7     2.667E+01  1.7     2.446E+01  1.4     2.467E+01  1.5
> > > PCSetUpOnBlocks    5.266E-03  1.4     5.295E-03  1.4     5.427E-03  1.5     5.289E-03  1.4
> > > PCApply            1.031E+02  1.0     1.035E+02  1.0     1.180E+02  1.0     1.164E+02  1.0
> > >
> > > I also slimmed down my code and basically wrote a simple weak scaling
> > > test (source files attached) so you can profile it yourself. I appreciate
> > > the offer, Junchao, thank you.
> > > You can adjust the system size per processor at runtime via
> > > "-nodes_per_proc 30" and the number of repeated calls to the function
> > > containing KSPSolve() via "-iterations 1000". The physical problem is
> > > simply calculating the electric potential from a homogeneous charge
> > > distribution, done multiple times to accumulate time in KSPSolve().
> > > A job would be started using something like
> > > mpirun -n 125 ~/petsc_ws/ws_test -nodes_per_proc 30 -mesh_size 1E-4
> -iterations 1000 \
> > > -ksp_rtol 1E-6 \
> > > -log_view -log_sync\
> > > -pc_type gamg -pc_gamg_type classical\
> > > -ksp_type cg \
> > > -ksp_norm_type unpreconditioned \
> > > -mg_levels_ksp_type richardson \
> > > -mg_levels_ksp_norm_type none \
> > > -mg_levels_pc_type sor \
> > > -mg_levels_ksp_max_it 1 \
> > > -mg_levels_pc_sor_its 1 \
> > > -mg_levels_esteig_ksp_type cg \
> > > -mg_levels_esteig_ksp_max_it 10 \
> > > -gamg_est_ksp_type cg
> > > (ideally started on a cube number of processes for a cubical process grid).
> > > Using 125 processes and 10,000 iterations I get the output in
> > > "log_view_125_new.txt", which shows the same imbalance for me.
> > > Michael
> > >
> > >
> > > On 02.06.2018 at 13:40, Mark Adams wrote:
> > >>
> > >>
> > >> On Fri, Jun 1, 2018 at 11:20 PM, Junchao Zhang <jczhang at mcs.anl.gov>
> wrote:
> > >> Hi,Michael,
> > >> You can add -log_sync besides -log_view; it adds barriers to certain
> > >> events but measures the barrier time separately from the events. I find
> > >> this option makes it easier to interpret -log_view output.
> > >>
> > >> That is great (good to know).
> > >>
> > >> This should give us a better idea whether your large VecScatter costs
> > >> come from slow communication or whether it is catching some sort of load
> > >> imbalance.
> > >>
> > >>
> > >> --Junchao Zhang
> > >>
> > >> On Wed, May 30, 2018 at 3:27 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > >> Barry: On its way. Could take a couple days again.
> > >>
> > >> Junchao: I unfortunately don't have access to a cluster with a faster
> network. This one has a mixed 4X QDR-FDR InfiniBand 2:1 blocking fat-tree
> network, which I realize causes parallel slowdown if the nodes are not
> connected to the same switch. Each node has 24 processors (2x12/socket) and
> four NUMA domains (two for each socket).
> > >> The ranks are usually not distributed perfectly evenly, i.e. for 125
> > >> processes on the six required nodes, five nodes would use 21 cores and
> > >> one would use 20.
> > >> Would using another CPU type make a difference communication-wise? I
> could switch to faster ones (on the same network), but I always assumed
> this would only improve performance of the stuff that is unrelated to
> communication.
> > >>
> > >> Michael
> > >>
> > >>
> > >>
> > >>> The log files have something like "Average time for zero size
> > >>> MPI_Send(): 1.84231e-05". It looks like you ran on a cluster with a very
> > >>> slow network. A typical machine should give less than 1/10 of the latency
> > >>> you have. An easy thing to try is to run the code on a machine with a
> > >>> faster network and see what happens.
> > >>>
> > >>> Also, how many cores & numa domains does a compute node have? I
> could not figure out how you distributed the 125 MPI ranks evenly.
> > >>>
> > >>> --Junchao Zhang
> > >>>
> > >>> On Tue, May 29, 2018 at 6:18 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de> wrote:
> > >>> Hello again,
> > >>>
> > >>> here are the updated log_view files for 125 and 1000 processors. I
> ran both problems twice, the first time with all processors per node
> allocated ("-1.txt"), the second with only half on twice the number of
> nodes ("-2.txt").
> > >>>
> > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > >>>>> wrote:
> > >>>>>
> > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > >>>>>
> > >>>> Hmm, it is certainly not intended that vectors be created and
> > >>>> destroyed within each KSPSolve(); could you please point us to the code
> > >>>> that makes you think they are being created and destroyed? We create all
> > >>>> the work vectors at KSPSetUp() and destroy them in KSPReset(), not during
> > >>>> the solve. Not that this would be a measurable difference.
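> > >>>> A minimal sketch of the intended pattern (a fragment, assuming A, b, x
> > >>>> are already created and assembled; error checking omitted):
> > >>>>
> > >>>>   KSP ksp;
> > >>>>   KSPCreate(PETSC_COMM_WORLD,&ksp);
> > >>>>   KSPSetOperators(ksp,A,A);
> > >>>>   KSPSetFromOptions(ksp);
> > >>>>   KSPSetUp(ksp);            /* work vectors are allocated here, once */
> > >>>>   for (PetscInt i = 0; i < 1000; i++) {
> > >>>>     KSPSolve(ksp,b,x);      /* the same work vectors are reused in every call */
> > >>>>   }
> > >>>>   KSPDestroy(&ksp);         /* (or KSPReset()) frees them */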
> > >>>>
> > >>>
> > >>> I mean this, right in the log_view output:
> > >>>
> > >>>> Memory usage is given in bytes:
> > >>>>
> > >>>> Object Type Creations Destructions Memory Descendants' Mem.
> > >>>> Reports information only for process 0.
> > >>>>
> > >>>> --- Event Stage 0: Main Stage
> > >>>>
> > >>>> ...
> > >>>>
> > >>>> --- Event Stage 1: First Solve
> > >>>>
> > >>>> ...
> > >>>>
> > >>>> --- Event Stage 2: Remaining Solves
> > >>>>
> > >>>> Vector 23904 23904 1295501184 0.
> > >>> I logged the exact number of KSP iterations over the 999 timesteps,
> > >>> and it's exactly 23904/6 = 3984.
> > >>> Michael
> > >>>
> > >>>
> > >>> On 24.05.2018 at 19:50, Smith, Barry F. wrote:
> > >>>>
> > >>>> Please send the log file for 1000 with cg as the solver.
> > >>>>
> > >>>> You should make a bar chart of each event for the two cases to
> see which ones are taking more time and which are taking less (we cannot
> tell with the two logs you sent us since they are for different solvers.)
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On May 24, 2018, at 12:24 AM, Michael Becker <
> Michael.Becker at physik.uni-giessen.de>
> > >>>>> wrote:
> > >>>>>
> > >>>>> I noticed that for every individual KSP iteration, six vector
> objects are created and destroyed (with CG, more with e.g. GMRES).
> > >>>>>
> > >>>> Hmm, it is certainly not intended that vectors be created and
> > >>>> destroyed within each KSPSolve(); could you please point us to the code
> > >>>> that makes you think they are being created and destroyed? We create all
> > >>>> the work vectors at KSPSetUp() and destroy them in KSPReset(), not during
> > >>>> the solve. Not that this would be a measurable difference.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>> This seems kind of wasteful; is it supposed to be like this? Is
> > >>>>> this even the reason for my problems? Apart from that, everything seems
> > >>>>> quite normal to me (but I'm not the expert here).
> > >>>>>
> > >>>>>
> > >>>>> Thanks in advance.
> > >>>>>
> > >>>>> Michael
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> <log_view_125procs.txt><log_view_1000procs.txt>
> > >>>>>
> > >>>
> > >>>
> > >>
> > >>
> > >>
> > >
> > >
> > > <o-wstest-125.txt><Scaling-loss.png><o-wstest-1000.txt><o-wstest-sync-125.txt><o-wstest-sync-1000.txt><MatSOR_SeqAIJ.png><PAPI_TOT_CYC.png><PAPI_DP_OPS.png>
> >
> >
>
>
-------------- next part --------------
using 216 of 216 processes
30^3 unknowns per processor
total system size: 180^3
mesh size: 0.0001
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 186300 avg 188100 max 189000
Mat Object: 216 MPI processes
type: mpiaij
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 161520 avg 188100 max 188520
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 156360 avg 177577 max 189000
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 87908 max 94500
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 87908 max 94500
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 237200 max 256500
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 237200 max 256500
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 85956 avg 102829 max 111569
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 54571 avg 64151 max 69123
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 84688 avg 107835 max 117713
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 83920 avg 107459 max 117667
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 20241 avg 25363 max 27748
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 6042 avg 7152 max 7637
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 3423 avg 5291 max 5994
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 3047 avg 4938 max 5691
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 1105 avg 1767 max 2171
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 284 avg 475 max 584
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 137 avg 484 max 972
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 484 max 7633
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 284 avg 475 max 584
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 413 max 6197
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 139 max 2244
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 34 max 614
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 752
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 5282
Mat Object: 216 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 34 max 614
initsolve: 7 iterations
solve 1: 6 iterations
solve 2: 6 iterations
solve 3: 6 iterations
solve 4: 6 iterations
solve 5: 6 iterations
solve 6: 6 iterations
solve 7: 6 iterations
solve 8: 6 iterations
solve 9: 6 iterations
solve 10: 6 iterations
solve 20: 6 iterations
solve 30: 6 iterations
solve 40: 6 iterations
solve 50: 6 iterations
solve 60: 6 iterations
solve 70: 6 iterations
solve 80: 6 iterations
solve 90: 6 iterations
solve 100: 6 iterations
solve 200: 6 iterations
solve 300: 6 iterations
solve 400: 6 iterations
solve 500: 6 iterations
solve 600: 6 iterations
solve 700: 6 iterations
solve 800: 6 iterations
solve 900: 6 iterations
solve 1000: 6 iterations
Time in solve(): 89.4284 s
Time in KSPSolve(): 89.1823 s (99.7248%)
Number of KSP iterations (total): 6000
Number of solve iterations (total): 1000 (ratio: 6.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0140 with 216 processors, by jczhang Thu Jun 7 17:04:25 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 1.916e+02 1.00001 1.916e+02
Objects: 3.044e+04 1.00003 3.044e+04
Flop: 3.177e+10 1.15810 3.035e+10 6.557e+12
Flop/sec: 1.658e+08 1.15810 1.584e+08 3.422e+10
MPI Messages: 1.594e+06 3.50605 1.083e+06 2.339e+08
MPI Message Lengths: 1.961e+09 2.19940 1.466e+03 3.428e+11
MPI Reductions: 3.258e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 1.0241e-01 0.1% 0.0000e+00 0.0% 2.160e+03 0.0% 1.802e+03 0.0% 1.700e+01 0.1%
1: First Solve: 1.0204e+02 53.3% 9.8679e+09 0.2% 7.808e+05 0.3% 4.093e+03 0.9% 5.530e+02 1.7%
2: Remaining Solves: 8.9446e+01 46.7% 6.5467e+12 99.8% 2.331e+08 99.7% 1.457e+03 99.1% 3.200e+04 98.2%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 6.4135e-05 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 10 1.0 3.3987e-03 1.7 0.00e+00 0.0 1.6e+04 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 27 1.0 7.8870e+00 3.1 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 4 0 2 4 0 0
KSPSetUp 8 1.0 2.9860e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.0204e+02 1.0 4.82e+07 1.2 7.8e+05 4.1e+03 5.5e+02 53 0 0 1 2 100100100100100 97
VecTDot 14 1.0 2.9919e-03 2.2 7.56e+05 1.0 0.0e+00 0.0e+00 1.4e+01 0 0 0 0 0 0 2 0 0 3 54578
VecNorm 9 1.0 1.2019e-03 1.8 4.86e+05 1.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 1 0 0 2 87344
VecScale 35 1.0 3.3951e-04 2.7 9.47e+04 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 48655
VecCopy 1 1.0 1.0705e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 154 1.0 1.9858e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 14 1.0 9.7609e-04 1.2 7.56e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 167297
VecAYPX 42 1.0 1.5566e-03 1.5 6.46e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 88739
VecAssemblyBegin 2 1.0 4.7922e-0522.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 2.9087e-0530.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 150 1.0 5.4379e-03 2.0 0.00e+00 0.0 2.7e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 0 35 12 0 0
VecScatterEnd 150 1.0 1.9689e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 43 1.0 2.1787e-02 1.2 1.05e+07 1.1 9.2e+04 2.1e+03 0.0e+00 0 0 0 0 0 0 22 12 6 0 99634
MatMultAdd 35 1.0 9.9871e-03 1.5 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 48362
MatMultTranspose 35 1.0 1.1008e-02 1.4 2.40e+06 1.3 4.8e+04 7.1e+02 0.0e+00 0 0 0 0 0 0 5 6 1 0 43876
MatSolve 7 0.0 2.2888e-04 0.0 8.72e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 381
MatSOR 70 1.0 5.0331e-02 1.1 1.90e+07 1.2 8.3e+04 1.6e+03 1.4e+01 0 0 0 0 0 0 40 11 4 3 77978
MatLUFactorSym 1 1.0 3.8791e-0428.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 3.1900e-0478.7 3.10e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 973
MatResidual 35 1.0 1.7441e-02 1.3 7.97e+06 1.2 8.3e+04 1.6e+03 0.0e+00 0 0 0 0 0 0 17 11 4 0 93440
MatAssemblyBegin 82 1.0 7.8904e+00 3.1 0.00e+00 0.0 1.2e+04 1.1e+04 0.0e+00 2 0 0 0 0 4 0 2 4 0 0
MatAssemblyEnd 82 1.0 7.4100e-02 1.0 0.00e+00 0.0 1.1e+05 6.2e+02 2.1e+02 0 0 0 0 1 0 0 15 2 38 0
MatGetRow 3100265 1.2 4.7804e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 24 0 0 0 0 45 0 0 0 0 0
MatGetRowIJ 1 0.0 3.3140e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 5 1.0 1.8501e-01 2.3 0.00e+00 0.0 1.0e+05 1.8e+04 1.0e+01 0 0 0 1 0 0 0 13 55 2 0
MatCreateSubMat 5 1.0 2.7853e-01 1.0 0.00e+00 0.0 3.6e+04 1.6e+04 8.4e+01 0 0 0 0 0 0 0 5 18 15 0
MatGetOrdering 1 0.0 1.4496e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 5 1.0 3.0473e-02 1.2 0.00e+00 0.0 4.8e+04 1.0e+03 1.0e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 5 1.0 9.5112e-03 1.1 0.00e+00 0.0 9.2e+04 6.3e+02 3.0e+01 0 0 0 0 0 0 0 12 2 5 0
MatZeroEntries 5 1.0 1.7691e-03 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 26 1.0 5.6732e-01 1.0 0.00e+00 0.0 3.3e+04 1.7e+04 5.1e+01 0 0 0 0 0 1 0 4 18 9 0
MatPtAP 5 1.0 1.3221e-01 1.0 1.13e+07 1.3 1.2e+05 2.7e+03 8.2e+01 0 0 0 0 0 0 23 15 10 15 16915
MatPtAPSymbolic 5 1.0 8.2783e-02 1.0 0.00e+00 0.0 6.1e+04 2.8e+03 3.5e+01 0 0 0 0 0 0 0 8 5 6 0
MatPtAPNumeric 5 1.0 4.9810e-02 1.0 1.13e+07 1.3 5.5e+04 2.6e+03 4.5e+01 0 0 0 0 0 0 23 7 4 8 44898
MatGetLocalMat 5 1.0 2.6979e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 5 1.0 4.0371e-03 1.5 0.00e+00 0.0 3.6e+04 3.7e+03 0.0e+00 0 0 0 0 0 0 0 5 4 0 0
SFSetGraph 10 1.0 9.2030e-05 5.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 10 1.0 5.9166e-03 1.1 0.00e+00 0.0 4.8e+04 6.4e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 40 1.0 1.4107e-03 1.8 0.00e+00 0.0 9.4e+04 7.4e+02 0.0e+00 0 0 0 0 0 0 0 12 2 0 0
SFBcastEnd 40 1.0 2.5785e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 5 1.0 1.0119e+02 1.0 0.00e+00 0.0 3.6e+05 5.4e+03 2.6e+02 53 0 0 1 1 99 0 45 60 46 0
GAMG: partLevel 5 1.0 1.4521e-01 1.0 1.13e+07 1.3 1.2e+05 2.6e+03 1.9e+02 0 0 0 0 1 0 23 15 10 34 15401
repartition 2 1.0 9.1791e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 6.6185e-04 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 3.4759e-03 1.1 0.00e+00 0.0 1.5e+03 9.0e+02 3.6e+01 0 0 0 0 0 0 0 0 0 7 0
Move P 2 1.0 7.2892e-03 1.0 0.00e+00 0.0 1.7e+03 1.7e+01 3.6e+01 0 0 0 0 0 0 0 0 0 7 0
PCSetUp 2 1.0 1.0135e+02 1.0 1.13e+07 1.3 4.7e+05 4.7e+03 4.7e+02 53 0 0 1 1 99 23 61 70 85 22
PCSetUpOnBlocks 7 1.0 1.0257e-03 5.0 3.10e+05 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 302
PCApply 7 1.0 8.5200e-02 1.0 3.18e+07 1.2 2.6e+05 1.3e+03 1.4e+01 0 0 0 0 0 0 66 34 10 3 76535
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 8.9193e+01 1.0 3.17e+10 1.2 2.3e+08 1.5e+03 3.2e+04 47100100 99 98 100100100100100 73399
VecTDot 12000 1.0 5.0107e+00 1.3 6.48e+08 1.0 0.0e+00 0.0e+00 1.2e+04 2 2 0 0 37 5 2 0 0 38 27933
VecNorm 8000 1.0 2.0433e+00 1.1 4.32e+08 1.0 0.0e+00 0.0e+00 8.0e+03 1 1 0 0 25 2 1 0 0 25 45667
VecScale 30000 1.0 1.7645e-01 1.7 8.12e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 80243
VecCopy 1000 1.0 8.1942e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 108000 1.0 1.3471e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 12000 1.0 8.1873e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 170957
VecAYPX 36000 1.0 1.1726e+00 1.3 5.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 100259
VecScatterBegin 127000 1.0 4.3927e+00 2.1 0.00e+00 0.0 2.3e+08 1.5e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 127000 1.0 2.1218e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 17 0 0 0 0 0
MatMult 37000 1.0 1.9416e+01 1.2 9.03e+09 1.1 7.9e+07 2.1e+03 0.0e+00 9 29 34 49 0 19 29 34 49 0 96389
MatMultAdd 30000 1.0 1.1328e+01 1.7 2.06e+09 1.3 4.1e+07 7.1e+02 0.0e+00 4 6 18 9 0 10 6 18 9 0 36548
MatMultTranspose 30000 1.0 1.0679e+01 1.6 2.06e+09 1.3 4.1e+07 7.1e+02 0.0e+00 4 6 18 9 0 9 6 18 9 0 38767
MatSolve 6000 0.0 1.0994e-01 0.0 7.48e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 680
MatSOR 60000 1.0 4.4873e+01 1.1 1.63e+10 1.2 7.1e+07 1.6e+03 1.2e+04 22 51 31 33 37 48 51 31 33 38 74798
MatResidual 30000 1.0 1.5853e+01 1.2 6.83e+09 1.2 7.1e+07 1.6e+03 0.0e+00 7 21 31 33 0 16 21 31 33 0 88112
PCSetUpOnBlocks 6000 1.0 9.1378e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 6000 1.0 7.7131e+01 1.0 2.72e+10 1.2 2.3e+08 1.3e+03 1.2e+04 40 85 96 83 37 86 85 97 84 38 72361
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 8 10120 0.
DMKSP interface 1 1 656 0.
Vector 4 45 2361256 0.
Matrix 0 59 14313348 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 14 247728 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 12 231168 0.
Preconditioner 1 8 8692 0.
Viewer 1 2 1680 0.
Application Order 0 1 46656664 0.
--- Event Stage 1: First Solve
Krylov Solver 7 0 0 0.
Vector 137 96 3375264 0.
Matrix 124 65 27659940 0.
Matrix Coarsen 5 5 3180 0.
Index Set 102 90 24085864 0.
Star Forest Graph 10 10 8640 0.
Vec Scatter 28 17 21488 0.
Preconditioner 7 0 0 0.
Viewer 2 0 0 0.
Application Order 1 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 30000 30000 1940160000 0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 1.00136e-05
Average time for zero size MPI_Send(): 6.69007e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mat_view ::load_balance
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_mg_levels 6
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------
-------------- next part --------------
srun: Warning: can't honor --ntasks-per-node set to 36 which doesn't match the requested tasks 48 with the number of requested nodes 48. Ignoring --ntasks-per-node.
using 1728 of 1728 processes
30^3 unknowns per processor
total system size: 360^3
mesh size: 0.0001
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 186300 avg 188550 max 189000
Mat Object: 1728 MPI processes
type: mpiaij
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 161490 avg 188550 max 188850
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 156360 avg 183219 max 189000
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 91164 max 94500
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 75656 avg 91164 max 94500
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 246725 max 256500
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 201530 avg 246725 max 256500
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 85956 avg 107132 max 111569
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 54571 avg 66550 max 69123
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 84688 avg 112657 max 117713
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 83920 avg 112441 max 117667
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 20241 avg 26366 max 27748
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 6042 avg 7328 max 7637
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 3423 avg 5508 max 5994
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 3047 avg 5197 max 5691
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 1105 avg 1934 max 2180
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 284 avg 479 max 584
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 137 avg 542 max 972
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 542 max 8392
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 284 avg 479 max 584
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 493 max 7084
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 145 max 2349
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 31 max 670
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 1100
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 24 max 42986
Mat Object: 1728 MPI processes
type: mpiaij
Load Balance - Nonzeros: Min 0 avg 31 max 670
initsolve: 8 iterations
solve 1: 6 iterations
solve 2: 6 iterations
solve 3: 6 iterations
solve 4: 6 iterations
solve 5: 6 iterations
solve 6: 6 iterations
solve 7: 6 iterations
solve 8: 6 iterations
solve 9: 6 iterations
solve 10: 6 iterations
solve 20: 6 iterations
solve 30: 6 iterations
solve 40: 6 iterations
solve 50: 6 iterations
solve 60: 6 iterations
solve 70: 6 iterations
solve 80: 6 iterations
solve 90: 6 iterations
solve 100: 6 iterations
solve 200: 6 iterations
solve 300: 6 iterations
solve 400: 6 iterations
solve 500: 6 iterations
solve 600: 6 iterations
solve 700: 6 iterations
solve 800: 6 iterations
solve 900: 6 iterations
solve 1000: 6 iterations
Time in solve(): 120.025 s
Time in KSPSolve(): 119.738 s (99.7606%)
Number of KSP iterations (total): 6000
Number of solve iterations (total): 1000 (ratio: 6.00)
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./wstest on a intel-bdw-opt named bdw-0545 with 1728 processors, by jczhang Thu Jun 7 17:05:39 2018
Using Petsc Development GIT revision: v3.9.2-570-g68f20b90 GIT Date: 2018-06-04 15:39:16 +0200
Max Max/Min Avg Total
Time (sec): 2.315e+02 1.00001 2.315e+02
Objects: 3.544e+04 1.00003 3.544e+04
Flop: 3.637e+10 1.16136 3.554e+10 6.141e+13
Flop/sec: 1.571e+08 1.16136 1.535e+08 2.653e+11
MPI Messages: 2.226e+06 4.17170 1.509e+06 2.608e+09
MPI Message Lengths: 2.235e+09 2.20450 1.340e+03 3.494e+12
MPI Reductions: 3.560e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 8.5928e-02 0.0% 0.0000e+00 0.0% 1.901e+04 0.0% 1.802e+03 0.0% 1.700e+01 0.0%
1: First Solve: 1.1133e+02 48.1% 8.9706e+10 0.1% 8.086e+06 0.3% 3.671e+03 0.8% 5.810e+02 1.6%
2: Remaining Solves: 1.2004e+02 51.9% 6.1318e+13 99.9% 2.600e+09 99.7% 1.332e+03 99.1% 3.500e+04 98.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecSet 2 1.0 1.2875e-04 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: First Solve
BuildTwoSided 10 1.0 4.9443e-03 1.6 0.00e+00 0.0 1.6e+05 4.0e+00 0.0e+00 0 0 0 0 0 0 0 2 0 0 0
BuildTwoSidedF 27 1.0 1.1099e+01 4.1 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 4 0 1 4 0 0
KSPSetUp 8 1.0 1.9672e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 0 0 0 3 0
KSPSolve 1 1.0 1.1133e+02 1.0 6.83e+07 1.5 8.1e+06 3.7e+03 5.8e+02 48 0 0 1 2 100100100100100 806
VecTDot 16 1.0 9.3598e-03 1.7 8.64e+05 1.0 0.0e+00 0.0e+00 1.6e+01 0 0 0 0 0 0 2 0 0 3 159508
VecNorm 10 1.0 3.8018e-03 2.8 5.40e+05 1.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 1 0 0 2 245440
VecScale 40 1.0 2.3422e-0320.4 1.08e+05 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 72283
VecCopy 1 1.0 1.5903e-04 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 172 1.0 3.5458e-03 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 16 1.0 1.1342e-03 1.3 8.64e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 2 0 0 0 1316389
VecAYPX 48 1.0 1.8997e-03 1.7 7.42e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 1 0 0 0 671749
VecAssemblyBegin 2 1.0 5.9843e-0562.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 7.6056e-0579.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 171 1.0 6.8316e-03 2.3 0.00e+00 0.0 3.0e+06 1.4e+03 0.0e+00 0 0 0 0 0 0 0 37 14 0 0
VecScatterEnd 171 1.0 6.3600e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 49 1.0 3.5715e-02 1.7 1.19e+07 1.1 1.0e+06 2.0e+03 0.0e+00 0 0 0 0 0 0 23 12 7 0 565174
MatMultAdd 40 1.0 4.9321e-02 4.7 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 92805
MatMultTranspose 40 1.0 2.4180e-02 2.9 2.75e+06 1.3 5.3e+05 6.6e+02 0.0e+00 0 0 0 0 0 0 5 7 1 0 189301
MatSolve 8 0.0 1.4651e-03 0.0 1.89e+06 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1293
MatSOR 80 1.0 7.9000e-02 1.3 2.18e+07 1.2 9.2e+05 1.5e+03 1.6e+01 0 0 0 0 0 0 41 11 5 3 464960
MatLUFactorSym 1 1.0 4.4470e-03373.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 1 1.0 1.3872e-024848.7 2.12e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1532
MatResidual 40 1.0 3.0815e-02 2.0 9.11e+06 1.2 9.2e+05 1.5e+03 0.0e+00 0 0 0 0 0 0 17 11 5 0 497042
MatAssemblyBegin 82 1.0 1.1102e+01 4.1 0.00e+00 0.0 1.2e+05 1.1e+04 0.0e+00 2 0 0 0 0 4 0 1 4 0 0
MatAssemblyEnd 82 1.0 1.2929e-01 1.1 0.00e+00 0.0 1.1e+06 5.2e+02 2.1e+02 0 0 0 0 1 0 0 14 2 36 0
MatGetRow 3100266 1.2 5.0643e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 21 0 0 0 0 43 0 0 0 0 0
MatGetRowIJ 1 0.0 1.6308e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCreateSubMats 5 1.0 1.9433e-01 2.2 0.00e+00 0.0 1.0e+06 1.6e+04 1.0e+01 0 0 0 0 0 0 0 13 56 2 0
MatCreateSubMat 5 1.0 1.8586e+00 1.0 0.00e+00 0.0 3.7e+05 1.3e+04 8.4e+01 1 0 0 0 0 2 0 5 16 14 0
MatGetOrdering 1 0.0 4.2415e-04 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 5 1.0 8.5395e-02 1.1 0.00e+00 0.0 4.6e+05 9.9e+02 1.0e+01 0 0 0 0 0 0 0 6 2 2 0
MatCoarsen 5 1.0 2.5278e-02 1.2 0.00e+00 0.0 9.7e+05 5.5e+02 5.2e+01 0 0 0 0 0 0 0 12 2 9 0
MatZeroEntries 5 1.0 1.6418e-03 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 26 1.0 3.8725e+00 1.0 0.00e+00 0.0 3.3e+05 1.4e+04 5.1e+01 2 0 0 0 0 3 0 4 16 9 0
MatPtAP 5 1.0 2.0472e-01 1.0 1.11e+07 1.3 1.1e+06 2.5e+03 8.3e+01 0 0 0 0 0 0 21 14 9 14 89957
MatPtAPSymbolic 5 1.0 1.2353e-01 1.0 0.00e+00 0.0 5.8e+05 2.7e+03 3.5e+01 0 0 0 0 0 0 0 7 5 6 0
MatPtAPNumeric 5 1.0 8.0794e-02 1.0 1.11e+07 1.3 5.5e+05 2.3e+03 4.5e+01 0 0 0 0 0 0 21 7 4 8 227941
MatGetLocalMat 5 1.0 2.8760e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetBrAoCol 5 1.0 4.8778e-03 1.8 0.00e+00 0.0 3.4e+05 3.4e+03 0.0e+00 0 0 0 0 0 0 0 4 4 0 0
SFSetGraph 10 1.0 1.1182e-0419.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetUp 10 1.0 8.0597e-03 1.2 0.00e+00 0.0 4.8e+05 5.8e+02 0.0e+00 0 0 0 0 0 0 0 6 1 0 0
SFBcastBegin 62 1.0 2.1942e-03 2.3 0.00e+00 0.0 1.0e+06 6.4e+02 0.0e+00 0 0 0 0 0 0 0 12 2 0 0
SFBcastEnd 62 1.0 6.9718e-03 5.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
GAMG: createProl 5 1.0 1.0694e+02 1.0 0.00e+00 0.0 3.6e+06 5.1e+03 2.8e+02 46 0 0 1 1 96 0 44 61 48 0
GAMG: partLevel 5 1.0 2.7904e-01 1.0 1.11e+07 1.3 1.2e+06 2.4e+03 1.9e+02 0 0 0 0 1 0 21 14 10 33 65998
repartition 2 1.0 1.8520e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 2 0
Invert-Sort 2 1.0 4.2000e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 1 0
Move A 2 1.0 4.0763e-02 1.0 0.00e+00 0.0 1.6e+04 7.9e+02 3.6e+01 0 0 0 0 0 0 0 0 0 6 0
Move P 2 1.0 2.8355e-02 1.1 0.00e+00 0.0 2.2e+04 1.3e+01 3.6e+01 0 0 0 0 0 0 0 0 0 6 0
PCSetUp 2 1.0 1.0727e+02 1.0 2.98e+07 3.5 4.8e+06 4.4e+03 4.9e+02 46 0 0 1 1 96 21 59 71 85 172
PCSetUpOnBlocks 8 1.0 1.8798e-02202.7 2.12e+07 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1130
PCApply 8 1.0 1.5085e-01 1.0 5.39e+07 1.8 2.9e+06 1.2e+03 1.6e+01 0 0 0 0 0 0 68 36 11 3 405880
--- Event Stage 2: Remaining Solves
KSPSolve 1000 1.0 1.1975e+02 1.0 3.63e+10 1.2 2.6e+09 1.3e+03 3.5e+04 52100100 99 98 100100100100100 512039
VecTDot 13000 1.0 9.7158e+00 1.3 7.02e+08 1.0 0.0e+00 0.0e+00 1.3e+04 4 2 0 0 37 7 2 0 0 37 124852
VecNorm 8000 1.0 2.9320e+00 1.1 4.32e+08 1.0 0.0e+00 0.0e+00 8.0e+03 1 1 0 0 22 2 1 0 0 23 254601
VecScale 35000 1.0 2.7666e-01 2.7 9.47e+07 2.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 535462
VecCopy 1000 1.0 8.3770e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 126000 1.0 1.5955e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 12000 1.0 8.3211e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 1 2 0 0 0 1345675
VecAYPX 41000 1.0 1.3790e+00 1.5 5.92e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 737825
VecScatterBegin 147000 1.0 5.5527e+00 2.3 0.00e+00 0.0 2.6e+09 1.3e+03 0.0e+00 2 0100 99 0 4 0100100 0 0
VecScatterEnd 147000 1.0 3.3839e+01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 20 0 0 0 0 0
MatMult 42000 1.0 2.4343e+01 1.4 1.01e+10 1.1 8.7e+08 1.9e+03 0.0e+00 9 28 33 48 0 16 28 33 48 0 703788
MatMultAdd 35000 1.0 2.0074e+01 2.3 2.40e+09 1.3 4.6e+08 6.6e+02 0.0e+00 7 7 18 9 0 14 7 18 9 0 199518
MatMultTranspose 35000 1.0 1.7168e+01 2.3 2.40e+09 1.3 4.6e+08 6.6e+02 0.0e+00 4 7 18 9 0 8 7 18 9 0 233286
MatSolve 7000 0.0 1.3333e+00 0.0 1.66e+09 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1244
MatSOR 70000 1.0 5.9088e+01 1.1 1.90e+10 1.2 8.0e+08 1.5e+03 1.4e+04 24 52 31 34 39 46 52 31 34 40 542874
MatResidual 35000 1.0 2.1124e+01 1.5 7.97e+09 1.2 8.0e+08 1.5e+03 0.0e+00 7 22 31 34 0 14 22 31 34 0 634453
PCSetUpOnBlocks 7000 1.0 1.1204e-0119.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7000 1.0 1.0287e+02 1.0 3.18e+10 1.2 2.5e+09 1.2e+03 1.4e+04 44 87 97 85 39 85 87 97 86 40 519965
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 8 10120 0.
DMKSP interface 1 1 656 0.
Vector 4 45 2366712 0.
Matrix 0 59 16548712 0.
Distributed Mesh 1 1 5248 0.
Index Set 2 14 305000 0.
IS L to G Mapping 1 1 131728 0.
Star Forest Graph 2 2 1728 0.
Discrete System 1 1 932 0.
Vec Scatter 1 12 231168 0.
Preconditioner 1 8 8692 0.
Viewer 1 2 1680 0.
Application Order 0 1 373248664 0.
--- Event Stage 1: First Solve
Krylov Solver 7 0 0 0.
Vector 142 101 3702616 0.
Matrix 124 65 27964988 0.
Matrix Coarsen 5 5 3180 0.
Index Set 102 90 187439200 0.
Star Forest Graph 10 10 8640 0.
Vec Scatter 28 17 21488 0.
Preconditioner 7 0 0 0.
Viewer 2 0 0 0.
Application Order 1 0 0 0.
--- Event Stage 2: Remaining Solves
Vector 35000 35000 2262792000 0.
========================================================================================================================
Average time to get PetscTime(): 6.19888e-07
Average time for MPI_Barrier(): 1.27792e-05
Average time for zero size MPI_Send(): 6.85591e-06
#PETSc Option Table entries:
-gamg_est_ksp_type cg
-iterations 1000
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mat_view ::load_balance
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_gamg_type classical
-pc_mg_levels 6
-pc_type gamg
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-debugging=no --COPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --CXXOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --FOPTFLAGS="-g -O3 -DPETSC_KERNEL_USE_UNROLL_4" --with-openmp=1 --download-sowing --download-fblaslapack=1 --download-scalapack=1 --download-metis=1 --download-parmetis=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --PETSC_ARCH=intel-bdw-opt --PETSC_DIR=/home/jczhang/petsc
-----------------------------------------
Libraries compiled on 2018-06-05 18:40:55 on beboplogin2
Machine characteristics: Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Using PETSc directory: /home/jczhang/petsc
Using PETSc arch: intel-bdw-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O3 -DPETSC_KERNEL_USE_UNROLL_4 -fopenmp
-----------------------------------------
Using include paths: -I/home/jczhang/petsc/include -I/home/jczhang/petsc/intel-bdw-opt/include
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -lpetsc -Wl,-rpath,/home/jczhang/petsc/intel-bdw-opt/lib -L/home/jczhang/petsc/intel-bdw-opt/lib -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib/debug_mt -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-mpi-2018.0.128-afy57nutkjquvasoogql4bmgwdjdhtbi/compilers_and_libraries_2018.0.128/linux/mpi/intel64/lib -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc/x86_64-suse-linux/4.9.1 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib/gcc -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib64 -Wl,-rpath,/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -L/blues/gpfs/home/jczhang/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/hpctoolkit-2017.06-557cxm5zivsflxdq5sqgcx3j6z7ybn6n/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/tbb/lib/intel64_lin/gcc4.7 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/lib -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 
-L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/tbb/lib/intel64/gcc4.4 -Wl,-rpath,/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -L/blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/gcc-4.8.5/intel-17.0.4-74uvhjiulyqgvsmywifbbuo46v5n42xc/lib/intel64 -Wl,-rpath,/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -L/blues/gpfs/home/software/bebop/craype-17.02-1-knl/opt/gcc/4.9.1/snos/lib -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib/debug_mt -Wl,-rpath,/opt/intel/mpi-rt/2017.0.0/intel64/lib -lscalapack -lflapack -lfblas -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lmpigi -lrt -lpthread -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------