[petsc-dev] Kokkos/Crusher performance
Mark Adams
mfadams at lbl.gov
Tue Jan 25 07:29:48 CST 2022
adding Suyash,
I found the problem (or at least a problem). Using ex56, which has a poor
decomposition, running one MPI process per GPU is much faster than running 8
per GPU (64 total). (I am looking at ex13 to see how much of this is due to
the decomposition.)
If you use only 8 processes, it seems that all 8 are put on the first GPU,
but adding -c8 seems to fix this.
Now the numbers are looking reasonable.
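
For the record, the kind of launch line this corresponds to is roughly (a
sketch only; other GPU-binding flags may be needed, so check the OLCF docs):

   srun -N1 -n8 -c8 ./ex56 <ex56 options>

i.e., give each of the 8 ranks its own set of cores so that the ranks are not
all bound to the first GPU.
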
On Mon, Jan 24, 2022 at 3:24 PM Barry Smith <bsmith at petsc.dev> wrote:
>
> For this, to start, someone can run
>
> src/vec/vec/tutorials/performance.c
>
> and compare the performance with that in the technical report "Evaluation of
> PETSc on a Heterogeneous Architecture: the OLCF Summit System, Part I:
> Vector Node Performance" (Google to find it). One does not have to, and
> shouldn't, do an extensive study right now that compares everything;
> instead, run a very small number of different problem sizes (make them big)
> and compare those sizes with what Summit gives. Note you will need to make
> sure that performance.c uses the Kokkos backend.
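>
> A minimal sketch of what I mean (assuming performance.c calls
> VecSetFromOptions(), as the tutorials generally do): select the Kokkos
> vector type at runtime,
>
>    ./performance -vec_type kokkos -log_view
>
> and check in the -log_view output that the vector events report essentially
> all of their flops on the GPU.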
>
> One hopes for better performance than Summit; if it is far worse, we know
> something is very wrong somewhere. I'd love to see some comparisons.
>
> Barry
>
>
> On Jan 24, 2022, at 3:06 PM, Justin Chang <jychang48 at gmail.com> wrote:
>
> Also, do you guys have an OLCF liaison? That's actually your better bet if
> you do.
>
> Performance issues with ROCm/Kokkos are pretty common in apps besides just
> PETSc, and we have several teams actively working on rectifying this.
> However, I think performance issues could be identified more quickly if we
> had a more "official" and reproducible PETSc GPU benchmark. I have already
> raised this with some folks in this thread, and others have commented on the
> difficulty of such a task. Hopefully I will have more time soon to
> illustrate what I am thinking.
>
> On Mon, Jan 24, 2022 at 1:57 PM Justin Chang <jychang48 at gmail.com> wrote:
>
>> My name has been called.
>>
>> Mark, if you're having issues with Crusher, please contact Veronica
>> Vergara (vergaravg at ornl.gov). You can cc me (justin.chang at amd.com) in
>> those emails
>>
>> On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>
>>> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could
>>> run this on one processor to get cleaner numbers.
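>>>
>>> (Concretely, that means something like -ksp_type cg -pc_type jacobi
>>> -mat_type aijkokkos -log_view on a single rank; those are the options from
>>> my runs.)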
>>>
>>> Is there a designated ECP technical support contact?
>>>
>>>
>>> Mark, you've forgotten you work for DOE. There isn't a non-ECP
>>> technical support contact.
>>>
>>> But if this is an AMD machine then maybe contact Matt's student
>>> Justin Chang?
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>> I think you should contact the Crusher ECP technical support team, tell
>>>> them you are getting dismal performance, and ask if you should expect
>>>> better. Don't waste time flogging a dead horse.
>>>>
>>>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>> Mark, I think you can benchmark individual vector operations, and
>>>>>>> once we get reasonable profiling results, we can move to solvers etc.
>>>>>>>
>>>>>>
>>>>>> Can you suggest a code to run or are you suggesting making a vector
>>>>>> benchmark code?
>>>>>>
>>>>> Make a vector benchmark code that tests the vector operations that would
>>>>> be used in your solver.
>>>>> Also, we can run MatMult() to see whether the profiling results are
>>>>> reasonable.
>>>>> Only once we have solid results on the basic operations is it useful
>>>>> to run big codes.
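>>>>>
>>>>> A rough, untested sketch of the kind of code I mean (the file name,
>>>>> sizes, and iteration count are just placeholders; the backend is selected
>>>>> at run time with -vec_type kokkos):
>>>>>
>>>>>   static char help[] = "Times VecAXPY/VecDot/VecNorm.\n";
>>>>>   #include <petscvec.h>
>>>>>
>>>>>   int main(int argc, char **argv)
>>>>>   {
>>>>>     Vec            x, y;
>>>>>     PetscInt       i, n = 10000000;  /* local size; make it big */
>>>>>     PetscScalar    dot;
>>>>>     PetscReal      nrm;
>>>>>     PetscLogStage  stage;
>>>>>     PetscErrorCode ierr;
>>>>>
>>>>>     ierr = PetscInitialize(&argc, &argv, NULL, help); if (ierr) return ierr;
>>>>>     ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);
>>>>>     ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
>>>>>     ierr = VecSetSizes(x, n, PETSC_DECIDE);CHKERRQ(ierr);
>>>>>     ierr = VecSetFromOptions(x);CHKERRQ(ierr);  /* honors -vec_type kokkos */
>>>>>     ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
>>>>>     ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>>>>     ierr = VecSet(y, 2.0);CHKERRQ(ierr);
>>>>>
>>>>>     /* time the operations CG/Jacobi lives in, in their own log stage */
>>>>>     ierr = PetscLogStageRegister("VecOps", &stage);CHKERRQ(ierr);
>>>>>     ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
>>>>>     for (i = 0; i < 100; i++) {
>>>>>       ierr = VecAXPY(y, 1.0, x);CHKERRQ(ierr);
>>>>>       ierr = VecDot(x, y, &dot);CHKERRQ(ierr);
>>>>>       ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr);
>>>>>     }
>>>>>     ierr = PetscLogStagePop();CHKERRQ(ierr);
>>>>>
>>>>>     ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>>>     ierr = VecDestroy(&y);CHKERRQ(ierr);
>>>>>     ierr = PetscFinalize();
>>>>>     return ierr;
>>>>>   }
>>>>>
>>>>> Run it with something like "mpiexec -n 8 ./vecbench -vec_type kokkos
>>>>> -log_view" and compare the per-event GPU rates with what Summit gives.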
>>>>>
>>>>
>>>> So we have to make another throw-away code? Why not just look at the
>>>> vector ops in Mark's actual code?
>>>>
>>>> Matt
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here, except for VecNorm, the GPU is used effectively in that most
>>>>>>>>> of the time is spent doing real work on the GPU:
>>>>>>>>>
>>>>>>>>> VecNorm 402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00
>>>>>>>>> 0.0e+00 4.0e+02 0 1 0 0 20 9 1 0 0 33 30230 225393 0
>>>>>>>>> 0.00e+00 0 0.00e+00 100
>>>>>>>>>
>>>>>>>>> Even the dot products are very efficient; only the VecNorm flop rate
>>>>>>>>> over the full time is much, much lower than the dot product rate. Is
>>>>>>>>> that somehow due to the use of GPU or CPU MPI in the allreduce?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate
>>>>>>>> is about the same as the other vec ops. I don't know what to make of that.
>>>>>>>>
>>>>>>>> But Crusher is clearly not crushing it.
>>>>>>>>
>>>>>>>> Junchao: Perhaps we should ask the Kokkos team if they have any
>>>>>>>> experience with Crusher that they can share. They could very well find
>>>>>>>> some low-level magic.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Looks much better. This puts two processes per GPU because there are
>>>>>>>>> only 4 GPUs.
>>>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>>
>>>>
>>>
>
-------------- next part --------------
0 KSP Residual norm 1.410853326294e+00
1 KSP Residual norm 3.308114929726e+00
2 KSP Residual norm 5.268571496560e+00
3 KSP Residual norm 6.149538104592e+00
4 KSP Residual norm 5.850118312153e+00
5 KSP Residual norm 6.691885163871e+00
6 KSP Residual norm 6.804517562756e+00
7 KSP Residual norm 7.197858569937e+00
8 KSP Residual norm 7.822478314857e+00
9 KSP Residual norm 8.202105022638e+00
10 KSP Residual norm 8.939894492312e+00
11 KSP Residual norm 9.429993430012e+00
12 KSP Residual norm 9.605492804767e+00
13 KSP Residual norm 9.640964280678e+00
14 KSP Residual norm 9.298652856327e+00
15 KSP Residual norm 8.688325517281e+00
16 KSP Residual norm 8.103300011658e+00
17 KSP Residual norm 7.657056535579e+00
18 KSP Residual norm 7.274740905565e+00
19 KSP Residual norm 6.989367099698e+00
20 KSP Residual norm 6.693292777717e+00
21 KSP Residual norm 6.264239515746e+00
22 KSP Residual norm 5.946230942315e+00
23 KSP Residual norm 5.461074143939e+00
24 KSP Residual norm 5.000139937199e+00
25 KSP Residual norm 4.690147106850e+00
26 KSP Residual norm 4.340975483114e+00
27 KSP Residual norm 4.216407821646e+00
28 KSP Residual norm 4.075379410030e+00
29 KSP Residual norm 4.093724077948e+00
30 KSP Residual norm 3.972717435085e+00
31 KSP Residual norm 3.757728119779e+00
32 KSP Residual norm 3.540607563741e+00
33 KSP Residual norm 3.431062851880e+00
34 KSP Residual norm 3.450360009855e+00
35 KSP Residual norm 3.593502735404e+00
36 KSP Residual norm 3.780832581840e+00
37 KSP Residual norm 3.905447434318e+00
38 KSP Residual norm 3.984131419229e+00
39 KSP Residual norm 3.945938933976e+00
40 KSP Residual norm 3.553422818113e+00
41 KSP Residual norm 2.938844893302e+00
42 KSP Residual norm 2.809545432521e+00
43 KSP Residual norm 2.953724603153e+00
44 KSP Residual norm 2.944856948692e+00
45 KSP Residual norm 2.714548772425e+00
46 KSP Residual norm 2.757853041702e+00
47 KSP Residual norm 2.802728332990e+00
48 KSP Residual norm 2.733707284580e+00
49 KSP Residual norm 2.795310289754e+00
50 KSP Residual norm 2.885286206575e+00
51 KSP Residual norm 2.840587445960e+00
52 KSP Residual norm 2.986739512809e+00
53 KSP Residual norm 3.038967844916e+00
54 KSP Residual norm 3.120224614592e+00
55 KSP Residual norm 3.252584908500e+00
56 KSP Residual norm 3.329078354051e+00
57 KSP Residual norm 3.493538794345e+00
58 KSP Residual norm 3.693624595560e+00
59 KSP Residual norm 3.946156830176e+00
60 KSP Residual norm 4.372813538537e+00
61 KSP Residual norm 4.793425118505e+00
62 KSP Residual norm 5.506707673470e+00
63 KSP Residual norm 6.150469745023e+00
64 KSP Residual norm 7.009152654362e+00
65 KSP Residual norm 8.253999190110e+00
66 KSP Residual norm 9.773686873303e+00
67 KSP Residual norm 1.174201878873e+01
68 KSP Residual norm 1.396810766198e+01
69 KSP Residual norm 1.531938038251e+01
70 KSP Residual norm 1.513815060009e+01
71 KSP Residual norm 1.351504569209e+01
72 KSP Residual norm 1.189818271063e+01
73 KSP Residual norm 1.055982729886e+01
74 KSP Residual norm 9.291111182468e+00
75 KSP Residual norm 8.994372539499e+00
76 KSP Residual norm 9.974014612561e+00
77 KSP Residual norm 1.127854042048e+01
78 KSP Residual norm 1.252496528261e+01
79 KSP Residual norm 1.418696243993e+01
80 KSP Residual norm 1.532377955119e+01
81 KSP Residual norm 1.370656960788e+01
82 KSP Residual norm 1.180429013782e+01
83 KSP Residual norm 1.003617095145e+01
84 KSP Residual norm 8.394450117817e+00
85 KSP Residual norm 6.899686914524e+00
86 KSP Residual norm 6.179350449619e+00
87 KSP Residual norm 5.565154073979e+00
88 KSP Residual norm 5.150487367510e+00
89 KSP Residual norm 4.999864016175e+00
90 KSP Residual norm 4.869910941255e+00
91 KSP Residual norm 4.744777237912e+00
92 KSP Residual norm 4.753059736768e+00
93 KSP Residual norm 4.746021509746e+00
94 KSP Residual norm 4.676154678970e+00
95 KSP Residual norm 4.667939895068e+00
96 KSP Residual norm 4.982168193998e+00
97 KSP Residual norm 5.376230525346e+00
98 KSP Residual norm 6.027223402693e+00
99 KSP Residual norm 6.688770388651e+00
100 KSP Residual norm 7.685272624683e+00
101 KSP Residual norm 8.540315337448e+00
102 KSP Residual norm 9.039414712941e+00
103 KSP Residual norm 9.412267211525e+00
104 KSP Residual norm 9.404393063521e+00
105 KSP Residual norm 9.809809633962e+00
106 KSP Residual norm 1.019997954431e+01
107 KSP Residual norm 1.032798037382e+01
108 KSP Residual norm 1.018368040001e+01
109 KSP Residual norm 9.032578302284e+00
110 KSP Residual norm 7.511728677100e+00
111 KSP Residual norm 6.320399999215e+00
112 KSP Residual norm 5.638446159168e+00
113 KSP Residual norm 5.503768021011e+00
114 KSP Residual norm 5.781512507352e+00
115 KSP Residual norm 6.668193746580e+00
116 KSP Residual norm 8.289840511454e+00
117 KSP Residual norm 9.602543908825e+00
118 KSP Residual norm 9.885225641874e+00
119 KSP Residual norm 9.475771653754e+00
120 KSP Residual norm 9.253307705621e+00
121 KSP Residual norm 9.188703825743e+00
122 KSP Residual norm 8.982425406803e+00
123 KSP Residual norm 9.029965071148e+00
124 KSP Residual norm 8.936472797372e+00
125 KSP Residual norm 8.847701213231e+00
126 KSP Residual norm 8.850219067523e+00
127 KSP Residual norm 8.883966846716e+00
128 KSP Residual norm 8.822082961919e+00
129 KSP Residual norm 9.144573911170e+00
130 KSP Residual norm 9.210998384025e+00
131 KSP Residual norm 8.767074129481e+00
132 KSP Residual norm 8.653932024226e+00
133 KSP Residual norm 8.738817183375e+00
134 KSP Residual norm 8.847719520860e+00
135 KSP Residual norm 8.823379882635e+00
136 KSP Residual norm 8.688648621431e+00
137 KSP Residual norm 8.766604393781e+00
138 KSP Residual norm 8.961220512489e+00
139 KSP Residual norm 9.038789268757e+00
140 KSP Residual norm 9.255097048034e+00
141 KSP Residual norm 9.457532840426e+00
142 KSP Residual norm 9.353035188344e+00
143 KSP Residual norm 8.972079650141e+00
144 KSP Residual norm 8.990246637705e+00
145 KSP Residual norm 9.133606744913e+00
146 KSP Residual norm 9.284449139694e+00
147 KSP Residual norm 9.446523116163e+00
148 KSP Residual norm 9.392983045581e+00
149 KSP Residual norm 9.190311275931e+00
150 KSP Residual norm 8.637696807809e+00
151 KSP Residual norm 8.246041171334e+00
152 KSP Residual norm 7.974442084343e+00
153 KSP Residual norm 7.819232318105e+00
154 KSP Residual norm 7.908790010611e+00
155 KSP Residual norm 8.281392146382e+00
156 KSP Residual norm 8.711804633156e+00
157 KSP Residual norm 8.972428309154e+00
158 KSP Residual norm 8.821322938720e+00
159 KSP Residual norm 8.694550793978e+00
160 KSP Residual norm 8.497087628681e+00
161 KSP Residual norm 8.342289866176e+00
162 KSP Residual norm 8.323833824628e+00
163 KSP Residual norm 8.340846763041e+00
164 KSP Residual norm 8.938969817866e+00
165 KSP Residual norm 9.072018746931e+00
166 KSP Residual norm 9.382200283204e+00
167 KSP Residual norm 9.618709771467e+00
168 KSP Residual norm 9.816042710750e+00
169 KSP Residual norm 1.006175118406e+01
170 KSP Residual norm 1.013405891235e+01
171 KSP Residual norm 9.945457958847e+00
172 KSP Residual norm 1.006028462918e+01
173 KSP Residual norm 1.001712718542e+01
174 KSP Residual norm 9.950326839565e+00
175 KSP Residual norm 9.870606457184e+00
176 KSP Residual norm 9.505672324164e+00
177 KSP Residual norm 9.422406293510e+00
178 KSP Residual norm 9.180050627762e+00
179 KSP Residual norm 8.686064400557e+00
180 KSP Residual norm 8.568532139747e+00
181 KSP Residual norm 8.734731645402e+00
182 KSP Residual norm 9.018967477404e+00
183 KSP Residual norm 9.460079286079e+00
184 KSP Residual norm 9.448953574953e+00
185 KSP Residual norm 9.685497063794e+00
186 KSP Residual norm 9.869855710508e+00
187 KSP Residual norm 1.003302047960e+01
188 KSP Residual norm 9.564028860536e+00
189 KSP Residual norm 9.013288033632e+00
190 KSP Residual norm 8.750427764456e+00
191 KSP Residual norm 8.903646907458e+00
192 KSP Residual norm 9.285007079918e+00
193 KSP Residual norm 9.424801141906e+00
194 KSP Residual norm 9.291833173642e+00
195 KSP Residual norm 8.991571624860e+00
196 KSP Residual norm 8.694508731874e+00
197 KSP Residual norm 9.031462542355e+00
198 KSP Residual norm 9.496643154125e+00
199 KSP Residual norm 9.284160146520e+00
200 KSP Residual norm 8.742226063537e+00
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
type: cg
maximum iterations=200, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
type: jacobi
type DIAGONAL
linear system matrix = precond matrix:
Mat Object: 8 MPI processes
type: mpiaijkokkos
rows=12288000, cols=12288000, bs=3
total: nonzeros=982938168, allocated nonzeros=995328000
total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 512000 nodes, limit used is 5
**************************************** ***********************************************************************************************************************
*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
****************************************************************************************************************************************************************
------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------
/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/tutorials/data/../ex56 on a arch-olcf-crusher named crusher002 with 8 processors, by adams Tue Jan 25 08:15:31 2022
Using Petsc Development GIT revision: v3.16.3-684-g003dbea9e0 GIT Date: 2022-01-24 12:23:30 -0600
Max Max/Min Avg Total
Time (sec): 7.811e+00 1.000 7.811e+00
Objects: 1.900e+01 1.000 1.900e+01
Flop: 5.331e+10 1.000 5.331e+10 4.265e+11
Flop/sec: 6.825e+09 1.000 6.825e+09 5.460e+10
MPI Messages: 1.432e+03 1.005 1.426e+03 1.141e+04
MPI Message Lengths: 1.187e+08 1.002 8.310e+04 9.480e+08
MPI Reductions: 6.450e+02 1.000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flop
and VecAXPY() for complex vectors of length N --> 8N flop
Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total Count %Total Avg %Total Count %Total
0: Main Stage: 6.8604e+00 87.8% 1.3230e+09 0.3% 9.500e+01 0.8% 2.101e+06 21.1% 1.800e+01 2.8%
1: Setup: 6.2347e-03 0.1% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 2.000e+00 0.3%
2: Solve: 9.4447e-01 12.1% 4.2516e+11 99.7% 1.131e+04 99.2% 6.616e+04 78.9% 6.060e+02 94.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flop: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
AvgLen: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flop in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
CpuToGpu Count: total number of CPU to GPU copies per processor
CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
GpuToCpu Count: total number of GPU to CPU copies per processor
GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flop --- Global --- --- Stage ---- Total GPU - CpuToGpu - - GpuToCpu - GPU
Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s Mflop/s Count Size Count Size %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
BuildTwoSided 4 1.0 8.7896e-01534.8 0.00e+00 0.0 3.8e+01 8.0e+00 4.0e+00 8 0 0 0 1 9 0 40 0 22 0 0 0 0.00e+00 0 0.00e+00 0
BuildTwoSidedF 4 1.0 8.7908e-01509.8 0.00e+00 0.0 9.5e+01 2.1e+06 4.0e+00 8 0 1 21 1 9 0100100 22 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyBegin 2 1.0 8.7524e-01 4.1 0.00e+00 0.0 3.8e+01 5.2e+06 2.0e+00 10 0 0 21 0 11 0 40 99 11 0 0 0 0.00e+00 0 0.00e+00 0
MatAssemblyEnd 2 1.0 3.5833e-01 1.0 1.55e+06 0.0 0.0e+00 0.0e+00 4.0e+00 5 0 0 0 1 5 0 0 0 22 17 0 0 0.00e+00 0 0.00e+00 0
VecSet 1 1.0 3.8404e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyBegin 2 1.0 4.1764e-03 1.1 0.00e+00 0.0 5.7e+01 3.8e+04 2.0e+00 0 0 0 0 0 0 0 60 1 11 0 0 0 0.00e+00 0 0.00e+00 0
VecAssemblyEnd 2 1.0 5.2523e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetGraph 1 1.0 4.5518e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 1: Setup
KSPSetUp 1 1.0 7.0851e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 99 0 0 0100 0 0 0 0.00e+00 0 0.00e+00 0
PCSetUp 1 1.0 6.2920e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 2: Solve
BuildTwoSided 1 1.0 9.1706e-05 1.6 0.00e+00 0.0 5.6e+01 4.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
MatMult 200 1.0 6.7831e-01 1.0 4.91e+10 1.0 1.1e+04 6.6e+04 1.0e+00 9 92 99 79 0 71 92100100 0 579635 1014212 1 2.04e-04 0 0.00e+00 100
MatView 1 1.0 7.8531e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
KSPSolve 1 1.0 9.4550e-01 1.0 5.31e+10 1.0 1.1e+04 6.6e+04 6.0e+02 12100 99 79 94 100100100100100 449667 893741 1 2.04e-04 0 0.00e+00 100
PCApply 201 1.0 1.6966e-01 1.0 3.09e+08 1.0 0.0e+00 0.0e+00 2.0e+00 2 1 0 0 0 18 1 0 0 0 14558 163941 0 0.00e+00 0 0.00e+00 100
VecTDot 401 1.0 5.3642e-02 1.3 1.23e+09 1.0 0.0e+00 0.0e+00 4.0e+02 1 2 0 0 62 5 2 0 0 66 183716 353914 0 0.00e+00 0 0.00e+00 100
VecNorm 201 1.0 2.2219e-02 1.1 6.17e+08 1.0 0.0e+00 0.0e+00 2.0e+02 0 1 0 0 31 2 1 0 0 33 222325 303155 0 0.00e+00 0 0.00e+00 100
VecCopy 2 1.0 2.3551e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecSet 1 1.0 9.8740e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
VecAXPY 400 1.0 2.3017e-02 1.1 1.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 2 2 0 0 0 427091 514744 0 0.00e+00 0 0.00e+00 100
VecAYPX 199 1.0 1.1312e-02 1.1 6.11e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 432323 532889 0 0.00e+00 0 0.00e+00 100
VecPointwiseMult 201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 235882 290088 0 0.00e+00 0 0.00e+00 100
VecScatterBegin 200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 1.0e+00 2 0 99 79 0 19 0100100 0 0 0 1 2.04e-04 0 0.00e+00 0
VecScatterEnd 200 1.0 1.9007e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFSetUp 1 1.0 1.3015e-03 1.3 0.00e+00 0.0 1.1e+02 1.7e+04 1.0e+00 0 0 1 0 0 0 0 1 0 0 0 0 0 0.00e+00 0 0.00e+00 0
SFPack 200 1.0 1.7309e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 18 0 0 0 0 0 0 1 2.04e-04 0 0.00e+00 0
SFUnpack 200 1.0 2.3165e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
--- Event Stage 3: Unknown
--- Event Stage 4: Unknown
--- Event Stage 5: Unknown
--- Event Stage 6: Unknown
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Krylov Solver 1 1 8096 0.
Matrix 3 3 1554675244 0.
Preconditioner 1 1 872 0.
Viewer 2 1 840 0.
Vector 4 8 74208728 0.
Index Set 2 2 235076 0.
Star Forest Graph 1 1 1200 0.
--- Event Stage 1: Setup
Vector 4 1 12289784 0.
--- Event Stage 2: Solve
Vector 1 0 0 0.
--- Event Stage 3: Unknown
--- Event Stage 4: Unknown
--- Event Stage 5: Unknown
--- Event Stage 6: Unknown
========================================================================================================================
Average time to get PetscTime(): 3.51e-08
Average time for MPI_Barrier(): 2.7172e-06
Average time for zero size MPI_Send(): 8.326e-06
#PETSc Option Table entries:
-alpha 1.e-3
-ksp_converged_reason
-ksp_max_it 200
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mat_type aijkokkos
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_max_it 2
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-ne 159
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold -0.01
-pc_type jacobi
-use_gpu_aware_mpi true
-use_mat_nearnullspace false
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-25 12:50:33 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O
Using Fortran compiler: ftn -fPIC -g
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -L/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lz -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------