[petsc-dev] Kokkos/Crusher performance

Mark Adams mfadams at lbl.gov
Tue Jan 25 07:29:48 CST 2022


adding Suyash,

I found the/a problem. With ex56, which has a crappy decomposition, using
one MPI process per GPU is much faster than using 8 per GPU (64 total). (I am
looking at ex13 to see how much of this is due to the decomposition.)
If you only use 8 processes, it seems that all 8 are put on the first GPU,
but adding -c8 seems to fix this.
Now the numbers are looking reasonable.
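
(For reference, the kind of launch line I mean looks something like

  srun -N1 -n8 -c8 --gpus-per-node=8 --gpu-bind=closest ./ex56 -ne 159 ...

with the rest of the options as in the attached log; the --gpus-per-node and
--gpu-bind flags are my best guess at the binding options needed here, so
treat this as a sketch rather than the exact line.)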

On Mon, Jan 24, 2022 at 3:24 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>   For this, to start, someone can run
>
> src/vec/vec/tutorials/performance.c
>
> and compare the performance to that in the technical report "Evaluation of
> PETSc on a Heterogeneous Architecture: the OLCF Summit System, Part I:
> Vector Node Performance" (Google to find it). One does not have to, and
> shouldn't, do an extensive study right now that compares everything;
> instead, one should run a very small number of different-size problems
> (make them big) and compare those sizes with what Summit gives. Note you
> will need to make sure that performance.c uses the Kokkos backend.
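>
> For example, something along these lines should do it (a sketch: -vec_type
> kokkos is what selects the Kokkos backend, assuming the example creates its
> vectors with VecSetFromOptions(), and -log_view prints the per-operation
> timings to compare):
>
>   ./performance -vec_type kokkos -log_view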
>
>   One hopes for better performance than Summit; if one gets tons worse we
> know something is very wrong somewhere. I'd love to see some comparisons.
>
>   Barry
>
>
> On Jan 24, 2022, at 3:06 PM, Justin Chang <jychang48 at gmail.com> wrote:
>
> Also, do you guys have an OLCF liaison? That's actually your better bet if
> you do.
>
> Performance issues with ROCm/Kokkos are pretty common in apps besides just
> PETSc. We have several teams actively working on rectifying this. However,
> I think performance issues could be identified more quickly if we had a more
> "official" and reproducible PETSc GPU benchmark. I've already expressed this
> to some folks in this thread, and others have already commented on the
> difficulty of such a task. Hopefully I will have more time soon to
> illustrate what I am thinking.
>
> On Mon, Jan 24, 2022 at 1:57 PM Justin Chang <jychang48 at gmail.com> wrote:
>
>> My name has been called.
>>
>> Mark, if you're having issues with Crusher, please contact Veronica
>> Vergara (vergaravg at ornl.gov). You can cc me (justin.chang at amd.com) in
>> those emails.
>>
>> On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>
>>> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could
>>> run this on one processor to get cleaner numbers.
>>>
>>> Is there a designated ECP technical support contact?
>>>
>>>
>>>    Mark, you've forgotten you work for DOE. There isn't a non-ECP
>>> technical support contact.
>>>
>>>    But if this is an AMD machine then maybe contact Matt's student
>>> Justin Chang?
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>   I think you should contact the Crusher ECP technical support team and
>>>> tell them you are getting dismal performance and ask if you should expect
>>>> better. Don't waste time flogging a dead horse.
>>>>
>>>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <
>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>
>>>>>>> Mark, I think you can benchmark individual vector operations, and
>>>>>>> once we get reasonable profiling results, we can move to solvers etc.
>>>>>>>
>>>>>>
>>>>>> Can you suggest a code to run or are you suggesting making a vector
>>>>>> benchmark code?
>>>>>>
>>>>> Make a vector benchmark code, testing vector operations that would be
>>>>> used in your solver.
>>>>> Also, we can run MatMult() to see if the profiling result is
>>>>> reasonable.
>>>>> Only once we get some solid results on basic operations is it useful
>>>>> to run big codes.
>>>>>
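>>>>> A minimal sketch of what I mean (the -n option, the loop count, and the
>>>>> set of ops are just for illustration; select Kokkos vectors at run time
>>>>> with -vec_type kokkos and look at the -log_view output):
>>>>>
>>>>>   #include <petscvec.h>
>>>>>   int main(int argc, char **argv)
>>>>>   {
>>>>>     Vec            x, y;
>>>>>     PetscScalar    dot;
>>>>>     PetscReal      nrm;
>>>>>     PetscInt       i, n = 10000000;
>>>>>     PetscErrorCode ierr;
>>>>>
>>>>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
>>>>>     ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);
>>>>>     ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
>>>>>     ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
>>>>>     ierr = VecSetFromOptions(x);CHKERRQ(ierr);  /* honors -vec_type kokkos */
>>>>>     ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
>>>>>     ierr = VecSet(x, 1.0);CHKERRQ(ierr);
>>>>>     ierr = VecSet(y, 2.0);CHKERRQ(ierr);
>>>>>     /* the vector operations CG/Jacobi spends its time in */
>>>>>     for (i = 0; i < 100; i++) {
>>>>>       ierr = VecAXPY(y, 1.0, x);CHKERRQ(ierr);
>>>>>       ierr = VecDot(x, y, &dot);CHKERRQ(ierr);
>>>>>       ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr);
>>>>>       ierr = VecPointwiseMult(y, x, y);CHKERRQ(ierr);
>>>>>     }
>>>>>     ierr = VecDestroy(&x);CHKERRQ(ierr);
>>>>>     ierr = VecDestroy(&y);CHKERRQ(ierr);
>>>>>     ierr = PetscFinalize();
>>>>>     return ierr;
>>>>>   }
>>>>>
>>>>> Run it with something like -n 100000000 -vec_type kokkos -log_view and
>>>>> compare the per-event flop rates against the Summit numbers.
>>>>>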
>>>>
>>>> So we have to make another throw-away code? Why not just look at the
>>>> vector ops in Mark's actual code?
>>>>
>>>>    Matt
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> --Junchao Zhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>   Here, except for VecNorm, the GPU is used effectively in that most
>>>>>>>>> of the time is spent doing real work on the GPU:
>>>>>>>>>
>>>>>>>>> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00
>>>>>>>>> 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0
>>>>>>>>> 0.00e+00    0 0.00e+00 100
>>>>>>>>>
>>>>>>>>> Even the dots are very effective; only the VecNorm flop rate over
>>>>>>>>> the full time is much, much lower than the VecDot rate. Is that somehow
>>>>>>>>> due to the use of GPU or CPU MPI in the allreduce?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate
>>>>>>>> is about the same as the other vec ops. I don't know what to make of that.
>>>>>>>>
>>>>>>>> But Crusher is clearly not crushing it.
>>>>>>>>
>>>>>>>> Junchao: Perhaps we should ask the Kokkos developers if they have any
>>>>>>>> experience with Crusher that they can share. They could very well find
>>>>>>>> some low-level magic.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  Looks much better. This puts two processes per GPU because there are
>>>>>>>>> only 4 GPUs.
>>>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>
>>>>
>>>>
>>>
>
-------------- next part --------------
  0 KSP Residual norm 1.410853326294e+00 
  1 KSP Residual norm 3.308114929726e+00 
  2 KSP Residual norm 5.268571496560e+00 
  3 KSP Residual norm 6.149538104592e+00 
  4 KSP Residual norm 5.850118312153e+00 
  5 KSP Residual norm 6.691885163871e+00 
  6 KSP Residual norm 6.804517562756e+00 
  7 KSP Residual norm 7.197858569937e+00 
  8 KSP Residual norm 7.822478314857e+00 
  9 KSP Residual norm 8.202105022638e+00 
 10 KSP Residual norm 8.939894492312e+00 
 11 KSP Residual norm 9.429993430012e+00 
 12 KSP Residual norm 9.605492804767e+00 
 13 KSP Residual norm 9.640964280678e+00 
 14 KSP Residual norm 9.298652856327e+00 
 15 KSP Residual norm 8.688325517281e+00 
 16 KSP Residual norm 8.103300011658e+00 
 17 KSP Residual norm 7.657056535579e+00 
 18 KSP Residual norm 7.274740905565e+00 
 19 KSP Residual norm 6.989367099698e+00 
 20 KSP Residual norm 6.693292777717e+00 
 21 KSP Residual norm 6.264239515746e+00 
 22 KSP Residual norm 5.946230942315e+00 
 23 KSP Residual norm 5.461074143939e+00 
 24 KSP Residual norm 5.000139937199e+00 
 25 KSP Residual norm 4.690147106850e+00 
 26 KSP Residual norm 4.340975483114e+00 
 27 KSP Residual norm 4.216407821646e+00 
 28 KSP Residual norm 4.075379410030e+00 
 29 KSP Residual norm 4.093724077948e+00 
 30 KSP Residual norm 3.972717435085e+00 
 31 KSP Residual norm 3.757728119779e+00 
 32 KSP Residual norm 3.540607563741e+00 
 33 KSP Residual norm 3.431062851880e+00 
 34 KSP Residual norm 3.450360009855e+00 
 35 KSP Residual norm 3.593502735404e+00 
 36 KSP Residual norm 3.780832581840e+00 
 37 KSP Residual norm 3.905447434318e+00 
 38 KSP Residual norm 3.984131419229e+00 
 39 KSP Residual norm 3.945938933976e+00 
 40 KSP Residual norm 3.553422818113e+00 
 41 KSP Residual norm 2.938844893302e+00 
 42 KSP Residual norm 2.809545432521e+00 
 43 KSP Residual norm 2.953724603153e+00 
 44 KSP Residual norm 2.944856948692e+00 
 45 KSP Residual norm 2.714548772425e+00 
 46 KSP Residual norm 2.757853041702e+00 
 47 KSP Residual norm 2.802728332990e+00 
 48 KSP Residual norm 2.733707284580e+00 
 49 KSP Residual norm 2.795310289754e+00 
 50 KSP Residual norm 2.885286206575e+00 
 51 KSP Residual norm 2.840587445960e+00 
 52 KSP Residual norm 2.986739512809e+00 
 53 KSP Residual norm 3.038967844916e+00 
 54 KSP Residual norm 3.120224614592e+00 
 55 KSP Residual norm 3.252584908500e+00 
 56 KSP Residual norm 3.329078354051e+00 
 57 KSP Residual norm 3.493538794345e+00 
 58 KSP Residual norm 3.693624595560e+00 
 59 KSP Residual norm 3.946156830176e+00 
 60 KSP Residual norm 4.372813538537e+00 
 61 KSP Residual norm 4.793425118505e+00 
 62 KSP Residual norm 5.506707673470e+00 
 63 KSP Residual norm 6.150469745023e+00 
 64 KSP Residual norm 7.009152654362e+00 
 65 KSP Residual norm 8.253999190110e+00 
 66 KSP Residual norm 9.773686873303e+00 
 67 KSP Residual norm 1.174201878873e+01 
 68 KSP Residual norm 1.396810766198e+01 
 69 KSP Residual norm 1.531938038251e+01 
 70 KSP Residual norm 1.513815060009e+01 
 71 KSP Residual norm 1.351504569209e+01 
 72 KSP Residual norm 1.189818271063e+01 
 73 KSP Residual norm 1.055982729886e+01 
 74 KSP Residual norm 9.291111182468e+00 
 75 KSP Residual norm 8.994372539499e+00 
 76 KSP Residual norm 9.974014612561e+00 
 77 KSP Residual norm 1.127854042048e+01 
 78 KSP Residual norm 1.252496528261e+01 
 79 KSP Residual norm 1.418696243993e+01 
 80 KSP Residual norm 1.532377955119e+01 
 81 KSP Residual norm 1.370656960788e+01 
 82 KSP Residual norm 1.180429013782e+01 
 83 KSP Residual norm 1.003617095145e+01 
 84 KSP Residual norm 8.394450117817e+00 
 85 KSP Residual norm 6.899686914524e+00 
 86 KSP Residual norm 6.179350449619e+00 
 87 KSP Residual norm 5.565154073979e+00 
 88 KSP Residual norm 5.150487367510e+00 
 89 KSP Residual norm 4.999864016175e+00 
 90 KSP Residual norm 4.869910941255e+00 
 91 KSP Residual norm 4.744777237912e+00 
 92 KSP Residual norm 4.753059736768e+00 
 93 KSP Residual norm 4.746021509746e+00 
 94 KSP Residual norm 4.676154678970e+00 
 95 KSP Residual norm 4.667939895068e+00 
 96 KSP Residual norm 4.982168193998e+00 
 97 KSP Residual norm 5.376230525346e+00 
 98 KSP Residual norm 6.027223402693e+00 
 99 KSP Residual norm 6.688770388651e+00 
100 KSP Residual norm 7.685272624683e+00 
101 KSP Residual norm 8.540315337448e+00 
102 KSP Residual norm 9.039414712941e+00 
103 KSP Residual norm 9.412267211525e+00 
104 KSP Residual norm 9.404393063521e+00 
105 KSP Residual norm 9.809809633962e+00 
106 KSP Residual norm 1.019997954431e+01 
107 KSP Residual norm 1.032798037382e+01 
108 KSP Residual norm 1.018368040001e+01 
109 KSP Residual norm 9.032578302284e+00 
110 KSP Residual norm 7.511728677100e+00 
111 KSP Residual norm 6.320399999215e+00 
112 KSP Residual norm 5.638446159168e+00 
113 KSP Residual norm 5.503768021011e+00 
114 KSP Residual norm 5.781512507352e+00 
115 KSP Residual norm 6.668193746580e+00 
116 KSP Residual norm 8.289840511454e+00 
117 KSP Residual norm 9.602543908825e+00 
118 KSP Residual norm 9.885225641874e+00 
119 KSP Residual norm 9.475771653754e+00 
120 KSP Residual norm 9.253307705621e+00 
121 KSP Residual norm 9.188703825743e+00 
122 KSP Residual norm 8.982425406803e+00 
123 KSP Residual norm 9.029965071148e+00 
124 KSP Residual norm 8.936472797372e+00 
125 KSP Residual norm 8.847701213231e+00 
126 KSP Residual norm 8.850219067523e+00 
127 KSP Residual norm 8.883966846716e+00 
128 KSP Residual norm 8.822082961919e+00 
129 KSP Residual norm 9.144573911170e+00 
130 KSP Residual norm 9.210998384025e+00 
131 KSP Residual norm 8.767074129481e+00 
132 KSP Residual norm 8.653932024226e+00 
133 KSP Residual norm 8.738817183375e+00 
134 KSP Residual norm 8.847719520860e+00 
135 KSP Residual norm 8.823379882635e+00 
136 KSP Residual norm 8.688648621431e+00 
137 KSP Residual norm 8.766604393781e+00 
138 KSP Residual norm 8.961220512489e+00 
139 KSP Residual norm 9.038789268757e+00 
140 KSP Residual norm 9.255097048034e+00 
141 KSP Residual norm 9.457532840426e+00 
142 KSP Residual norm 9.353035188344e+00 
143 KSP Residual norm 8.972079650141e+00 
144 KSP Residual norm 8.990246637705e+00 
145 KSP Residual norm 9.133606744913e+00 
146 KSP Residual norm 9.284449139694e+00 
147 KSP Residual norm 9.446523116163e+00 
148 KSP Residual norm 9.392983045581e+00 
149 KSP Residual norm 9.190311275931e+00 
150 KSP Residual norm 8.637696807809e+00 
151 KSP Residual norm 8.246041171334e+00 
152 KSP Residual norm 7.974442084343e+00 
153 KSP Residual norm 7.819232318105e+00 
154 KSP Residual norm 7.908790010611e+00 
155 KSP Residual norm 8.281392146382e+00 
156 KSP Residual norm 8.711804633156e+00 
157 KSP Residual norm 8.972428309154e+00 
158 KSP Residual norm 8.821322938720e+00 
159 KSP Residual norm 8.694550793978e+00 
160 KSP Residual norm 8.497087628681e+00 
161 KSP Residual norm 8.342289866176e+00 
162 KSP Residual norm 8.323833824628e+00 
163 KSP Residual norm 8.340846763041e+00 
164 KSP Residual norm 8.938969817866e+00 
165 KSP Residual norm 9.072018746931e+00 
166 KSP Residual norm 9.382200283204e+00 
167 KSP Residual norm 9.618709771467e+00 
168 KSP Residual norm 9.816042710750e+00 
169 KSP Residual norm 1.006175118406e+01 
170 KSP Residual norm 1.013405891235e+01 
171 KSP Residual norm 9.945457958847e+00 
172 KSP Residual norm 1.006028462918e+01 
173 KSP Residual norm 1.001712718542e+01 
174 KSP Residual norm 9.950326839565e+00 
175 KSP Residual norm 9.870606457184e+00 
176 KSP Residual norm 9.505672324164e+00 
177 KSP Residual norm 9.422406293510e+00 
178 KSP Residual norm 9.180050627762e+00 
179 KSP Residual norm 8.686064400557e+00 
180 KSP Residual norm 8.568532139747e+00 
181 KSP Residual norm 8.734731645402e+00 
182 KSP Residual norm 9.018967477404e+00 
183 KSP Residual norm 9.460079286079e+00 
184 KSP Residual norm 9.448953574953e+00 
185 KSP Residual norm 9.685497063794e+00 
186 KSP Residual norm 9.869855710508e+00 
187 KSP Residual norm 1.003302047960e+01 
188 KSP Residual norm 9.564028860536e+00 
189 KSP Residual norm 9.013288033632e+00 
190 KSP Residual norm 8.750427764456e+00 
191 KSP Residual norm 8.903646907458e+00 
192 KSP Residual norm 9.285007079918e+00 
193 KSP Residual norm 9.424801141906e+00 
194 KSP Residual norm 9.291833173642e+00 
195 KSP Residual norm 8.991571624860e+00 
196 KSP Residual norm 8.694508731874e+00 
197 KSP Residual norm 9.031462542355e+00 
198 KSP Residual norm 9.496643154125e+00 
199 KSP Residual norm 9.284160146520e+00 
200 KSP Residual norm 8.742226063537e+00 
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances:  relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=12288000, cols=12288000, bs=3
    total: nonzeros=982938168, allocated nonzeros=995328000
    total number of mallocs used during MatSetValues calls=0
      using I-node (on process 0) routines: found 512000 nodes, limit used is 5
**************************************** ***********************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: -------------------------------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/ksp/ksp/tutorials/data/../ex56 on a arch-olcf-crusher named crusher002 with 8 processors, by adams Tue Jan 25 08:15:31 2022
Using Petsc Development GIT revision: v3.16.3-684-g003dbea9e0  GIT Date: 2022-01-24 12:23:30 -0600

                         Max       Max/Min     Avg       Total
Time (sec):           7.811e+00     1.000   7.811e+00
Objects:              1.900e+01     1.000   1.900e+01
Flop:                 5.331e+10     1.000   5.331e+10  4.265e+11
Flop/sec:             6.825e+09     1.000   6.825e+09  5.460e+10
MPI Messages:         1.432e+03     1.005   1.426e+03  1.141e+04
MPI Message Lengths:  1.187e+08     1.002   8.310e+04  9.480e+08
MPI Reductions:       6.450e+02     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.8604e+00  87.8%  1.3230e+09   0.3%  9.500e+01   0.8%  2.101e+06       21.1%  1.800e+01   2.8%
 1:           Setup: 6.2347e-03   0.1%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  2.000e+00   0.3%
 2:           Solve: 9.4447e-01  12.1%  4.2516e+11  99.7%  1.131e+04  99.2%  6.616e+04       78.9%  6.060e+02  94.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          4 1.0 8.7896e-01534.8 0.00e+00 0.0 3.8e+01 8.0e+00 4.0e+00  8  0  0  0  1   9  0 40  0 22     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         4 1.0 8.7908e-01509.8 0.00e+00 0.0 9.5e+01 2.1e+06 4.0e+00  8  0  1 21  1   9  0100100 22     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyBegin       2 1.0 8.7524e-01 4.1 0.00e+00 0.0 3.8e+01 5.2e+06 2.0e+00 10  0  0 21  0  11  0 40 99 11     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd         2 1.0 3.5833e-01 1.0 1.55e+06 0.0 0.0e+00 0.0e+00 4.0e+00  5  0  0  0  1   5  0  0  0 22    17       0      0 0.00e+00    0 0.00e+00  0
VecSet                 1 1.0 3.8404e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAssemblyBegin       2 1.0 4.1764e-03 1.1 0.00e+00 0.0 5.7e+01 3.8e+04 2.0e+00  0  0  0  0  0   0  0 60  1 11     0       0      0 0.00e+00    0 0.00e+00  0
VecAssemblyEnd         2 1.0 5.2523e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph             1 1.0 4.5518e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 1: Setup

KSPSetUp               1 1.0 7.0851e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0  99  0  0  0100     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 6.2920e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: Solve

BuildTwoSided          1 1.0 9.1706e-05 1.6 0.00e+00 0.0 5.6e+01 4.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatMult              200 1.0 6.7831e-01 1.0 4.91e+10 1.0 1.1e+04 6.6e+04 1.0e+00  9 92 99 79  0  71 92100100  0 579635   1014212      1 2.04e-04    0 0.00e+00 100
MatView                1 1.0 7.8531e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 9.4550e-01 1.0 5.31e+10 1.0 1.1e+04 6.6e+04 6.0e+02 12100 99 79 94 100100100100100 449667   893741      1 2.04e-04    0 0.00e+00 100
PCApply              201 1.0 1.6966e-01 1.0 3.09e+08 1.0 0.0e+00 0.0e+00 2.0e+00  2  1  0  0  0  18  1  0  0  0 14558   163941      0 0.00e+00    0 0.00e+00 100
VecTDot              401 1.0 5.3642e-02 1.3 1.23e+09 1.0 0.0e+00 0.0e+00 4.0e+02  1  2  0  0 62   5  2  0  0 66 183716   353914      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 2.2219e-02 1.1 6.17e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  1  0  0 31   2  1  0  0 33 222325   303155      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 2.3551e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 1 1.0 9.8740e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 2.3017e-02 1.1 1.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   2  2  0  0  0 427091   514744      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 1.1312e-02 1.1 6.11e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0 432323   532889      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0 235882   290088      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 1.0e+00  2  0 99 79  0  19  0100100  0     0       0      1 2.04e-04    0 0.00e+00  0
VecScatterEnd        200 1.0 1.9007e-02 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp                1 1.0 1.3015e-03 1.3 0.00e+00 0.0 1.1e+02 1.7e+04 1.0e+00  0  0  1  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               200 1.0 1.7309e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  18  0  0  0  0     0       0      1 2.04e-04    0 0.00e+00  0
SFUnpack             200 1.0 2.3165e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 3: Unknown


--- Event Stage 4: Unknown


--- Event Stage 5: Unknown


--- Event Stage 6: Unknown

---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

       Krylov Solver     1              1         8096     0.
              Matrix     3              3   1554675244     0.
      Preconditioner     1              1          872     0.
              Viewer     2              1          840     0.
              Vector     4              8     74208728     0.
           Index Set     2              2       235076     0.
   Star Forest Graph     1              1         1200     0.

--- Event Stage 1: Setup

              Vector     4              1     12289784     0.

--- Event Stage 2: Solve

              Vector     1              0            0     0.

--- Event Stage 3: Unknown


--- Event Stage 4: Unknown


--- Event Stage 5: Unknown


--- Event Stage 6: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.51e-08
Average time for MPI_Barrier(): 2.7172e-06
Average time for zero size MPI_Send(): 8.326e-06
#PETSc Option Table entries:
-alpha 1.e-3
-ksp_converged_reason
-ksp_max_it 200
-ksp_monitor
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mat_type aijkokkos
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_max_it 2
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-ne 159
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 1
-pc_gamg_threshold -0.01
-pc_type jacobi
-use_gpu_aware_mpi true
-use_mat_nearnullspace false
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --COPTFLAGS="-g -O" --CXXOPTFLAGS="-g -O" --FOPTFLAGS=-g --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --download-p4est=1 --with-zlib-dir=/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-25 12:50:33 on login2 
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------

Using C compiler: cc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O   
Using Fortran compiler: ftn  -fPIC -g     
-----------------------------------------

Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/include -I/opt/rocm-4.5.0/include
-----------------------------------------

Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -L/sw/crusher/spack-envs/base/opt/cray-sles15-zen3/cce-13.0.0/zlib-1.2.11-qx5p4iereg4sjvfi5uwk6jn56o6se2q4/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lp4est -lsc -lz -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------

