<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Jan 22, 2022 at 10:25 AM Jed Brown <<a href="mailto:jed@jedbrown.org">jed@jedbrown.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Mark Adams <<a href="mailto:mfadams@lbl.gov" target="_blank">mfadams@lbl.gov</a>> writes:<br>
<br>
> On Fri, Jan 21, 2022 at 9:55 PM Barry Smith <<a href="mailto:bsmith@petsc.dev" target="_blank">bsmith@petsc.dev</a>> wrote:<br>
><br>
>><br>
>> Interesting. Is this with all native Kokkos kernels, or do some Kokkos<br>
>> kernels use ROCm?<br>
>><br>
><br>
> Ah, good question. I often run with tpl=0 but I did not specify here on<br>
> Crusher. In looking at the log files I see<br>
> -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/externalpackages/git.kokkos-kernels/src/impl/tpls<br>
><br>
> Here is a run with tpls turned off. These tpl includes are gone.<br>
><br>
> It looks pretty much the same. A little slower but that could be noise.<br>
<br>
> ************************************************************************************************************************<br>
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***<br>
> ************************************************************************************************************************<br>
<br>
We gotta say 160 chars because that's what we use now.<br>
<br></blockquote><div><br></div><div>Done.</div><div><br></div><div>As for streams, does it know to run on the GPU? You don't specify anything like -G 1 here for GPUs, so I think you just get them all.</div><div><br></div><div><br></div><div>11:14 adams/aijkokkos-gpu-logging= crusher:/gpfs/alpine/csc314/scratch/adams/petsc$ make PETSC_DIR=/gpfs/alpine/csc314/scratch/adams/petsc PETSC_ARCH=arch-olcf-crusher streams<br>cc -o MPIVersion.o -c -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3 -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include `pwd`/MPIVersion.c<br>Running streams with '/usr/bin/srun -p batch -N 1 -A csc314_crusher -t 00:10:00 ' using 'NPMAX=128'<br>1 53355.9207 Rate (MB/s)<br>2 39565.2208 Rate (MB/s) 0.741534<br>3 34538.3431 Rate (MB/s) 0.64732<br>4 32469.3375 Rate (MB/s) 0.608543<br>5 31041.1569 Rate (MB/s) 0.581776<br>6 30113.3826 Rate (MB/s) 0.564387<br>7 29562.5285 Rate (MB/s) 0.554063<br>8 29228.8090 Rate (MB/s) 0.547808<br>9 31474.3616 Rate (MB/s) 0.589895<br>10 31306.7647 Rate (MB/s) 0.586754<br>11 31147.4674 Rate (MB/s) 0.583768<br>12 31006.5008 Rate (MB/s) 0.581126<br>13 30859.4559 Rate (MB/s) 0.57837<br>14 30796.0587 Rate (MB/s) 0.577182<br>15 30604.4849 Rate (MB/s) 0.573591<br>16 30565.4340 Rate (MB/s) 0.572859<br>17 32421.9349 Rate (MB/s) 0.607654<br>18 34365.3424 Rate (MB/s) 0.644078<br>19 36289.4518 Rate (MB/s) 0.680139<br>20 38194.5300 Rate (MB/s) 0.715845<br>21 40160.4660 Rate (MB/s) 0.75269<br>22 42062.3931 Rate (MB/s) 0.788336<br>23 43890.2036 Rate (MB/s) 0.822593<br>24 45775.4680 Rate (MB/s) 0.857927<br>25 47708.8770 Rate (MB/s) 0.894163<br>26 49559.6810 Rate (MB/s) 0.928851<br>27 51457.5537 Rate (MB/s) 0.964421<br>28 53528.3420 Rate (MB/s) 1.00323<br></div><div> </div></div></div>
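For reading the streams table: the last column appears to be each rate divided by the single-process rate (for example, 39565.2208 / 53355.9207 ≈ 0.741534 for 2 processes). Below is a minimal sketch assuming that interpretation. The helper name `relative_rates` is hypothetical and not part of PETSc; the rates are a subset copied from the output above.

```python
# Rates (MB/s) copied from the streams output above, keyed by process count.
rates = {
    1: 53355.9207,
    2: 39565.2208,
    8: 29228.8090,
    16: 30565.4340,
    28: 53528.3420,
}


def relative_rates(rates_by_np):
    """Return each rate divided by the 1-process rate.

    Assumption: this is how the last column of the streams output is computed.
    """
    base = rates_by_np[1]
    return {np_: rate / base for np_, rate in rates_by_np.items()}


if __name__ == "__main__":
    for np_, ratio in sorted(relative_rates(rates).items()):
        print(f"{np_:3d} {rates[np_]:12.4f} MB/s  ratio {ratio:.6f}")
```

Under this reading, the aggregate rate never meaningfully exceeds the single-process rate in this run: it drops to about 0.55 at 8 processes and only returns to about 1.0 at 28.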