<div dir="ltr">
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">Hi, I’m Rohan, a student working on compilation techniques for distributed tensor computations. I’m looking at using PETSc as a baseline for experiments I’m running, and want to understand if I’m using PETSc as it was intended to achieve high performance, and if the performance I’m seeing is expected. Currently, I’m just looking at SpMV operations.</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">My experiments are run on the Lassen Supercomputer (<a href="https://hpc.llnl.gov/hardware/platforms/lassen"><span class="gmail-s1" style="color:rgb(220,161,13)">https://hpc.llnl.gov/hardware/platforms/lassen</span></a>). The system has 40 CPUs, 4 V100s and an Infiniband interconnect. A visualization of the architecture is here: <a href="https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png"><span class="gmail-s1" style="color:rgb(220,161,13)" title="">https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png</span></a>.</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">As of now, I’m trying to understand the single-node performance of PETSc, as the scaling performance onto multiple nodes appears to be as I expect. I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix collection, detailed here: <a href="https://sparse.tamu.edu/LAW/arabic-2005"><span class="gmail-s1" style="color:rgb(220,161,13)">https://sparse.tamu.edu/LAW/arabic-2005</span></a>. As a trusted baseline, I am comparing against SpMV code generated by the TACO compiler (<a href="http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)&format=y:d:0;A:ds:0,1;x:d:0&sched=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races)"><span class="gmail-s1" style="color:rgb(220,161,13)">http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)&format=y:d:0;A:ds:0,1;x:d:0&sched=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races)</span></a>.</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">My experiments find that PETSc is roughly 4 times slower on a single thread and node than the kernel generated by TACO:</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">PETSc: 1 Thread: 5694.72 ms, 1 Node 40 threads: 262.6 ms.</p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">TACO: 1 Thread: 1341 ms, 1 Node 40 threads: 86 ms.</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p3" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";color:rgb(220,161,13)"><span class="gmail-s2" style="color:rgb(0,0,0)">My code using PETSc is here: <a href="https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38">https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38</a></span>.</p>
<p class="gmail-p4" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";color:rgb(220,161,13);min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">Runs from 1 thread and 1 node with -log_view are attached to the email. The command lines for each were as follows:</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">1 node 1 thread: `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`</p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">1 node 40 threads: `jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">In addition to these benchmarking concerns, I wanted to share my experiences trying to load data from Matrix Market files into PETSc, which ended up 1being much more difficult than I anticipated. Essentially, trying to iterate through the Matrix Market files and using `write` to insert entries into a `Mat` was extremely slow. In order to get reasonable performance, I had to use an external utility to basically construct a CSR matrix, and then pass the arrays from the CSR Matrix into `MatCreateSeqAIJWithArrays`. I couldn’t find any more guidance on PETSc forums or Google, so I wanted to know if this was the right way to go.</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">Thanks,</p>
<p class="gmail-p2" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue";min-height:14px"><br></p>
<p class="gmail-p1" style="margin:0px;font-variant-numeric:normal;font-variant-east-asian:normal;font-stretch:normal;font-size:12px;line-height:normal;font-family:"Helvetica Neue"">Rohan Yadav</p></div>