[petsc-dev] Questions around benchmarking and data loading with PETSc

Fri Dec 10 22:39:12 CST 2021

On Fri, Dec 10, 2021 at 8:05 PM Rohan Yadav <rohany at alumni.cmu.edu> wrote:

> Hi, I’m Rohan, a student working on compilation techniques for distributed
> tensor computations. I’m looking at using PETSc as a baseline for
> experiments I’m running, and want to understand if I’m using PETSc as it
> was intended to achieve high performance, and if the performance I’m seeing
> is expected. Currently, I’m just looking at SpMV operations.
>
>
> My experiments are run on the Lassen Supercomputer (
> https://hpc.llnl.gov/hardware/platforms/lassen). The system has 40 CPUs,
> 4 V100s and an Infiniband interconnect. A visualization of the architecture
> is here:
> https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png.
>
>
> As of now, I’m trying to understand the single-node performance of PETSc,
> as the scaling performance onto multiple nodes appears to be as I expect.
> I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix
> collection, detailed here: https://sparse.tamu.edu/LAW/arabic-2005. As a
> trusted baseline, I am comparing against SpMV code generated by the TACO
> compiler (
> http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)&format=y:d:0;A:ds:0,1;x:d:0&sched=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races)
> .
>
I don't know what "No Races" means, but you should also verify the result
of the SpMV.
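
One way to do that check, as a minimal sketch: copy both result vectors into
plain arrays and compare them entry by entry. The function name
`max_scaled_diff` and the choice of scaling are illustrative assumptions, not
something taken from PETSc or TACO, and an acceptable tolerance depends on the
matrix and on the summation order each code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest element-wise difference between a reference
 * result y_ref and a result under test y_test. Each difference is divided by
 * max(|y_ref[i]|, 1.0) so that large entries do not dominate. */
double max_scaled_diff(const double *y_ref, const double *y_test, size_t n)
{
    assert(n == 0 || (y_ref != NULL && y_test != NULL));
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double mag   = y_ref[i] < 0.0 ? -y_ref[i] : y_ref[i];
        double scale = mag > 1.0 ? mag : 1.0;
        double diff  = y_ref[i] - y_test[i];
        if (diff < 0.0) diff = -diff;
        diff /= scale;
        if (diff > worst) worst = diff;
    }
    return worst;
}
```

Two parallel SpMV implementations will generally not agree bit for bit, since
they add the products in different orders, so compare the returned value
against a small tolerance rather than against zero.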

>
> My experiments find that PETSc is roughly 4 times slower on a single
> thread and node than the kernel generated by TACO:
>
>
> PETSc: 1 Thread: 5694.72 ms, 1 Node 40 threads: 262.6 ms.
>
> TACO: 1 Thread: 1341 ms, 1 Node 40 threads: 86 ms.
>
You can think of PETSc's default CSR SpMV as the baseline; it is implemented
in ~10 lines of code.
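
For reference, a textbook CSR SpMV of roughly that size looks like the sketch
below. This is not PETSc's source; the name `csr_spmv` and the array names are
assumptions for illustration, with 0-based indices and `row_ptr` holding
`nrows + 1` offsets.

```c
#include <assert.h>
#include <stddef.h>

/* Textbook CSR kernel y = A * x (illustrative, not PETSc's implementation).
 * Row i owns the entries row_ptr[i] .. row_ptr[i+1]-1 of col_idx and vals. */
void csr_spmv(size_t nrows, const int *row_ptr, const int *col_idx,
              const double *vals, const double *x, double *y)
{
    assert(row_ptr != NULL && y != NULL);
    for (size_t i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += vals[k] * x[col_idx[k]];   /* gather from x by column */
        y[i] = sum;
    }
}
```

There is no blocking, explicit vectorization, or threading in a loop like
this, which is the sense in which it is a baseline.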

>
> My code using PETSc is here:
> https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38
> .
>
>
> Runs from 1 thread and 1 node with -log_view are attached to the email.
> The command lines for each were as follows:
>
>
> 1 node 1 thread: `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup
> 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
>
> 1 node 40 threads: `jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20
> -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
>
>
>
> In addition to these benchmarking concerns, I wanted to share my
> experiences trying to load data from Matrix Market files into PETSc, which
> ended up being much more difficult than I anticipated. Essentially, trying
> to iterate through the Matrix Market files and using `write` to insert
> entries into a `Mat` was extremely slow. In order to get reasonable
> performance, I had to use an external utility to basically construct a CSR
> matrix, and then pass the arrays from the CSR Matrix into
> `MatCreateSeqAIJWithArrays`. I couldn’t find any more guidance on PETSc
> forums or Google, so I wanted to know if this was the right way to go.
>
>
> Thanks,
>
>
> Rohan Yadav
>
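
The data-loading route described above (build CSR arrays outside PETSc, then
hand them to `MatCreateSeqAIJWithArrays`) can be sketched as a
coordinate-to-CSR conversion. The function `coo_to_csr` and its parameter names
are hypothetical. The sketch assumes the Matrix Market entries have already
been parsed and shifted from 1-based to 0-based indices, that `int` is wide
enough for the nonzero count, and that duplicate entries do not need merging.
Whether column indices must also be sorted within each row should be checked
against the PETSc manual page before passing the arrays on.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert nnz coordinate triples (rows[k], cols[k],
 * vals[k]) into CSR arrays. The caller provides row_ptr with nrows + 1
 * entries and col_idx / csr_vals with nnz entries each. Entries within a row
 * keep their input order. Returns 0 on success, -1 if allocation fails. */
int coo_to_csr(int nrows, int nnz,
               const int *rows, const int *cols, const double *vals,
               int *row_ptr, int *col_idx, double *csr_vals)
{
    assert(nrows >= 0 && nnz >= 0);

    /* Pass 1: count the entries of row i in row_ptr[i + 1]. */
    for (int i = 0; i <= nrows; i++) row_ptr[i] = 0;
    for (int k = 0; k < nnz; k++) {
        assert(rows[k] >= 0 && rows[k] < nrows);
        row_ptr[rows[k] + 1]++;
    }

    /* Prefix sum turns the counts into row start offsets. */
    for (int i = 0; i < nrows; i++) row_ptr[i + 1] += row_ptr[i];

    /* Pass 2: scatter each entry to the next free slot of its row. */
    int *next = malloc((size_t)(nrows > 0 ? nrows : 1) * sizeof *next);
    if (next == NULL) return -1;
    for (int i = 0; i < nrows; i++) next[i] = row_ptr[i];
    for (int k = 0; k < nnz; k++) {
        int dst = next[rows[k]]++;
        col_idx[dst]  = cols[k];
        csr_vals[dst] = vals[k];
    }
    free(next);
    return 0;
}
```

Both passes are linear in the number of nonzeros, which is the main reason
this route tends to be faster than inserting entries one at a time into a
matrix whose storage has not been sized in advance.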