[petsc-dev] Questions around benchmarking and data loading with PETSc

Rohan Yadav rohany at alumni.cmu.edu
Fri Dec 10 17:54:27 CST 2021


Hi, I’m Rohan, a student working on compilation techniques for distributed tensor computations. I’m looking at using PETSc as a baseline for experiments I’m running, and I want to understand whether I’m using PETSc as intended to achieve high performance, and whether the performance I’m seeing is expected. Currently, I’m looking only at SpMV operations.


My experiments are run on the Lassen supercomputer (https://hpc.llnl.gov/hardware/platforms/lassen). Each node has 40 CPU cores, 4 V100 GPUs, and an InfiniBand interconnect. A diagram of the node architecture is here: https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png.


For now, I’m trying to understand the single-node performance of PETSc, since the scaling behavior across multiple nodes appears to be what I expect. I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix collection, detailed here: https://sparse.tamu.edu/LAW/arabic-2005. As a trusted baseline, I am comparing against SpMV code generated by the TACO compiler:
http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)&format=y:d:0;A:ds:0,1;x:d:0&sched=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races
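
For reference, the kernel that this schedule produces is essentially a CSR SpMV with the row loop split into blocks of 32 and the outer block loop parallelized over CPU threads. A hand-written approximation (not the actual TACO-generated code) looks roughly like this:

/* Approximate shape of the TACO-generated kernel for y(i) = A(i,j) * x(j)
 * with the schedule linked above; compile with OpenMP enabled.
 * rowptr has length m+1; colind and vals have length rowptr[m]. */
void spmv_csr_blocked(int m, const int *rowptr, const int *colind,
                      const double *vals, const double *x, double *y)
{
  #pragma omp parallel for
  for (int i0 = 0; i0 < (m + 31) / 32; i0++) {        /* blocks of 32 rows */
    int iend = (i0 + 1) * 32 < m ? (i0 + 1) * 32 : m;
    for (int i = i0 * 32; i < iend; i++) {            /* rows within a block */
      double sum = 0.0;
      for (int p = rowptr[i]; p < rowptr[i + 1]; p++) /* nonzeros in row i */
        sum += vals[p] * x[colind[p]];
      y[i] = sum;
    }
  }
}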


My experiments find that PETSc is roughly 3-4x slower than the kernel generated by TACO, both on a single core and on a full node:


PETSc: 1 core (1 MPI rank): 5694.72 ms, 1 node (40 MPI ranks): 262.6 ms.

TACO: 1 thread: 1341 ms, 1 node (40 threads): 86 ms.


My code using PETSc is here:
https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38
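
The core of the benchmark is roughly the following (a simplified sketch of what the linked code does, not a verbatim copy; error checking is omitted):

/* Load a matrix from a PETSc binary file and time repeated MatMult calls. */
#include <petscmat.h>
#include <petsctime.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y;
  PetscViewer    viewer;
  char           file[PETSC_MAX_PATH_LEN];
  PetscInt       i, warmup = 10, niter = 20;
  PetscLogDouble t0, t1;

  PetscInitialize(&argc, &argv, NULL, NULL);
  PetscOptionsGetString(NULL, NULL, "-matrix", file, sizeof(file), NULL);
  PetscOptionsGetInt(NULL, NULL, "-n", &niter, NULL);
  PetscOptionsGetInt(NULL, NULL, "-warmup", &warmup, NULL);

  /* Load the matrix (SeqAIJ or MPIAIJ depending on the communicator size). */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetFromOptions(A);
  MatLoad(A, viewer);
  PetscViewerDestroy(&viewer);

  /* Create compatible vectors for y = A*x. */
  MatCreateVecs(A, &x, &y);
  VecSet(x, 1.0);

  for (i = 0; i < warmup; i++) MatMult(A, x, y);   /* untimed warmup */
  PetscTime(&t0);
  for (i = 0; i < niter; i++) MatMult(A, x, y);    /* timed iterations */
  PetscTime(&t1);
  PetscPrintf(PETSC_COMM_WORLD, "Average time: %f ms.\n",
              1000.0 * (t1 - t0) / niter);

  VecDestroy(&x);
  VecDestroy(&y);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}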


Runs with -log_view for the 1-rank and 40-rank (single-node) configurations are attached to the email. The command lines for each were as follows:


1 node, 1 MPI rank: `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`

1 node, 40 MPI ranks: `jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`



In addition to these benchmarking concerns, I wanted to share my experience trying to load data from Matrix Market files into PETSc, which ended up being much more difficult than I anticipated. Iterating through the Matrix Market file and inserting entries into a `Mat` one at a time was extremely slow. To get reasonable performance, I had to use an external utility to construct a CSR matrix and then pass the CSR arrays into `MatCreateSeqAIJWithArrays`. I couldn’t find much guidance on the PETSc forums or via Google, so I wanted to know whether this is the right way to go.
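
Concretely, the conversion path I ended up with looks roughly like the sketch below (simplified, with error checking omitted; the CSR arrays are assumed to have already been built by the external utility, and the output file is whatever the benchmark later reads with MatLoad):

/* Wrap pre-built CSR arrays in a sequential AIJ matrix and write it out in
 * PETSc binary format so it can later be read back with MatLoad. Note that
 * MatCreateSeqAIJWithArrays does not copy the arrays, so they must use
 * PetscInt/PetscScalar and stay valid until the Mat is destroyed. */
#include <petscmat.h>

PetscErrorCode WriteCSRAsPetscBinary(PetscInt m, PetscInt n,
                                     PetscInt *rowptr,   /* length m+1 */
                                     PetscInt *colind,   /* length rowptr[m] */
                                     PetscScalar *vals,  /* length rowptr[m] */
                                     const char *outfile)
{
  Mat         A;
  PetscViewer viewer;

  MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, m, n, rowptr, colind, vals, &A);
  PetscViewerBinaryOpen(PETSC_COMM_SELF, outfile, FILE_MODE_WRITE, &viewer);
  MatView(A, viewer);   /* writes the PETSc binary file */
  PetscViewerDestroy(&viewer);
  MatDestroy(&A);
  return 0;
}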


Thanks,


Rohan Yadav
-------------- next part --------------
Before matrix load
After matrix load
Average time: 5652.444561 ms.
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./bin/benchmark on a  named lassen776 with 1 processor, by yadav2 Fri Dec 10 15:28:04 2021
Using Petsc Release Version 3.13.0, Mar 29, 2020 

                         Max       Max/Min     Avg       Total 
Time (sec):           2.731e+02     1.000   2.731e+02
Objects:              5.000e+00     1.000   5.000e+00
Flop:                 3.782e+10     1.000   3.782e+10  3.782e+10
Flop/sec:             1.385e+08     1.000   1.385e+08  1.385e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total 
 0:      Main Stage: 2.7308e+02 100.0%  3.7817e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               30 1.0 1.6957e+02 1.0 3.78e+10 1.0 0.0e+00 0.0e+00 0.0e+00 62100  0  0  0  62100  0  0  0   223
MatAssemblyBegin       1 1.0 7.2800e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 4.9849e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatLoad                1 1.0 1.0329e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 38  0  0  0  0  38  0  0  0  0     0
VecSet                 4 1.0 2.0869e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              0            0     0.
              Viewer     2              0            0     0.
              Vector     2              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.98e-08
#PETSc Option Table entries:
-log_view
-matload_block_size 1
-matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc
-n 20
-warmup 10
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-g -DNoChange -qfullpath" FFLAGS="-g -qfullpath -qzerosize -qxlf2003=polymorphic" CXXFLAGS= --with-cc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc --with-cxx=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlC --with-fc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib="/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/liblapack.so /usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/libblas.so" --with-x=0 --with-clanguage=C --with-scalapack=0 --with-metis=1 --with-metis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv --with-hdf5=1 --with-hdf5-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4 --with-hypre=1 --with-hypre-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk --with-parmetis=1 --with-parmetis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include --with-superlu_dist-lib=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-zlib-include=/usr/include --with-zlib-lib=/usr/lib64/libz.so --with-zlib=1
-----------------------------------------
Libraries compiled on 2020-04-09 16:35:17 on rzansel18 
Machine characteristics: Linux-4.14.0-115.10.1.1chaos.ch6a.ppc64le-ppc64le-with-redhat-7.6-Maipo
Using PETSc directory: /usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q
Using PETSc arch: 
-----------------------------------------

Using C compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc -g -DNoChange -qfullpath  
Using Fortran compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf -g -qfullpath -qzerosize -qxlf2003=polymorphic    
-----------------------------------------

Using include paths: -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/include -I/usr/include
-----------------------------------------

Using C linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc
Using Fortran linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf
Using libraries: -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -lpetsc -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -Wl,-rpath,/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -L/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib /usr/lib64/libz.so -Wl,-rpath,/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -L/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib:/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -lHYPRE -lsuperlu_dist -llapack -lblas -lhdf5_hl -lhdf5 -lparmetis -lmetis -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

logout

------------------------------------------------------------
Sender: LSF System <lsfadmin at lassen710>
Subject: Job 3035809: <jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> in cluster <lassen> Done

Job <jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> was submitted from host <lassen627> by user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:21 2021
Job was executed on host(s) <1*lassen710>, in queue <pbatch>, as user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:24 2021
                            <40*lassen776>
</g/g15/yadav2> was used as the home directory.
</g/g15/yadav2/taco/petsc> was used as the working directory.
Started at Fri Dec 10 15:23:24 2021
Terminated at Fri Dec 10 15:28:35 2021
Results reported at Fri Dec 10 15:28:35 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   0.41 sec.
    Max Memory :                                 158 MB
    Average Memory :                             68.49 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   1060 MB
    Max Processes :                              4
    Max Threads :                                27
    Run time :                                   311 sec.
    Turnaround time :                            314 sec.

The output (if any) is above this job summary.
-------------- next part --------------
Before matrix load
After matrix load
Average time: 262.627921 ms.
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./bin/benchmark on a  named lassen772 with 40 processors, by yadav2 Fri Dec 10 15:25:40 2021
Using Petsc Release Version 3.13.0, Mar 29, 2020 

                         Max       Max/Min     Avg       Total 
Time (sec):           1.093e+02     1.000   1.093e+02
Objects:              1.300e+01     1.000   1.300e+01
Flop:                 1.715e+09     3.071   9.456e+08  3.783e+10
Flop/sec:             1.569e+07     3.071   8.652e+06  3.461e+08
MPI Messages:         1.365e+03     1.233   1.247e+03  4.987e+04
MPI Message Lengths:  3.821e+09    58.505   1.632e+05  8.137e+09
MPI Reductions:       3.200e+01     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total 
 0:      Main Stage: 1.0929e+02 100.0%  3.7825e+10 100.0%  4.987e+04 100.0%  1.632e+05      100.0%  2.500e+01  78.1% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 9.2847e-0322.5 0.00e+00 0.0 1.6e+03 4.0e+00 1.0e+00  0  0  3  0  3   0  0  3  0  4     0
MatMult               30 1.0 7.8912e+00 1.0 1.71e+09 3.1 4.7e+04 1.1e+04 0.0e+00  7100 93  6  0   7100 93  6  0  4793
MatAssemblyBegin       1 1.0 1.0711e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 2.3071e+00 6.3 0.00e+00 0.0 3.1e+03 2.8e+03 5.0e+00  2  0  6  0 16   2  0  6  0 20     0
MatLoad                1 1.0 1.0140e+02 1.0 0.00e+00 0.0 3.3e+03 2.3e+06 1.9e+01 93  0  7 94 59  93  0  7 94 76     0
VecSet                 3 1.0 5.3865e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       30 1.0 4.3584e-02 2.7 0.00e+00 0.0 4.7e+04 1.1e+04 0.0e+00  0  0 93  6  0   0  0 93  6  0     0
VecScatterEnd         30 1.0 4.9138e+00853.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
SFSetGraph             1 1.0 9.3534e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 1.1334e-02 2.1 0.00e+00 0.0 3.1e+03 2.8e+03 1.0e+00  0  0  6  0  3   0  0  6  0  4     0
SFBcastOpBegin        30 1.0 4.3470e-02 2.7 0.00e+00 0.0 4.7e+04 1.1e+04 0.0e+00  0  0 93  6  0   0  0 93  6  0     0
SFBcastOpEnd          30 1.0 4.9136e+00862.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
SFPack                30 1.0 3.5817e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack              30 1.0 3.0848e-05 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              0            0     0.
              Viewer     2              0            0     0.
         Vec Scatter     1              0            0     0.
              Vector     4              1         1696     0.
           Index Set     2              2       312812     0.
   Star Forest Graph     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.2e-08
Average time for MPI_Barrier(): 1.933e-06
Average time for zero size MPI_Send(): 2.3585e-06
#PETSc Option Table entries:
-log_view
-matload_block_size 1
-matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc
-n 20
-warmup 10
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-g -DNoChange -qfullpath" FFLAGS="-g -qfullpath -qzerosize -qxlf2003=polymorphic" CXXFLAGS= --with-cc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc --with-cxx=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlC --with-fc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib="/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/liblapack.so /usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/libblas.so" --with-x=0 --with-clanguage=C --with-scalapack=0 --with-metis=1 --with-metis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv --with-hdf5=1 --with-hdf5-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4 --with-hypre=1 --with-hypre-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk --with-parmetis=1 --with-parmetis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include --with-superlu_dist-lib=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-zlib-include=/usr/include --with-zlib-lib=/usr/lib64/libz.so --with-zlib=1
-----------------------------------------
Libraries compiled on 2020-04-09 16:35:17 on rzansel18 
Machine characteristics: Linux-4.14.0-115.10.1.1chaos.ch6a.ppc64le-ppc64le-with-redhat-7.6-Maipo
Using PETSc directory: /usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q
Using PETSc arch: 
-----------------------------------------

Using C compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc -g -DNoChange -qfullpath  
Using Fortran compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf -g -qfullpath -qzerosize -qxlf2003=polymorphic    
-----------------------------------------

Using include paths: -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/include -I/usr/include
-----------------------------------------

Using C linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc
Using Fortran linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf
Using libraries: -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -lpetsc -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -Wl,-rpath,/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -L/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib /usr/lib64/libz.so -Wl,-rpath,/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -L/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib:/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -lHYPRE -lsuperlu_dist -llapack -lblas -lhdf5_hl -lhdf5 -lparmetis -lmetis -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

logout

------------------------------------------------------------
Sender: LSF System <lsfadmin at lassen710>
Subject: Job 3035811: <jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> in cluster <lassen> Done

Job <jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> was submitted from host <lassen627> by user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:41 2021
Job was executed on host(s) <1*lassen710>, in queue <pbatch>, as user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:43 2021
                            <40*lassen772>
</g/g15/yadav2> was used as the home directory.
</g/g15/yadav2/taco/petsc> was used as the working directory.
Started at Fri Dec 10 15:23:43 2021
Terminated at Fri Dec 10 15:26:21 2021
Results reported at Fri Dec 10 15:26:21 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                                   0.36 sec.
    Max Memory :                                 59 MB
    Average Memory :                             57.24 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   1425 MB
    Max Processes :                              4
    Max Threads :                                27
    Run time :                                   157 sec.
    Turnaround time :                            160 sec.

The output (if any) is above this job summary.

