[petsc-users] Performance problem using COO interface
Junchao Zhang
junchao.zhang at gmail.com
Mon Jan 23 09:34:35 CST 2023
Hi, Philip,
It looks like the performance of MatPtAP is pretty bad. There are a lot of
issues with PtAP, which I am going to address.
MatPtAPNumeric     181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00  56 0 4 21 0   56 0 4 21 0  -nan -nan 0 0.00e+00 0 0.00e+00  0
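In the meantime, if it helps to localize the cost further, the time loop can be put in its own logging stage so that -log_view reports it separately from setup. A minimal sketch (assuming a TS-based driver; the setup is elided and the names are illustrative, not Xolotl's actual code):

  #include <petscts.h>

  int main(int argc, char **argv)
  {
    TS            ts;
    PetscLogStage stage;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(TSCreate(PETSC_COMM_WORLD, &ts));
    /* ... create the DM, set the RHS function/Jacobian, TSSetFromOptions(), ... */

    PetscCall(PetscLogStageRegister("TimeLoop", &stage));
    PetscCall(PetscLogStagePush(stage));  /* events below are reported under "TimeLoop" in -log_view */
    PetscCall(TSSolve(ts, NULL));
    PetscCall(PetscLogStagePop());

    PetscCall(TSDestroy(&ts));
    PetscCall(PetscFinalize());
    return 0;
  }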
Thanks.
--Junchao Zhang
On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users <
petsc-users at mcs.anl.gov> wrote:
> The following is the log_view output for the ported case using 4 MPI tasks.
>
> ****************************************************************************************************************************************************************
>
> ***                          WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                          ***
>
> ****************************************************************************************************************************************************************
>
> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>
> Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023
> Using Petsc Release Version 3.18.3, unknown
>
> Max Max/Min Avg Total
> Time (sec): 1.447e+01 1.000 1.447e+01
> Objects: 1.229e+03 1.003 1.226e+03
> Flops: 5.053e+09 1.217 4.593e+09 1.837e+10
> Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09
> MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04
> MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08
> MPI Reductions: 2.065e+03 1.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total     Count  %Total
>  0:      Main Stage: 1.4471e+01 100.0%  1.8371e+10 100.0%  7.580e+04 100.0%  3.727e+03     100.0%   2.046e+03  99.1%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flop: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> AvgLen: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>    %T - percent time in this phase          %F - percent flop in this phase
>    %M - percent messages in this phase      %L - percent message lengths in this phase
>    %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
> CpuToGpu Count: total number of CPU to GPU copies per processor
> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
> GpuToCpu Count: total number of GPU to CPU copies per processor
> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
> GPU %F: percent flops on GPU in this event
>
> ------------------------------------------------------------------------------------------------------------------------
> Event              Count     Time (sec)    Flop                             --- Global ---   --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu -  GPU
>                      Max Ratio  Max  Ratio  Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R   %T %F %M %L %R  Mflop/s Mflop/s Count  Size  Count  Size  %F
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided       257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02   1  0  1  0 12    1  0  1  0 13  -nan -nan 0 0.00e+00 0 0.00e+00   0
> BuildTwoSidedF      210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02   1  0  0  2 10    1  0  0  2 10  -nan -nan 0 0.00e+00 0 0.00e+00   0
> DMCreateMat           1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  10  0  0  0  0   10  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFSetGraph           69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFSetUp              47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01   0  0  1  1  2    0  0  1  1  2  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFBcastBegin        222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00   0  0  3 16  0    0  0  3 16  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFBcastEnd          222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0    3  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFReduceBegin       254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00   0  0  2  6  0    0  0  2  6  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFReduceEnd         254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0    3  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFFetchOpBegin        1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFFetchOpEnd          1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFPack             8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SFUnpack           8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecDot               60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01   0  0  0  0  3    0  0  0  0  3  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecMDot             398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02   0  0  0  0 19    0  0  0  0 19  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecNorm             641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02   1  1  0  0 31    1  1  0  0 31  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecScale            601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecCopy            3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecSet             2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecAXPY             123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecAYPX            6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00   0  4  0  0  0    0  4  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecAXPBYCZ         2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00   0  4  0  0  0    0  4  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecWAXPY             60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecMAXPY            681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00   0  3  0  0  0    0  3  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecAssemblyBegin      7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecAssemblyEnd        7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecPointwiseMult   4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0    0  1  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecScatterBegin    7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01   0  0 94 73  1    0  0 94 73  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecScatterEnd      7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0    3  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecReduceArith      120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> VecReduceComm        60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01   0  0  0  0  3    0  0  0  0  3  -nan -nan 0 0.00e+00 0 0.00e+00   0
> VecNormalize        401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02   0  1  0  0 19    0  1  0  0 20  -nan -nan 0 0.00e+00 0 0.00e+00 100
> TSStep               20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03  89 100 100 98 96   89 100 100 98 97  1423 -nan 0 0.00e+00 0 0.00e+00  99
> TSFunctionEval      140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00   1  0  1 15  0    1  0  1 15  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> TSJacobianEval       60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01   2  0  1  6  3    2  0  1  6  3  -nan -nan 0 0.00e+00 0 0.00e+00  87
> MatMult            4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00  15 82 68 49  0   15 82 68 49  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> MatMultAdd         1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00   1  2 12  0  0    1  2 12  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> MatMultTranspose   1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00   1  2 12  0  0    1  2 12  0  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> MatSolve            368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatSOR               60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0    0  1  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatLUFactorSym        2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatLUFactorNum        2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatConvert            8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatScale             66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00  99
> MatResidual        1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00   4 20 16 12  0    4 20 16 12  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> MatAssemblyBegin    590 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02   1  0  0  2 10    1  0  0  2 10  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatAssemblyEnd      590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02   2  0  0  0  7    2  0  0  0  7  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatGetRowIJ           2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatCreateSubMat     122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02   2  0  0  0  8    2  0  0  0  8  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatGetOrdering        2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatCoarsen            3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02   0  0  1  0  6    0  0  1  0  6  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatZeroEntries       61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatAXPY               6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01   1  0  0  0  1    1  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatTranspose          6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01   1  0  0  2  2    1  0  0  2  2  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatMatMultSym         4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01   0  0  0  0  1    0  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatMatMultNum         4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatPtAPSymbolic       5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01   3  0  1  1  2    3  0  1  1  2  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatPtAPNumeric      181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00  56  0  4 21  0   56  0  4 21  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatGetLocalMat      185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   1  0  0  0  0    1  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatSetPreallCOO       1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> MatSetValuesCOO      60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> KSPSetUp            483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01   0  0  0  0  1    0  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> KSPSolve             60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03  82 97 97 75 60   82 97 97 75 60  1506 -nan 0 0.00e+00 0 0.00e+00  99
> KSPGMRESOrthog      398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02   1  2  0  0 19    1  2  0  0 19  -nan -nan 0 0.00e+00 0 0.00e+00 100
> SNESSolve            60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03  89 99 100 96 95   89 99 100 96 96  1419 -nan 0 0.00e+00 0 0.00e+00  99
> SNESSetUp             1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> SNESFunctionEval    120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00   1  1  1 13  0    1  1  1 13  0  -nan -nan 0 0.00e+00 0 0.00e+00 100
> SNESJacobianEval     60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01   2  0  1  6  3    2  0  1  6  3  -nan -nan 0 0.00e+00 0 0.00e+00  87
> SNESLineSearch       60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02   1  1  1  6 12    1  1  1  6 12  -nan -nan 0 0.00e+00 0 0.00e+00 100
> PCSetUp_GAMG+        60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02  62  1  7 25 21   62  1  7 25 21  -nan -nan 0 0.00e+00 0 0.00e+00  96
> PCGAMGCreateG         3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01   1  0  0  2  2    1  0  0  2  2  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG Coarsen          3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02   1  0  1  0  6    1  0  1  0  6  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG MIS/Agg          3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02   0  0  1  0  6    0  0  1  0  6  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMGProl            3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01   0  0  0  0  2    0  0  0  0  2  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG Prol-col         3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01   0  0  0  0  1    0  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG Prol-lift        3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01   0  0  0  0  1    0  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMGOptProl         3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02   1  1  1  0  6    1  1  1  0  6  -nan -nan 0 0.00e+00 0 0.00e+00 100
> GAMG smooth           3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01   0  0  0  0  1    0  0  0  0  1  -nan -nan 0 0.00e+00 0 0.00e+00  43
> PCGAMGCreateL         3 1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01   3  0  1  1  4    3  0  1  1  4  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG PtAP             3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01   3  0  1  1  1    3  0  1  1  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> GAMG Reduce           1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01   0  0  0  0  3    0  0  0  0  3  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Gal l00       60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00  46  0  1  6  0   46  0  1  6  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Opt l00        1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Gal l01       60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00  13  0  2 16  0   13  0  2 16  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Opt l01        1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Gal l02       60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01   0  0  1  0  1    0  0  1  0  1  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCGAMG Opt l02        1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCSetUp             182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02  64  1  7 27 37   64  1  7 27 38  -nan -nan 0 0.00e+00 0 0.00e+00  96
> PCSetUpOnBlocks     368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0    0  0  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> PCApply              60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03  81 96 96 75 54   81 96 96 75 54  -nan -nan 0 0.00e+00 0 0.00e+00  99
> KSPSolve_FS_0        60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0    0  1  0  0  0  -nan -nan 0 0.00e+00 0 0.00e+00   0
> KSPSolve_FS_1        60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03  81 95 96 75 54   81 95 96 75 54  -nan -nan 0 0.00e+00 0 0.00e+00 100
>
>
> --- Event Stage 1: Unknown
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Object Type          Creations   Destructions. Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Container 14 14
> Distributed Mesh 9 9
> Index Set 120 120
> IS L to G Mapping 10 10
> Star Forest Graph 87 87
> Discrete System 9 9
> Weak Form 9 9
> Vector 761 761
> TSAdapt 1 1
> TS 1 1
> DMTS 1 1
> SNES 1 1
> DMSNES 3 3
> SNESLineSearch 1 1
> Krylov Solver 11 11
> DMKSP interface 1 1
> Matrix 171 171
> Matrix Coarsen 3 3
> Preconditioner 11 11
> Viewer 2 1
> PetscRandom 3 3
>
> --- Event Stage 1: Unknown
>
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.82e-08
> Average time for MPI_Barrier(): 2.2968e-06
> Average time for zero size MPI_Send(): 3.371e-06
> #PETSc Option Table entries:
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with 64 bit PetscInt
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack
>
> -----------------------------------------
> Libraries compiled on 2023-01-06 18:21:31 on iguazu
> Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28
> Using PETSc directory: /home2/4pf/.local/serial
> Using PETSc arch:
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3
> -----------------------------------------
>
> Using include paths: -I/home2/4pf/.local/serial/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl
> -----------------------------------------
>
>
> ---
>
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Zhang, Junchao <jczhang at mcs.anl.gov>
> *Sent:* Tuesday, January 17, 2023 17:25
> *To:* Fackler, Philip <facklerpw at ornl.gov>;
> xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Cc:* Mills, Richard Tran <rtmills at anl.gov>; Blondel, Sophie <
> sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
> *Subject:* [EXTERNAL] Re: Performance problem using COO interface
>
> Hi, Philip,
> Could you add -log_view and see which functions are used in the solve?
> Since this is CPU-only, comparing the -log_view output of the different runs
> should make it easy to see which functions slowed down.
>
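> For example, the three variants could be rerun with -log_view appended (the
> executable name below is only a placeholder for however Xolotl is launched):
>
>   mpiexec -n 4 ./xolotl <params> -log_view
>   mpiexec -n 4 ./xolotl <params> -dm_mat_type aijkokkos -dm_vec_type kokkos -log_view
>   mpiexec -n 4 ./xolotl <params> -dm_vec_type kokkos -log_view
>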
> --Junchao Zhang
> ------------------------------
> *From:* Fackler, Philip <facklerpw at ornl.gov>
> *Sent:* Tuesday, January 17, 2023 4:13 PM
> *To:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Cc:* Mills, Richard Tran <rtmills at anl.gov>; Zhang, Junchao <
> jczhang at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <
> rothpc at ornl.gov>
> *Subject:* Performance problem using COO interface
>
> In Xolotl's feature-petsc-kokkos branch I have ported the code to use
> PETSc's COO interface for creating the Jacobian matrix (and the Kokkos
> interface for interacting with Vec entries). As the attached plots show for
> one case, the code for computing the RHSFunction and RHSJacobian performs
> similarly (or slightly better) after the port, but the performance of the
> solve as a whole is significantly worse.
>
> Note:
> This is all CPU-only (so kokkos and kokkos-kernels are built with only the
> serial backend).
> The dev version is using MatSetValuesStencil with the default
> implementations for Mat and Vec.
> The port version is using MatSetValuesCOO and is run with -dm_mat_type
> aijkokkos -dm_vec_type kokkos.
> The port/def version is using MatSetValuesCOO and is run with -dm_vec_type
> kokkos (using the default Mat implementation).
>
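> For reference, the COO interface mentioned above is the
> MatSetPreallocationCOO()/MatSetValuesCOO() pair. A minimal standalone sketch
> with made-up sizes and values (not Xolotl's actual stencil; intended to be
> run on a single rank):
>
>   #include <petscmat.h>
>
>   int main(int argc, char **argv)
>   {
>     Mat         A;
>     /* three nonzeros of a 2x2 matrix as (i,j,v) triplets */
>     PetscInt    coo_i[] = {0, 0, 1};
>     PetscInt    coo_j[] = {0, 1, 1};
>     PetscScalar coo_v[] = {2.0, -1.0, 2.0};
>
>     PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>     PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>     PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
>     PetscCall(MatSetFromOptions(A));  /* honors -mat_type aijkokkos, etc. */
>
>     /* one-time preallocation from the (i,j) pattern ... */
>     PetscCall(MatSetPreallocationCOO(A, 3, coo_i, coo_j));
>     /* ... then each Jacobian evaluation only pushes new values */
>     PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));
>
>     PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
>     PetscCall(MatDestroy(&A));
>     PetscCall(PetscFinalize());
>     return 0;
>   }
>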
> So, this seems to be due to a performance difference in the PETSc
> implementations. Please advise. Is this a known issue? Or am I missing
> something?
>
> Thank you for the help,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>