[petsc-users] [EXTERNAL] Re: Performance problem using COO interface

Fackler, Philip facklerpw at ornl.gov
Mon Jan 23 09:52:08 CST 2023


Thank you for looking into that.

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Monday, January 23, 2023 10:34
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: Zhang, Junchao <jczhang at mcs.anl.gov>; xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: [EXTERNAL] Re: [petsc-users] Performance problem using COO interface

Hi, Philip,
  It looks like the performance of MatPtAP is pretty bad.  There are a lot of issues with PtAP, which I am going to address.
MatPtAPNumeric       181 1.0   nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56  0  4 21  0  56  0  4 21  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0

 Thanks.
--Junchao Zhang


On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>> wrote:
The following is the log_view output for the ported case using 4 MPI tasks.

****************************************************************************************************************************************************************
***                                WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                                 ***
****************************************************************************************************************************************************************

------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

Unknown Name on a  named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023
Using Petsc Release Version 3.18.3, unknown

                         Max       Max/Min     Avg       Total
Time (sec):           1.447e+01     1.000   1.447e+01
Objects:              1.229e+03     1.003   1.226e+03
Flops:                5.053e+09     1.217   4.593e+09  1.837e+10
Flops/sec:            3.492e+08     1.217   3.174e+08  1.269e+09
MPI Msg Count:        1.977e+04     1.067   1.895e+04  7.580e+04
MPI Msg Len (bytes):  7.374e+07     1.088   3.727e+03  2.825e+08
MPI Reductions:       2.065e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.4471e+01 100.0%  1.8371e+10 100.0%  7.580e+04 100.0%  3.727e+03      100.0%  2.046e+03  99.1%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------


--- Event Stage 0: Main Stage

BuildTwoSided        257 1.0   nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02  1  0  1  0 12   1  0  1  0 13  -nan     -nan      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF       210 1.0   nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02  1  0  0  2 10   1  0  0  2 10  -nan     -nan      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 10  0  0  0  0  10  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFSetGraph            69 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFSetUp               47 1.0   nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01  0  0  1  1  2   0  0  1  1  2  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFBcastBegin         222 1.0   nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00  0  0  3 16  0   0  0  3 16  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFBcastEnd           222 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFReduceBegin        254 1.0   nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00  0  0  2  6  0   0  0  2  6  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFReduceEnd          254 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFFetchOpBegin         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFPack              8091 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SFUnpack            8092 1.0   nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecDot                60 1.0   nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01  0  0  0  0  3   0  0  0  0  3  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecMDot              398 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02  0  0  0  0 19   0  0  0  0 19  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecNorm              641 1.0   nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02  1  1  0  0 31   1  1  0  0 31  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecScale             601 1.0   nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecCopy             3735 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecSet              2818 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecAXPY              123 1.0   nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecAYPX             6764 1.0   nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecAXPBYCZ          2388 1.0   nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  4  0  0  0   0  4  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecWAXPY              60 1.0   nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecMAXPY             681 1.0   nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecAssemblyBegin       7 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecAssemblyEnd         7 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecPointwiseMult    4449 1.0   nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecScatterBegin     7614 1.0   nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01  0  0 94 73  1   0  0 94 73  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecScatterEnd       7614 1.0   nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecReduceArith       120 1.0   nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
VecReduceComm         60 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01  0  0  0  0  3   0  0  0  0  3  -nan     -nan      0 0.00e+00    0 0.00e+00  0
VecNormalize         401 1.0   nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 19   0  1  0  0 20  -nan     -nan      0 0.00e+00    0 0.00e+00  100

TSStep                20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03 89 100 100 98 96  89 100 100 98 97  1423     -nan      0 0.00e+00    0 0.00e+00  99
TSFunctionEval       140 1.0   nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00  1  0  1 15  0   1  0  1 15  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
TSJacobianEval        60 1.0   nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01  2  0  1  6  3   2  0  1  6  3  -nan     -nan      0 0.00e+00    0 0.00e+00  87
MatMult             4934 1.0   nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00 15 82 68 49  0  15 82 68 49  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
MatMultAdd          1104 1.0   nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00  1  2 12  0  0   1  2 12  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
MatMultTranspose    1104 1.0   nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00  1  2 12  0  0   1  2 12  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
MatSolve             368 0.0   nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatSOR                60 1.0   nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatLUFactorSym         2 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatLUFactorNum         2 1.0   nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatConvert             8 1.0   nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatScale              66 1.0   nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  99
MatResidual         1104 1.0   nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00  4 20 16 12  0   4 20 16 12  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
MatAssemblyBegin     590 1.0   nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02  1  0  0  2 10   1  0  0  2 10  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd       590 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02  2  0  0  0  7   2  0  0  0  7  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatGetRowIJ            2 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatCreateSubMat      122 1.0   nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02  2  0  0  0  8   2  0  0  0  8  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatGetOrdering         2 0.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatCoarsen             3 1.0   nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02  0  0  1  0  6   0  0  1  0  6  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatZeroEntries        61 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatAXPY                6 1.0   nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01  1  0  0  0  1   1  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatTranspose           6 1.0   nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01  1  0  0  2  2   1  0  0  2  2  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatMatMultSym          4 1.0   nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01  0  0  0  0  1   0  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatMatMultNum          4 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatPtAPSymbolic        5 1.0   nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01  3  0  1  1  2   3  0  1  1  2  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatPtAPNumeric       181 1.0   nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56  0  4 21  0  56  0  4 21  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatGetLocalMat       185 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatSetPreallCOO        1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
MatSetValuesCOO       60 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0

KSPSetUp             483 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01  0  0  0  0  1   0  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
KSPSolve              60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03 82 97 97 75 60  82 97 97 75 60  1506     -nan      0 0.00e+00    0 0.00e+00  99
KSPGMRESOrthog       398 1.0   nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02  1  2  0  0 19   1  2  0  0 19  -nan     -nan      0 0.00e+00    0 0.00e+00  100
SNESSolve             60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03 89 99 100 96 95  89 99 100 96 96  1419     -nan      0 0.00e+00    0 0.00e+00  99
SNESSetUp              1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval     120 1.0   nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00  1  1  1 13  0   1  1  1 13  0  -nan     -nan      0 0.00e+00    0 0.00e+00  100
SNESJacobianEval      60 1.0   nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01  2  0  1  6  3   2  0  1  6  3  -nan     -nan      0 0.00e+00    0 0.00e+00  87
SNESLineSearch        60 1.0   nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02  1  1  1  6 12   1  1  1  6 12  -nan     -nan      0 0.00e+00    0 0.00e+00  100
PCSetUp_GAMG+         60 1.0   nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02 62  1  7 25 21  62  1  7 25 21  -nan     -nan      0 0.00e+00    0 0.00e+00  96
 PCGAMGCreateG         3 1.0   nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01  1  0  0  2  2   1  0  0  2  2  -nan     -nan      0 0.00e+00    0 0.00e+00  0
 GAMG Coarsen          3 1.0   nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02  1  0  1  0  6   1  0  1  0  6  -nan     -nan      0 0.00e+00    0 0.00e+00  0
  GAMG MIS/Agg         3 1.0   nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02  0  0  1  0  6   0  0  1  0  6  -nan     -nan      0 0.00e+00    0 0.00e+00  0
 PCGAMGProl            3 1.0   nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01  0  0  0  0  2   0  0  0  0  2  -nan     -nan      0 0.00e+00    0 0.00e+00  0
  GAMG Prol-col        3 1.0   nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01  0  0  0  0  1   0  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
  GAMG Prol-lift       3 1.0   nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01  0  0  0  0  1   0  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
 PCGAMGOptProl         3 1.0   nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02  1  1  1  0  6   1  1  1  0  6  -nan     -nan      0 0.00e+00    0 0.00e+00  100
  GAMG smooth          3 1.0   nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01  0  0  0  0  1   0  0  0  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  43
 PCGAMGCreateL         3 1.0   nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01  3  0  1  1  4   3  0  1  1  4  -nan     -nan      0 0.00e+00    0 0.00e+00  0
  GAMG PtAP            3 1.0   nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01  3  0  1  1  1   3  0  1  1  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
  GAMG Reduce          1 1.0   nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01  0  0  0  0  3   0  0  0  0  3  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Gal l00        60 1.0   nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00 46  0  1  6  0  46  0  1  6  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Opt l00         1 1.0   nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Gal l01        60 1.0   nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00 13  0  2 16  0  13  0  2 16  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Opt l01         1 1.0   nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Gal l02        60 1.0   nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01  0  0  1  0  1   0  0  1  0  1  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCGAMG Opt l02         1 1.0   nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCSetUp              182 1.0   nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02 64  1  7 27 37  64  1  7 27 38  -nan     -nan      0 0.00e+00    0 0.00e+00  96
PCSetUpOnBlocks      368 1.0   nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
PCApply               60 1.0   nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03 81 96 96 75 54  81 96 96 75 54  -nan     -nan      0 0.00e+00    0 0.00e+00  99
KSPSolve_FS_0         60 1.0   nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
KSPSolve_FS_1         60 1.0   nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03 81 95 96 75 54  81 95 96 75 54  -nan     -nan      0 0.00e+00    0 0.00e+00  100


--- Event Stage 1: Unknown

---------------------------------------------------------------------------------------------------------------------------------------------------------------


Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    14             14
    Distributed Mesh     9              9
           Index Set   120            120
   IS L to G Mapping    10             10
   Star Forest Graph    87             87
     Discrete System     9              9
           Weak Form     9              9
              Vector   761            761
             TSAdapt     1              1
                  TS     1              1
                DMTS     1              1
                SNES     1              1
              DMSNES     3              3
      SNESLineSearch     1              1
       Krylov Solver    11             11
     DMKSP interface     1              1
              Matrix   171            171
      Matrix Coarsen     3              3
      Preconditioner    11             11
              Viewer     2              1
         PetscRandom     3              3

--- Event Stage 1: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.82e-08
Average time for MPI_Barrier(): 2.2968e-06
Average time for zero size MPI_Send(): 3.371e-06
#PETSc Option Table entries:
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack

-----------------------------------------
Libraries compiled on 2023-01-06 18:21:31 on iguazu
Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28
Using PETSc directory: /home2/4pf/.local/serial
Using PETSc arch:
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3
-----------------------------------------

Using include paths: -I/home2/4pf/.local/serial/include
-----------------------------------------

Using C linker: mpicc
Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl
-----------------------------------------


---


Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Zhang, Junchao <jczhang at mcs.anl.gov<mailto:jczhang at mcs.anl.gov>>
Sent: Tuesday, January 17, 2023 17:25
To: Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>>; xolotl-psi-development at lists.sourceforge.net<mailto:xolotl-psi-development at lists.sourceforge.net> <xolotl-psi-development at lists.sourceforge.net<mailto:xolotl-psi-development at lists.sourceforge.net>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Cc: Mills, Richard Tran <rtmills at anl.gov<mailto:rtmills at anl.gov>>; Blondel, Sophie <sblondel at utk.edu<mailto:sblondel at utk.edu>>; Roth, Philip <rothpc at ornl.gov<mailto:rothpc at ornl.gov>>
Subject: [EXTERNAL] Re: Performance problem using COO interface

Hi, Philip,
  Could you add -log_view and see which functions are used in the solve? Since this is CPU-only, comparing the -log_view output of the different runs should make it easy to see which functions slowed down.
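  For example, something like the following (the executable name and other arguments here are just placeholders for however you normally run the case; only the -log_view option is the relevant addition):

  mpiexec -n 4 ./xolotl <your-usual-arguments> -log_view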

--Junchao Zhang
________________________________
From: Fackler, Philip <facklerpw at ornl.gov<mailto:facklerpw at ornl.gov>>
Sent: Tuesday, January 17, 2023 4:13 PM
To: xolotl-psi-development at lists.sourceforge.net<mailto:xolotl-psi-development at lists.sourceforge.net> <xolotl-psi-development at lists.sourceforge.net<mailto:xolotl-psi-development at lists.sourceforge.net>>; petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov> <petsc-users at mcs.anl.gov<mailto:petsc-users at mcs.anl.gov>>
Cc: Mills, Richard Tran <rtmills at anl.gov<mailto:rtmills at anl.gov>>; Zhang, Junchao <jczhang at mcs.anl.gov<mailto:jczhang at mcs.anl.gov>>; Blondel, Sophie <sblondel at utk.edu<mailto:sblondel at utk.edu>>; Roth, Philip <rothpc at ornl.gov<mailto:rothpc at ornl.gov>>
Subject: Performance problem using COO interface

In Xolotl's feature-petsc-kokkos branch I have ported the code to use PETSc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance of the solve as a whole is significantly worse.
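For anyone not familiar with that interface, here is a rough, minimal sketch of the COO assembly pattern (a toy 2-by-2 matrix with placeholder indices and values, not Xolotl's actual Jacobian code):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscCount  ncoo    = 4;                     /* number of (i,j,v) triples */
  PetscInt    coo_i[] = {0, 0, 1, 1};          /* global row indices */
  PetscInt    coo_j[] = {0, 1, 0, 1};          /* global column indices */
  PetscScalar coo_v[] = {1.0, 2.0, 3.0, 4.0};  /* values, same order as coo_i/coo_j */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
  PetscCall(MatSetFromOptions(A));  /* picks up -mat_type; Xolotl's Mat comes from the DM, hence -dm_mat_type */

  /* One-time setup: hand the full COO sparsity pattern to PETSc. */
  PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));

  /* Each Jacobian evaluation: supply only the values, in the same order as the indices. */
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));

  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

The point of the interface is that the index arrays are given to PETSc once up front, and each subsequent Jacobian evaluation only pushes a flat array of values.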

Note:
- This is all CPU-only (so Kokkos and Kokkos Kernels are built with only the serial backend).
- The dev version uses MatSetValuesStencil with the default implementations for Mat and Vec.
- The port version uses MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos.
- The port/def version uses MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation); see the invocation sketch below.
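To make the three configurations concrete, the invocations differ only in these options (the executable name and other arguments are placeholders):

mpiexec -n 4 ./xolotl <args>                                               # dev (default Mat/Vec, MatSetValuesStencil)
mpiexec -n 4 ./xolotl <args> -dm_mat_type aijkokkos -dm_vec_type kokkos    # port
mpiexec -n 4 ./xolotl <args> -dm_vec_type kokkos                           # port/def (default Mat)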

So, this seems to be due to a performance difference in the PETSc implementations. Please advise. Is this a known issue? Or am I missing something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory