<div dir="ltr"><div>Hi, Philip,</div><div> It looks like the performance of MatPtAP is pretty bad. There are several issues with PtAP, which I am going to address. </div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div>MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan -nan 0 0.00e+00 0 0.00e+00 0</div></blockquote><div><br></div><div> Thanks.</div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">--Junchao Zhang</div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov">petsc-users@mcs.anl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="msg7509031262052082599">
<div dir="ltr">
<div><span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">The following is the log_view output for the ported case using 4 MPI tasks.</span></div>
<div><span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)"><br>
</span></div>
<div><span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">****************************************************************************************************************************************************************
<div>*** WIDEN YOUR WINDOW TO 160 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***</div>
<div>****************************************************************************************************************************************************************</div>
<div><br>
</div>
<div>------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------</div>
<div><br>
</div>
<div>Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023</div>
<div>Using Petsc Release Version 3.18.3, unknown </div>
<div><br>
</div>
<div> Max Max/Min Avg Total</div>
<div>Time (sec): 1.447e+01 1.000 1.447e+01</div>
<div>Objects: 1.229e+03 1.003 1.226e+03</div>
<div>Flops: 5.053e+09 1.217 4.593e+09 1.837e+10</div>
<div>Flops/sec: 3.492e+08 1.217 3.174e+08 1.269e+09</div>
<div>MPI Msg Count: 1.977e+04 1.067 1.895e+04 7.580e+04</div>
<div>MPI Msg Len (bytes): 7.374e+07 1.088 3.727e+03 2.825e+08</div>
<div>MPI Reductions: 2.065e+03 1.000</div>
<div><br>
</div>
<div>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)</div>
<div> e.g., VecAXPY() for real vectors of length N --> 2N flops</div>
<div> and VecAXPY() for complex vectors of length N --> 8N flops</div>
<div><br>
</div>
<div>Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --</div>
<div> Avg %Total Avg %Total Count %Total Avg %Total Count %Total</div>
<div> 0: Main Stage: 1.4471e+01 100.0% 1.8371e+10 100.0% 7.580e+04 100.0% 3.727e+03 100.0% 2.046e+03 99.1%</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>See the 'Profiling' chapter of the users' manual for details on interpreting output.</div>
<div>Phase summary info:</div>
<div> Count: number of times phase was executed</div>
<div> Time and Flop: Max - maximum over all processors</div>
<div> Ratio - ratio of maximum to minimum over all processors</div>
<div> Mess: number of messages sent</div>
<div> AvgLen: average message length (bytes)</div>
<div> Reduct: number of global reductions</div>
<div> Global: entire computation</div>
<div> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().</div>
<div> %T - percent time in this phase %F - percent flop in this phase</div>
<div> %M - percent messages in this phase %L - percent message lengths in this phase</div>
<div> %R - percent reductions in this phase</div>
<div> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)</div>
<div> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)</div>
<div> CpuToGpu Count: total number of CPU to GPU copies per processor</div>
<div> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)</div>
<div> GpuToCpu Count: total number of GPU to CPU copies per processor</div>
<div> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)</div>
<div> GPU %F: percent flops on GPU in this event</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total</div>
<div> GPU - CpuToGpu - - GpuToCpu - GPU</div>
<div><br>
</div>
<div> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s</div>
<div> Mflop/s Count Size Count Size %F</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>---------------------------------------</div>
<div><br>
</div>
<div><br>
</div>
<div>--- Event Stage 0: Main Stage</div>
<div><br>
</div>
<div>BuildTwoSided 257 1.0 nan nan 0.00e+00 0.0 4.4e+02 8.0e+00 2.6e+02 1 0 1 0 12 1 0 1 0 13 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>BuildTwoSidedF 210 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.1e+02 1 0 0 2 10 1 0 0 2 10 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 10 0 0 0 0 10 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFSetGraph 69 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFSetUp 47 1.0 nan nan 0.00e+00 0.0 7.3e+02 2.1e+03 4.7e+01 0 0 1 1 2 0 0 1 1 2 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFBcastBegin 222 1.0 nan nan 0.00e+00 0.0 2.3e+03 1.9e+04 0.0e+00 0 0 3 16 0 0 0 3 16 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFBcastEnd 222 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFReduceBegin 254 1.0 nan nan 0.00e+00 0.0 1.5e+03 1.2e+04 0.0e+00 0 0 2 6 0 0 0 2 6 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFReduceEnd 254 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFFetchOpBegin 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFFetchOpEnd 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFPack 8091 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFUnpack 8092 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecDot 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecMDot 398 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 19 0 0 0 0 19 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecNorm 641 1.0 nan nan 4.45e+07 1.2 0.0e+00 0.0e+00 6.4e+02 1 1 0 0 31 1 1 0 0 31 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecScale 601 1.0 nan nan 2.08e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecCopy 3735 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecSet 2818 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecAXPY 123 1.0 nan nan 8.68e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecAYPX 6764 1.0 nan nan 1.90e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecAXPBYCZ 2388 1.0 nan nan 1.83e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 4 0 0 0 0 4 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecWAXPY 60 1.0 nan nan 4.30e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecMAXPY 681 1.0 nan nan 1.36e+08 1.2 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecAssemblyBegin 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecAssemblyEnd 7 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecPointwiseMult 4449 1.0 nan nan 6.06e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecScatterBegin 7614 1.0 nan nan 0.00e+00 0.0 7.1e+04 2.9e+03 1.3e+01 0 0 94 73 1 0 0 94 73 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecScatterEnd 7614 1.0 nan nan 4.78e+04 1.5 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecReduceArith 120 1.0 nan nan 8.60e+06 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecReduceComm 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 3 0 0 0 0 3 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecNormalize 401 1.0 nan nan 4.09e+07 1.2 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 19 0 1 0 0 20 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>TSStep 20 1.0 1.2908e+01 1.0 5.05e+09 1.2 7.6e+04 3.7e+03 2.0e+03 89 100 100 98 96 89 100 100 98 97 1423</div>
<div> -nan 0 0.00e+00 0 0.00e+00 99</div>
<div><br>
</div>
<div>TSFunctionEval 140 1.0 nan nan 1.00e+07 1.2 1.1e+03 3.7e+04 0.0e+00 1 0 1 15 0 1 0 1 15 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>TSJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 87</div>
<div><br>
</div>
<div>MatMult 4934 1.0 nan nan 4.16e+09 1.2 5.1e+04 2.7e+03 4.0e+00 15 82 68 49 0 15 82 68 49 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatMultAdd 1104 1.0 nan nan 9.00e+07 1.2 8.8e+03 1.4e+02 0.0e+00 1 2 12 0 0 1 2 12 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatMultTranspose 1104 1.0 nan nan 9.01e+07 1.2 8.8e+03 1.4e+02 1.0e+00 1 2 12 0 0 1 2 12 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatSolve 368 0.0 nan nan 3.57e+04 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSOR 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatLUFactorSym 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatLUFactorNum 2 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatConvert 8 1.0 nan nan 0.00e+00 0.0 8.0e+01 1.2e+03 4.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatScale 66 1.0 nan nan 1.48e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 99</div>
<div><br>
</div>
<div>MatResidual 1104 1.0 nan nan 1.01e+09 1.2 1.2e+04 2.9e+03 0.0e+00 4 20 16 12 0 4 20 16 12 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatAssemblyBegin 590 1.0 nan nan 0.00e+00 0.0 1.5e+02 4.2e+04 2.0e+02 1 0 0 2 10 1 0 0 2 10 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatAssemblyEnd 590 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+02 2 0 0 0 7 2 0 0 0 7 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatGetRowIJ 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatCreateSubMat 122 1.0 nan nan 0.00e+00 0.0 6.3e+01 1.8e+02 1.7e+02 2 0 0 0 8 2 0 0 0 8 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatGetOrdering 2 0.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatCoarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatZeroEntries 61 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatAXPY 6 1.0 nan nan 1.37e+06 1.2 0.0e+00 0.0e+00 1.8e+01 1 0 0 0 1 1 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatTranspose 6 1.0 nan nan 0.00e+00 0.0 2.2e+02 2.9e+04 4.8e+01 1 0 0 2 2 1 0 0 2 2 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatMatMultSym 4 1.0 nan nan 0.00e+00 0.0 2.2e+02 1.7e+03 2.8e+01 0 0 0 0 1 0 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatMatMultNum 4 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatPtAPSymbolic 5 1.0 nan nan 0.00e+00 0.0 6.2e+02 5.2e+03 4.4e+01 3 0 1 1 2 3 0 1 1 2 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatPtAPNumeric 181 1.0 nan nan 0.00e+00 0.0 3.3e+03 1.8e+04 0.0e+00 56 0 4 21 0 56 0 4 21 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatGetLocalMat 185 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSetValuesCOO 60 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSetUp 483 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.2e+01 0 0 0 0 1 0 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSolve 60 1.0 1.1843e+01 1.0 4.91e+09 1.2 7.3e+04 2.9e+03 1.2e+03 82 97 97 75 60 82 97 97 75 60 1506</div>
<div> -nan 0 0.00e+00 0 0.00e+00 99</div>
<div><br>
</div>
<div>KSPGMRESOrthog 398 1.0 nan nan 7.97e+07 1.2 0.0e+00 0.0e+00 4.0e+02 1 2 0 0 19 1 2 0 0 19 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>SNESSolve 60 1.0 1.2842e+01 1.0 5.01e+09 1.2 7.5e+04 3.6e+03 2.0e+03 89 99 100 96 95 89 99 100 96 96 1419</div>
<div> -nan 0 0.00e+00 0 0.00e+00 99</div>
<div><br>
</div>
<div>SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SNESFunctionEval 120 1.0 nan nan 3.01e+07 1.2 9.6e+02 3.7e+04 0.0e+00 1 1 1 13 0 1 1 1 13 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>SNESJacobianEval 60 1.0 nan nan 1.67e+07 1.2 4.8e+02 3.7e+04 6.0e+01 2 0 1 6 3 2 0 1 6 3 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 87</div>
<div><br>
</div>
<div>SNESLineSearch 60 1.0 nan nan 6.99e+07 1.2 9.6e+02 1.9e+04 2.4e+02 1 1 1 6 12 1 1 1 6 12 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>PCSetUp_GAMG+ 60 1.0 nan nan 3.53e+07 1.2 5.2e+03 1.4e+04 4.3e+02 62 1 7 25 21 62 1 7 25 21 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 96</div>
<div><br>
</div>
<div> PCGAMGCreateG 3 1.0 nan nan 1.32e+06 1.2 2.2e+02 2.9e+04 4.2e+01 1 0 0 2 2 1 0 0 2 2 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG Coarsen 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 1 0 1 0 6 1 0 1 0 6 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG MIS/Agg 3 1.0 nan nan 0.00e+00 0.0 5.0e+02 1.3e+03 1.2e+02 0 0 1 0 6 0 0 1 0 6 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> PCGAMGProl 3 1.0 nan nan 0.00e+00 0.0 7.8e+01 7.8e+02 4.8e+01 0 0 0 0 2 0 0 0 0 2 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG Prol-col 3 1.0 nan nan 0.00e+00 0.0 5.2e+01 5.8e+02 2.1e+01 0 0 0 0 1 0 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG Prol-lift 3 1.0 nan nan 0.00e+00 0.0 2.6e+01 1.2e+03 1.5e+01 0 0 0 0 1 0 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> PCGAMGOptProl 3 1.0 nan nan 3.40e+07 1.2 5.8e+02 2.4e+03 1.1e+02 1 1 1 0 6 1 1 1 0 6 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div> GAMG smooth 3 1.0 nan nan 2.85e+05 1.2 1.9e+02 1.9e+03 3.0e+01 0 0 0 0 1 0 0 0 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 43</div>
<div><br>
</div>
<div> PCGAMGCreateL 3 1.0 nan nan 0.00e+00 0.0 4.8e+02 6.5e+03 8.0e+01 3 0 1 1 4 3 0 1 1 4 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG PtAP 3 1.0 nan nan 0.00e+00 0.0 4.5e+02 7.1e+03 2.7e+01 3 0 1 1 1 3 0 1 1 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div> GAMG Reduce 1 1.0 nan nan 0.00e+00 0.0 3.6e+01 3.7e+01 5.3e+01 0 0 0 0 3 0 0 0 0 3 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Gal l00 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.4e+04 9.0e+00 46 0 1 6 0 46 0 1 6 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Opt l00 1 1.0 nan nan 0.00e+00 0.0 4.8e+01 1.7e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Gal l01 60 1.0 nan nan 0.00e+00 0.0 1.6e+03 2.9e+04 9.0e+00 13 0 2 16 0 13 0 2 16 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Opt l01 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 4.8e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Gal l02 60 1.0 nan nan 0.00e+00 0.0 1.1e+03 1.2e+03 1.7e+01 0 0 1 0 1 0 0 1 0 1 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCGAMG Opt l02 1 1.0 nan nan 0.00e+00 0.0 7.2e+01 2.2e+02 7.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCSetUp 182 1.0 nan nan 3.53e+07 1.2 5.3e+03 1.4e+04 7.7e+02 64 1 7 27 37 64 1 7 27 38 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 96</div>
<div><br>
</div>
<div>PCSetUpOnBlocks 368 1.0 nan nan 4.24e+02 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCApply 60 1.0 nan nan 4.85e+09 1.2 7.3e+04 2.9e+03 1.1e+03 81 96 96 75 54 81 96 96 75 54 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 99</div>
<div><br>
</div>
<div>KSPSolve_FS_0 60 1.0 nan nan 3.12e+07 1.2 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSolve_FS_1 60 1.0 nan nan 4.79e+09 1.2 7.2e+04 2.9e+03 1.1e+03 81 95 96 75 54 81 95 96 75 54 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div><br>
</div>
<div>--- Event Stage 1: Unknown</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>---------------------------------------</div>
<div><br>
</div>
<div><br>
</div>
<div>Object Type Creations Destructions. Reports information only for process 0.</div>
<div><br>
</div>
<div>--- Event Stage 0: Main Stage</div>
<div><br>
</div>
<div> Container 14 14</div>
<div> Distributed Mesh 9 9</div>
<div> Index Set 120 120</div>
<div> IS L to G Mapping 10 10</div>
<div> Star Forest Graph 87 87</div>
<div> Discrete System 9 9</div>
<div> Weak Form 9 9</div>
<div> Vector 761 761</div>
<div> TSAdapt 1 1</div>
<div> TS 1 1</div>
<div> DMTS 1 1</div>
<div> SNES 1 1</div>
<div> DMSNES 3 3</div>
<div> SNESLineSearch 1 1</div>
<div> Krylov Solver 11 11</div>
<div> DMKSP interface 1 1</div>
<div> Matrix 171 171</div>
<div> Matrix Coarsen 3 3</div>
<div> Preconditioner 11 11</div>
<div> Viewer 2 1</div>
<div> PetscRandom 3 3</div>
<div><br>
</div>
<div>--- Event Stage 1: Unknown</div>
<div><br>
</div>
<div>========================================================================================================================</div>
<div>Average time to get PetscTime(): 3.82e-08</div>
<div>Average time for MPI_Barrier(): 2.2968e-06</div>
<div>Average time for zero size MPI_Send(): 3.371e-06</div>
<div>#PETSc Option Table entries:</div>
<div>-log_view</div>
<div>#End of PETSc Option Table entries</div>
<div>Compiled without FORTRAN kernels</div>
<div>Compiled with 64 bit PetscInt</div>
<div>Compiled with full precision matrices (default)</div>
<div>sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8</div>
<div>Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices
--with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack</div>
<div><br>
</div>
<div>-----------------------------------------</div>
<div>Libraries compiled on 2023-01-06 18:21:31 on iguazu </div>
<div>Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28</div>
<div>Using PETSc directory: /home2/4pf/.local/serial</div>
<div>Using PETSc arch: </div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3
</div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using include paths: -I/home2/4pf/.local/serial/include</div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using C linker: mpicc</div>
<div>Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib
-lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl</div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div><br>
</div>
<div>---</div>
<br>
</span></div>
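<div>In this log the Max Time column printed as nan for every event except TSStep, KSPSolve and SNESSolve, so the %T column is the remaining way to see where the time went. A minimal C sketch of that arithmetic, assuming %T is the event's share of the 1.447e+01 s total from the header: MatPtAPNumeric at 56% comes to roughly 8.1 s, and the same estimate for KSPSolve (82%) lands within rounding of its reported 1.1843e+01 s. The function names are illustrative only, not part of PETSc.</div>

```c
#include <assert.h>
#include <math.h>

/* Estimate an event's wall time in seconds from its %T column
   (percent of total run time) and the total run time in the log header. */
static double estimate_event_time(double percent_total, double total_time_s)
{
  assert(percent_total >= 0.0 && percent_total <= 100.0);
  assert(total_time_s > 0.0);
  return percent_total / 100.0 * total_time_s;
}

/* %T is printed as a whole number, so the true share can differ from the
   printed one by up to 0.5 percentage points of the total run time. */
static double rounding_error_bound(double total_time_s)
{
  return 0.005 * total_time_s;
}
```

<div>Because %T is rounded, treat each estimate as good only to about plus or minus 0.07 s for this 14.47 s run.</div>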
<div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_7509031262052082599Signature">
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-size:11pt"><strong>Philip Fackler<br>
</strong></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div><span style="font-size:11pt">Research Software Engineer, Application Engineering Group</span></div>
<div><span style="font-size:11pt">Advanced Computing Systems Research Section</span></div>
<div><span style="font-size:11pt">Computer Science and Mathematics Division<br>
</span></div>
<div><span style="font-size:11pt"><strong>Oak Ridge National Laboratory</strong></span><span style="font-size:11pt"></span><br>
</div>
</div>
</div>
</div>
</div>
<div id="m_7509031262052082599appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_7509031262052082599divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>><br>
<b>Sent:</b> Tuesday, January 17, 2023 17:25<br>
<b>To:</b> Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>>; <a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a> <<a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a>>; <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Cc:</b> Mills, Richard Tran <<a href="mailto:rtmills@anl.gov" target="_blank">rtmills@anl.gov</a>>; Blondel, Sophie <<a href="mailto:sblondel@utk.edu" target="_blank">sblondel@utk.edu</a>>; Roth, Philip <<a href="mailto:rothpc@ornl.gov" target="_blank">rothpc@ornl.gov</a>><br>
<b>Subject:</b> [EXTERNAL] Re: Performance problem using COO interface</font>
<div> </div>
</div>
<div dir="ltr">
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">Hi, Philip,</span></div>
<div><span style="background-color:rgb(255,255,255)"><font color="#000000" face="Calibri, Arial, Helvetica, sans-serif"><span style="font-size:12pt"> Could you add -log_view and see which functions are used in the solve? Since it is CPU-only, comparing the -log_view output of the different runs should make it easy to see which functions slowed down.</span></font></span></div>
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)"><br>
</span></div>
<div><span style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">--Junchao Zhang</span></div>
<div id="m_7509031262052082599x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_7509031262052082599x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>><br>
<b>Sent:</b> Tuesday, January 17, 2023 4:13 PM<br>
<b>To:</b> <a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a> <<a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a>>; <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>><br>
<b>Cc:</b> Mills, Richard Tran <<a href="mailto:rtmills@anl.gov" target="_blank">rtmills@anl.gov</a>>; Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>>; Blondel, Sophie <<a href="mailto:sblondel@utk.edu" target="_blank">sblondel@utk.edu</a>>; Roth, Philip <<a href="mailto:rothpc@ornl.gov" target="_blank">rothpc@ornl.gov</a>><br>
<b>Subject:</b> Performance problem using COO interface</font>
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">In Xolotl's feature-petsc-kokkos branch, I have ported the code to use PETSc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, the code computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, but the performance of the solve as a whole is significantly worse.</span></div>
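<div>To make the COO semantics concrete, here is a self-contained C sketch (no PETSc required) of what assembling a matrix from coordinate triplets involves: entries are bucketed by row, sorted by column, repeated (i, j) pairs are summed, and triplets with a negative index are skipped. Our understanding is that this matches the documented behavior of MatSetPreallocationCOO()/MatSetValuesCOO(), where the index pattern is supplied once and only the values are passed on each Jacobian evaluation. CsrMatrix, coo_to_csr and csr_free are hypothetical names, and this is not PETSc's implementation.</div>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical compressed-sparse-row container (not a PETSc type). */
typedef struct {
  int nrows;
  int nnz;
  int *rowptr;  /* nrows + 1 offsets into colidx/val */
  int *colidx;  /* column index of each stored entry */
  double *val;  /* value of each stored entry */
} CsrMatrix;

/* Assemble CSR from COO triplets (ci[k], cj[k], cv[k]). Repeated (i, j)
   pairs are summed; triplets with a negative row or column are skipped.
   Returns 0 on success, -1 on allocation failure. */
static int coo_to_csr(int nrows, int ncoo, const int *ci, const int *cj,
                      const double *cv, CsrMatrix *A)
{
  size_t cap = (size_t)(ncoo > 0 ? ncoo : 1);
  int *start = calloc((size_t)nrows + 1, sizeof(int));
  int *next = malloc((size_t)(nrows > 0 ? nrows : 1) * sizeof(int));
  int *rowptr = malloc(((size_t)nrows + 1) * sizeof(int));
  int *cols = malloc(cap * sizeof(int));
  double *vals = malloc(cap * sizeof(double));
  if (!start || !next || !rowptr || !cols || !vals) {
    free(start); free(next); free(rowptr); free(cols); free(vals);
    return -1;
  }
  /* 1. Count the kept triplets in each row. */
  for (int k = 0; k < ncoo; k++) {
    if (ci[k] < 0 || cj[k] < 0) continue;
    assert(ci[k] < nrows);
    start[ci[k] + 1]++;
  }
  /* 2. Prefix sum turns counts into row start offsets. */
  for (int r = 0; r < nrows; r++) start[r + 1] += start[r];
  /* 3. Scatter each kept triplet into its row's bucket. */
  for (int r = 0; r < nrows; r++) next[r] = start[r];
  for (int k = 0; k < ncoo; k++) {
    if (ci[k] < 0 || cj[k] < 0) continue;
    int p = next[ci[k]]++;
    cols[p] = cj[k];
    vals[p] = cv[k];
  }
  /* 4. Sort each row by column, then compact it in place, summing duplicates. */
  int out = 0;
  for (int r = 0; r < nrows; r++) {
    int b = start[r], e = start[r + 1];
    for (int a = b + 1; a < e; a++) { /* insertion sort on (column, value) */
      int c = cols[a];
      double v = vals[a];
      int q = a - 1;
      while (q >= b && cols[q] > c) {
        cols[q + 1] = cols[q];
        vals[q + 1] = vals[q];
        q--;
      }
      cols[q + 1] = c;
      vals[q + 1] = v;
    }
    rowptr[r] = out;
    for (int a = b; a < e; a++) {
      if (out > rowptr[r] && cols[out - 1] == cols[a]) {
        vals[out - 1] += vals[a]; /* repeated (i, j): accumulate */
      } else {
        cols[out] = cols[a];
        vals[out] = vals[a];
        out++;
      }
    }
  }
  rowptr[nrows] = out;
  free(start);
  free(next);
  A->nrows = nrows;
  A->nnz = out;
  A->rowptr = rowptr;
  A->colidx = cols;
  A->val = vals;
  return 0;
}

static void csr_free(CsrMatrix *A)
{
  free(A->rowptr);
  free(A->colidx);
  free(A->val);
  A->rowptr = NULL;
  A->colidx = NULL;
  A->val = NULL;
  A->nnz = 0;
}
```

<div>As we understand the PETSc design, steps 1 to 4 correspond to work done once at preallocation time, so each MatSetValuesCOO call only has to scatter and sum the new values array.</div>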
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)"><br>
</span></div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">Note:</span></div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">This is all CPU-only (so Kokkos and Kokkos Kernels are built with only the Serial backend).<br>
</span></div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec.</span></div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">The port version is using MatSetValuesCOO and is run with
<code>-dm_mat_type aijkokkos -dm_vec_type kokkos</code>.<br>
</span></div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<span style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">The port/def version is using MatSetValuesCOO and is run with
<code>-dm_vec_type kokkos</code> (using the default Mat implementation).<br>
</span></div>
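<div>For contrast with the COO path, a hypothetical C sketch of the index bookkeeping that MatSetValuesStencil implies for a 1-D DMDA owned entirely by one process: each (grid point i, component c) pair maps to row i*dof + c, with components interleaved at each grid point. Stencil1D and stencil_to_row are illustrative stand-ins, not PETSc's MatStencil or its internals; on several processes PETSc also applies ghost offsets and a local-to-global mapping, which this sketch omits. As we understand it, that translation (plus a search for the column within the row) happens on every stencil insertion, whereas the COO path does its index work once up front.</div>

```c
#include <assert.h>

/* Hypothetical stand-in for the i and c fields of PETSc's MatStencil. */
typedef struct {
  int i; /* grid point index along the 1-D grid */
  int c; /* component (degree of freedom) at that point */
} Stencil1D;

/* Matrix row for (s.i, s.c), assuming npoints grid points with dof
   interleaved components, all owned by one process (no ghost offsets,
   identity local-to-global map). */
static int stencil_to_row(Stencil1D s, int npoints, int dof)
{
  assert(s.i >= 0 && s.i < npoints);
  assert(s.c >= 0 && s.c < dof);
  return s.i * dof + s.c;
}
```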
<div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
So, this seems to be due to a performance difference in the PETSc implementations. Please advise. Is this a known issue? Or am I missing something?</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
Thank you for the help,<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
</div>
</div>
</div>
</div>
</div></blockquote></div>