<div dir="ltr">Maybe Philip could narrow this down by not using the GMRES/SOR combination?<div>Try GMRES/Jacobi.</div><div>Try BiCG/SOR.</div><div>If one of those fixes the problem, that might help pinpoint it, or at least get Philip moving.</div><div><br></div><div>Mark</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 1, 2022 at 5:06 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com">junchao.zhang@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi, Philip,<div> Sorry for the long delay. I could not get anything useful from the -log_view output. Since I have already built Xolotl, could you give me instructions on how to run a Xolotl test that reproduces the divergence with the PETSc GPU backends (but runs fine on CPU)?</div><div> Thank you.</div><div><div><div dir="ltr"><div dir="ltr">--Junchao Zhang</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>
<div dir="ltr">
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
<div><br>
</div>
<div>Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022</div>
<div>Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a GIT Date: 2022-10-28 14:39:41 +0000</div>
<div><br>
</div>
<div> Max Max/Min Avg Total</div>
<div>Time (sec): 6.023e+00 1.000 6.023e+00</div>
<div>Objects: 1.020e+02 1.000 1.020e+02</div>
<div>Flops: 1.080e+09 1.000 1.080e+09 1.080e+09</div>
<div>Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08</div>
<div>MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00</div>
<div>MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00</div>
<div>MPI Reductions: 0.000e+00 0.000</div>
<div><br>
</div>
<div>Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)</div>
<div> e.g., VecAXPY() for real vectors of length N --> 2N flops</div>
<div> and VecAXPY() for complex vectors of length N --> 8N flops</div>
<div><br>
</div>
<div>Summary of Stages: ----- Time ------ ----- Flop ------ --- Messages --- -- Message Lengths -- -- Reductions --</div>
<div> Avg %Total Avg %Total Count %Total Avg %Total Count %Total</div>
<div> 0: Main Stage: 6.0226e+00 100.0% 1.0799e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>See the 'Profiling' chapter of the users' manual for details on interpreting output.</div>
<div>Phase summary info:</div>
<div> Count: number of times phase was executed</div>
<div> Time and Flop: Max - maximum over all processors</div>
<div> Ratio - ratio of maximum to minimum over all processors</div>
<div> Mess: number of messages sent</div>
<div> AvgLen: average message length (bytes)</div>
<div> Reduct: number of global reductions</div>
<div> Global: entire computation</div>
<div> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().</div>
<div> %T - percent time in this phase %F - percent flop in this phase</div>
<div> %M - percent messages in this phase %L - percent message lengths in this phase</div>
<div> %R - percent reductions in this phase</div>
<div> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)</div>
<div> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)</div>
<div> CpuToGpu Count: total number of CPU to GPU copies per processor</div>
<div> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)</div>
<div> GpuToCpu Count: total number of GPU to CPU copies per processor</div>
<div> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)</div>
<div> GPU %F: percent flops on GPU in this event</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>Event Count Time (sec) Flop --- Global --- --- Stage ---- Total</div>
<div> GPU - CpuToGpu - - GpuToCpu - GPU</div>
<div><br>
</div>
<div> Max Ratio Max Ratio Max Ratio Mess AvgLen Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s</div>
<div> Mflop/s Count Size Count Size %F</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>---------------------------------------</div>
<div><br>
</div>
<div><br>
</div>
<div>--- Event Stage 0: Main Stage</div>
<div><br>
</div>
<div>BuildTwoSided 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>DMCreateMat 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFSetGraph 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFSetUp 3 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFPack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SFUnpack 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecDot 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecMDot 775 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecNorm 1728 1.0 nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecScale 1983 1.0 nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecCopy 780 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecSet 4955 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecAXPY 190 1.0 nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecAYPX 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecAXPBYCZ 643 1.0 nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecWAXPY 502 1.0 nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecMAXPY 1159 1.0 nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecScatterBegin 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan</div>
<div> -nan 2 5.14e-03 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecScatterEnd 4647 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecReduceArith 380 1.0 nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>VecReduceComm 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>VecNormalize 965 1.0 nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>TSStep 20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 97100 0 0 0 184</div>
<div> -nan 2 5.14e-03 0 0.00e+00 54</div>
<div><br>
</div>
<div>TSFunctionEval 597 1.0 nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63 1 0 0 0 63 1 0 0 0 -nan</div>
<div> -nan 1 3.36e-04 0 0.00e+00 100</div>
<div><br>
</div>
<div>TSJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 97</div>
<div><br>
</div>
<div>MatMult 1930 1.0 nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 41 0 0 0 1 41 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatMultTranspose 1 1.0 nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatSolve 965 1.0 nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSOR 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatLUFactorSym 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatLUFactorNum 190 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatScale 190 1.0 nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>MatAssemblyBegin 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatAssemblyEnd 761 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatGetRowIJ 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatCreateSubMats 380 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatGetOrdering 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatZeroEntries 379 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSetPreallCOO 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>MatSetValuesCOO 190 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSetUp 760 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSolve 190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86 0 0 0 10 86 0 0 0 1602</div>
<div> -nan 1 4.80e-03 0 0.00e+00 46</div>
<div><br>
</div>
<div>KSPGMRESOrthog 775 1.0 nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>SNESSolve 71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99 0 0 0 95 99 0 0 0 188</div>
<div> -nan 1 4.80e-03 0 0.00e+00 53</div>
<div><br>
</div>
<div>SNESSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>SNESFunctionEval 573 1.0 nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60 2 0 0 0 60 2 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>SNESJacobianEval 190 1.0 nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24 3 0 0 0 24 3 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 97</div>
<div><br>
</div>
<div>SNESLineSearch 190 1.0 nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10 0 0 0 53 10 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 100</div>
<div><br>
</div>
<div>PCSetUp 570 1.0 nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 11 0 0 0 2 11 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>PCApply 965 1.0 nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 57 0 0 0 8 57 0 0 0 -nan</div>
<div> -nan 1 4.80e-03 0 0.00e+00 19</div>
<div><br>
</div>
<div>KSPSolve_FS_0 965 1.0 nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00 4 31 0 0 0 4 31 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div>KSPSolve_FS_1 965 1.0 nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 15 0 0 0 2 15 0 0 0 -nan</div>
<div> -nan 0 0.00e+00 0 0.00e+00 0</div>
<div><br>
</div>
<div><br>
</div>
<div>--- Event Stage 1: Unknown</div>
<div><br>
</div>
<div>------------------------------------------------------------------------------------------------------------------------</div>
<div>---------------------------------------</div>
<div><br>
</div>
<div><br>
</div>
<div>Object Type Creations Destructions. Reports information only for process 0.</div>
<div><br>
</div>
<div>--- Event Stage 0: Main Stage</div>
<div><br>
</div>
<div> Container 5 5</div>
<div> Distributed Mesh 2 2</div>
<div> Index Set 11 11</div>
<div> IS L to G Mapping 1 1</div>
<div> Star Forest Graph 7 7</div>
<div> Discrete System 2 2</div>
<div> Weak Form 2 2</div>
<div> Vector 49 49</div>
<div> TSAdapt 1 1</div>
<div> TS 1 1</div>
<div> DMTS 1 1</div>
<div> SNES 1 1</div>
<div> DMSNES 3 3</div>
<div> SNESLineSearch 1 1</div>
<div> Krylov Solver 4 4</div>
<div> DMKSP interface 1 1</div>
<div> Matrix 4 4</div>
<div> Preconditioner 4 4</div>
<div> Viewer 2 1</div>
<div><br>
</div>
<div>--- Event Stage 1: Unknown</div>
<div><br>
</div>
<div>========================================================================================================================</div>
<div>Average time to get PetscTime(): 3.14e-08</div>
<div>#PETSc Option Table entries:</div>
<div>-log_view</div>
<div>-log_view_gpu_times</div>
<div>#End of PETSc Option Table entries</div>
<div>Compiled without FORTRAN kernels</div>
<div>Compiled with 64 bit PetscInt</div>
<div>Compiled with full precision matrices (default)</div>
<div>sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8</div>
<div>Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install
--with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install</div>
<div><br>
</div>
<div>-----------------------------------------</div>
<div>Libraries compiled on 2022-11-01 21:01:08 on PC0115427
</div>
<div>Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35</div>
<div>Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install</div>
<div>Using PETSc arch: </div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3
</div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include</div>
<div>-----------------------------------------</div>
<div><br>
</div>
<div>Using C linker: mpicc</div>
<div>Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
-Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas
-lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl</div>
<div>-----------------------------------------</div>
<br>
</div>
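<div>A minimal C sketch of the two accounting rules printed in the header of the log above: the flop-counting convention (VecAXPY() is charged 2N flops for real vectors and 8N for complex ones, because a complex multiply-add takes 8 real operations) and the "Total Mflop/s" column (total flops divided by the maximum time, scaled by a factor of 10^-6; the KSPSolve row confirms the scale factor even though the header prints "10e-6"). The function names are hypothetical and are not part of the PETSc API.</div>

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a PETSc function): flops charged for
 * VecAXPY(), y <- a*x + y, on a vector of length n.
 * Real:    1 multiply + 1 add per entry                       -> 2n
 * Complex: 4 multiplies + 2 adds for a*x, then 2 adds for +y  -> 8n */
static double vec_axpy_flops(long n, bool is_complex)
{
  return (is_complex ? 8.0 : 2.0) * (double)n;
}

/* Hypothetical helper: the "Total Mflop/s" column, i.e.
 * 1e-6 * (sum of flop over all processes) / (max time over all processes). */
static double total_mflops(double total_flop, double max_time_sec)
{
  /* A zero or NaN time has no meaningful rate, so reject it. */
  assert(max_time_sec > 0.0);
  return 1e-6 * total_flop / max_time_sec;
}
```

<div>With the KSPSolve row (9.30e+08 flops in 5.8052e-01 s), total_mflops() gives about 1602, matching the printed value. Rows whose time is nan have no meaningful rate, which is consistent with the -nan entries the log shows for them.</div>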
<div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="m_9067613147681362154m_3508739715783367517Signature">
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-size:11pt"><strong>Philip Fackler<br>
</strong></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div><span style="font-size:11pt">Research Software Engineer, Application Engineering Group</span></div>
<div><span style="font-size:11pt">Advanced Computing Systems Research Section</span></div>
<div><span style="font-size:11pt">Computer Science and Mathematics Division<br>
</span></div>
<div><span style="font-size:11pt"><strong>Oak Ridge National Laboratory</strong></span><span style="font-size:11pt"></span><br>
</div>
</div>
</div>
</div>
</div>
<div id="m_9067613147681362154m_3508739715783367517appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9067613147681362154m_3508739715783367517divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Tuesday, November 15, 2022 13:03<br>
<b>To:</b> Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>><br>
<b>Cc:</b> <a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a> <<a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a>>; <a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Blondel, Sophie <<a href="mailto:sblondel@utk.edu" target="_blank">sblondel@utk.edu</a>>; Roth, Philip <<a href="mailto:rothpc@ornl.gov" target="_blank">rothpc@ornl.gov</a>><br>
<b>Subject:</b> Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.</font>
<div> </div>
</div>
<div>
<div dir="ltr">Can you paste the -log_view output so I can see which functions are used?
<div>
<div><br clear="all">
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
</div>
</div>
<br>
<div>
<div dir="ltr">On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend.<br>
</div>
<div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
</div>
<div id="m_9067613147681362154m_3508739715783367517x_m_-5191598488002563482appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="m_9067613147681362154m_3508739715783367517x_m_-5191598488002563482divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" target="_blank">junchao.zhang@gmail.com</a>><br>
<b>Sent:</b> Monday, November 14, 2022 19:34<br>
<b>To:</b> Fackler, Philip <<a href="mailto:facklerpw@ornl.gov" target="_blank">facklerpw@ornl.gov</a>><br>
<b>Cc:</b> <a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">
xolotl-psi-development@lists.sourceforge.net</a> <<a href="mailto:xolotl-psi-development@lists.sourceforge.net" target="_blank">xolotl-psi-development@lists.sourceforge.net</a>>;
<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a> <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>>; Blondel, Sophie <<a href="mailto:sblondel@utk.edu" target="_blank">sblondel@utk.edu</a>>;
Zhang, Junchao <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>>; Roth, Philip <<a href="mailto:rothpc@ornl.gov" target="_blank">rothpc@ornl.gov</a>><br>
<b>Subject:</b> [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi, Philip,<br>
Sorry to hear that. It seems you can run the same code on CPUs but not on GPUs (with either the PETSc/Kokkos backend or the PETSc/CUDA backend). Is that right?
<div style="border:0px;font-variant-numeric:inherit;font-variant-east-asian:inherit;font-stretch:inherit;font-size:12pt;line-height:inherit;font-family:Calibri,Arial,Helvetica,sans-serif;margin:0px;padding:0px;vertical-align:baseline;color:black">
<br>
</div>
<div>
<div dir="ltr">
<div dir="ltr">--Junchao Zhang</div>
</div>
</div>
<br>
</div>
<br>
<div>
<div dir="ltr">On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <<a href="mailto:petsc-users@mcs.anl.gov" target="_blank">petsc-users@mcs.anl.gov</a>> wrote:<br>
</div>
<blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
This is an issue I've brought up before (and discussed in person with Richard). I wanted to bring it up again because I've reached the limits of what I know to try, and I need help figuring this out.</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
The problem can be reproduced using Xolotl's "develop" branch built against a PETSc build with Kokkos and Kokkos Kernels enabled. Then either add the relevant Kokkos options to the "petscArgs=" line in the system test parameter file(s), or simply replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch (see the files beginning with "params_system_"
<a title="hxxps://github.com/ORNL-Fusion/xolotl/tree/feature-petsc-kokkos/benchmarks">
here</a>).</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Note that those files use the Kokkos options, but the problem is similar with the corresponding CUDA/cuSPARSE options. I've already tried building Kokkos Kernels with no TPLs (third-party libraries); the results were slightly different, but the problem remained.<br>
</div>
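<div>For reference, a sketch of what "the relevant Kokkos options" and "the corresponding CUDA/cuSPARSE options" might look like when appended to the petscArgs= line. These are standard PETSc options that set the types of matrices and vectors created through a DM; that Xolotl's system tests select their backend this way is an assumption (the DMCreateMat event in the log suggests the matrix does come from a DM), and the parameter files on the "feature-petsc-kokkos" branch may use different or additional options. The first line selects the Kokkos backend, the second the CUDA/cuSPARSE backend; use one or the other.</div>

```text
-dm_mat_type aijkokkos -dm_vec_type kokkos
-dm_mat_type aijcusparse -dm_vec_type cuda
```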
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Any help would be appreciated.</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
<br>
</div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0);background-color:rgb(255,255,255)">
Thanks,<br>
</div>
<div>
<div style="font-family:Consolas,Courier,monospace;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div></blockquote></div>
</blockquote></div>
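<div>Mark's two suggestions each change one component of the current setup: GMRES/Jacobi keeps the Krylov method and swaps the preconditioner, while BiCG/SOR keeps the preconditioner and swaps the Krylov method. As standard PETSc command-line options (which could be appended to the same petscArgs= line), they would look roughly like the following, one line per experiment. This assumes Xolotl's linear solver reads unprefixed -ksp_ and -pc_ options, which has not been checked here.</div>

```text
-ksp_type gmres -pc_type jacobi
-ksp_type bicg -pc_type sor
```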