[petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

Junchao Zhang junchao.zhang at gmail.com
Tue Feb 7 15:26:03 CST 2023


Hi, Philip,
  I believe this MR https://gitlab.com/petsc/petsc/-/merge_requests/6030
would fix the problem.  It is a fix to petsc/release, but you can
cherry-pick it to petsc/main.
  Could you try that in your case?
  Thanks.
--Junchao Zhang


On Fri, Jan 20, 2023 at 11:31 AM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Sorry, no progress. My guess is that a vector's array was obtained but
> never restored (e.g., a missing VecRestoreArray() call), leaving the host
> and device data out of sync.  The bug may be in your code or in petsc code.
> After the ECP AM, I will have more time on this bug.
> Thanks.
>
> --Junchao Zhang
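
One way to picture the failure mode guessed at above (an array obtained but never restored) is the toy model below. It is only a sketch in Python, not PETSc code: the class ToyDualVec and its methods are hypothetical, and the idea that the "host was modified" flag is recorded at restore time is an illustrative assumption rather than something taken from the PETSc source.

```python
class ToyDualVec:
    """Toy model of a vector mirrored on host and device (NOT PETSc code)."""

    def __init__(self, values):
        self.host = list(values)
        self.device = list(values)
        self.device_valid = True  # True means the device copy is up to date

    def get_array(self):
        # Hand out the host copy for in-place modification.
        return self.host

    def restore_array(self):
        # Assumption for illustration: restoring is what marks the
        # device copy as stale after the host copy was written.
        self.device_valid = False

    def device_norm1(self):
        # A stand-in for a device kernel: it re-syncs from the host only
        # when the device copy has been flagged as stale.
        if not self.device_valid:
            self.device = list(self.host)
            self.device_valid = True
        return sum(abs(x) for x in self.device)


# Correct pairing: get, modify, restore.
good = ToyDualVec([1.0, 2.0])
a = good.get_array()
a[0] = 10.0
good.restore_array()
good_norm = good.device_norm1()

# Missing restore: the device never learns that the host copy changed.
bad = ToyDualVec([1.0, 2.0])
b = bad.get_array()
b[0] = 10.0
bad_norm = bad.device_norm1()

print(good_norm, bad_norm)
```

In this model the unrestored vector silently computes with stale device data, which would look like a run that works on the CPU but diverges on the GPU.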
>
>
> On Fri, Jan 20, 2023 at 11:00 AM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
>> Any progress on this? Any info/help needed?
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Fackler, Philip <facklerpw at ornl.gov>
>> *Sent:* Thursday, December 8, 2022 09:07
>> *To:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Great! Thank you!
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Wednesday, December 7, 2022 18:47
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Hi, Philip,
>>  I could reproduce the error. I need to find a way to debug it.  Thanks.
>>
>> /home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in
>> "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds
>> 1e-10
>> *** 1 failure is detected in the test module "Regression"
>>
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> I think it would be simpler to use the develop branch for this issue. But
>> you can still just build the SystemTester. Then (if you changed the PSI_1
>> case) run:
>>
>>  ./test/system/SystemTester -t System/PSI_1 -- -v
>>
>> (No need for multiple MPI ranks)
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Monday, December 5, 2022 15:40
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> I configured with xolotl branch feature-petsc-kokkos, and typed `make`
>> under ~/xolotl-build/.  Though there were errors, many *Tester executables
>> were built.
>>
>> [ 62%] Built target xolotlViz
>> [ 63%] Linking CXX executable TemperatureProfileHandlerTester
>> [ 64%] Linking CXX executable TemperatureGradientHandlerTester
>> [ 64%] Built target TemperatureProfileHandlerTester
>> [ 64%] Built target TemperatureConstantHandlerTester
>> [ 64%] Built target TemperatureGradientHandlerTester
>> [ 65%] Linking CXX executable HeatEquationHandlerTester
>> [ 65%] Built target HeatEquationHandlerTester
>> [ 66%] Linking CXX executable FeFitFluxHandlerTester
>> [ 66%] Linking CXX executable W111FitFluxHandlerTester
>> [ 67%] Linking CXX executable FuelFitFluxHandlerTester
>> [ 67%] Linking CXX executable W211FitFluxHandlerTester
>>
>> Which Tester should I use to run with the parameter file
>> benchmarks/params_system_PSI_2.txt? And how many ranks should I use?
>> Could you give an example command line?
>> Thanks.
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>> Hello, Philip,
>>    Do I still need to use the feature-petsc-kokkos branch?
>> --Junchao Zhang
>>
>>
>> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> Junchao,
>>
>> Thank you for working on this. If you open the parameter file for, say,
>> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add
>> `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or
>> the corresponding cusparse/cuda option).
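
The edit described above can be sketched as a small script. This is a hypothetical helper, not part of Xolotl: the function name add_petsc_args and the sample file contents are made up for illustration; only the option string and the "petscArgs=" key come from the message.

```python
def add_petsc_args(param_text, extra_args):
    """Append extra_args to the petscArgs= line of a parameter file's text."""
    out = []
    found = False
    for line in param_text.splitlines():
        if line.startswith("petscArgs="):
            # Keep the existing options and add the new ones at the end.
            line = line.rstrip() + " " + extra_args
            found = True
        out.append(line)
    if not found:
        raise ValueError("no petscArgs= line found")
    return "\n".join(out) + "\n"


# Made-up sample contents; a real parameter file will differ.
sample = "someOtherKey=1\npetscArgs=-ts_max_steps 20\n"
updated = add_petsc_args(sample, "-dm_mat_type aijkokkos -dm_vec_type kokkos")
print(updated)
```

To apply it to a real file, read the file into a string, pass it through the helper, and write the result back.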
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Thursday, December 1, 2022 17:05
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Hi, Philip,
>>   Sorry for the long delay.  I could not get anything useful from the
>> -log_view output.  Since I have already built xolotl, could you give me
>> instructions on how to run a xolotl test that reproduces the divergence
>> with petsc GPU backends (but is fine on CPU)?
>>   Thank you.
>> --Junchao Zhang
>>
>>
>> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>>
>> Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
>> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>>
>>                          Max       Max/Min     Avg       Total
>> Time (sec):           6.023e+00     1.000   6.023e+00
>> Objects:              1.020e+02     1.000   1.020e+02
>> Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
>> Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
>> MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
>> MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
>> MPI Reductions:       0.000e+00     0.000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>>                             and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                         Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
>>  0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>    Count: number of times phase was executed
>>    Time and Flop: Max - maximum over all processors
>>                   Ratio - ratio of maximum to minimum over all processors
>>    Mess: number of messages sent
>>    AvgLen: average message length (bytes)
>>    Reduct: number of global reductions
>>    Global: entire computation
>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>       %T - percent time in this phase         %F - percent flop in this phase
>>       %M - percent messages in this phase     %L - percent message lengths in this phase
>>       %R - percent reductions in this phase
>>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>>    CpuToGpu Count: total number of CPU to GPU copies per processor
>>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>>    GpuToCpu Count: total number of GPU to CPU copies per processor
>>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>>    GPU %F: percent flops on GPU in this event
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSided          3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> SFSetGraph             3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> SFSetUp                3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> SFPack              4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> SFUnpack            4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecDot               190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecMDot              775 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecNorm             1728 1.0   nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecScale            1983 1.0   nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecCopy              780 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecSet              4955 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecAXPY              190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecAYPX              597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecAXPBYCZ           643 1.0   nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecWAXPY             502 1.0   nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecMAXPY            1159 1.0   nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecScatterBegin     4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan      2 5.14e-03    0 0.00e+00  0
>> VecScatterEnd       4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecReduceArith       380 1.0   nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> VecReduceComm        190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> VecNormalize         965 1.0   nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0   184     -nan      2 5.14e-03    0 0.00e+00 54
>> TSFunctionEval       597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan     -nan      1 3.36e-04    0 0.00e+00 100
>> TSJacobianEval       190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 97
>> MatMult             1930 1.0   nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> MatMultTranspose       1 1.0   nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> MatSolve             965 1.0   nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatSOR               965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatLUFactorSym         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatLUFactorNum       190 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatScale             190 1.0   nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> MatAssemblyBegin     761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatAssemblyEnd       761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatGetRowIJ            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatCreateSubMats     380 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatGetOrdering         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatZeroEntries       379 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatSetPreallCOO        1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> MatSetValuesCOO      190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> KSPSetUp             760 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602     -nan      1 4.80e-03    0 0.00e+00 46
>> KSPGMRESOrthog       775 1.0   nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188     -nan      1 4.80e-03    0 0.00e+00 53
>> SNESSetUp              1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> SNESFunctionEval     573 1.0   nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> SNESJacobianEval     190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 97
>> SNESLineSearch       190 1.0   nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00 100
>> PCSetUp              570 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> PCApply              965 1.0   nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan     -nan      1 4.80e-03    0 0.00e+00 19
>> KSPSolve_FS_0        965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
>> KSPSolve_FS_1        965 1.0   nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan     -nan      0 0.00e+00    0 0.00e+00  0
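
As a sanity check on the table above, the Total Mflop/s column can be recomputed from the Flop and Time columns for rows that report a time. The sketch below assumes the legend's "10e-6" factor means 10^-6 (flop per second to Mflop per second), which is what the logged TSStep and KSPSolve values are consistent with; the helper name mflops is hypothetical.

```python
def mflops(flop, seconds):
    """Convert a flop count and a wall time into Mflop/s (factor 1e-6)."""
    return 1e-6 * flop / seconds


# (flop, time in seconds, Mflop/s as logged), copied from the table above.
rows = {
    "TSStep": (1.08e9, 5.8699e00, 184),
    "KSPSolve": (9.30e8, 5.8052e-01, 1602),
}

for name, (flop, seconds, logged) in rows.items():
    computed = mflops(flop, seconds)
    # The Flop column is rounded to three digits, so allow a small slack.
    print(f"{name}: computed {computed:.1f}, logged {logged}")
```

Rows whose Time column is nan cannot be checked this way. Any small mismatch in other rows would be expected from the rounding of the Flop column.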
>>
>>
>> --- Event Stage 1: Unknown
>>
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> Object Type          Creations   Destructions. Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>>            Container     5              5
>>     Distributed Mesh     2              2
>>            Index Set    11             11
>>    IS L to G Mapping     1              1
>>    Star Forest Graph     7              7
>>      Discrete System     2              2
>>            Weak Form     2              2
>>               Vector    49             49
>>              TSAdapt     1              1
>>                   TS     1              1
>>                 DMTS     1              1
>>                 SNES     1              1
>>               DMSNES     3              3
>>       SNESLineSearch     1              1
>>        Krylov Solver     4              4
>>      DMKSP interface     1              1
>>               Matrix     4              4
>>       Preconditioner     4              4
>>               Viewer     2              1
>>
>> --- Event Stage 1: Unknown
>>
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 3.14e-08
>> #PETSc Option Table entries:
>> -log_view
>> -log_view_gpu_times
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with 64 bit PetscInt
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 8
>> Configure options: PETSC_DIR=/home/4pf/repos/petsc
>> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx
>> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries
>> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices
>> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3
>> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install
>> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
>>
>> -----------------------------------------
>> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
>> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
>> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
>> Using PETSc arch:
>> -----------------------------------------
>>
>> Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas
>> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector
>> -fvisibility=hidden -O3
>> -----------------------------------------
>>
>> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include
>> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include
>> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using libraries:
>> -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib
>> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc
>> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
>> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
>> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib
>> -L/home/4pf/build/kokkos/cuda/install/lib
>> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64
>> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers
>> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas
>> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
>> -----------------------------------------
>>
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Tuesday, November 15, 2022 13:03
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Can you paste -log_view result so I can see what functions are used?
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> Yes, most (but not all) of our system test cases fail with the
>> kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos
>> backend.
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Monday, November 14, 2022 19:34
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Zhang,
>> Junchao <jczhang at mcs.anl.gov>; Roth, Philip <rothpc at ornl.gov>
>> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
>> diverging when running on CUDA device.
>>
>> Hi, Philip,
>>   Sorry to hear that.  It seems you could run the same code on CPUs but
>> not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda
>> backend). Is that right?
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>> This is an issue I've brought up before (and discussed in-person with
>> Richard). I wanted to bring it up again because I'm hitting the limits of
>> what I know to do, and I need help figuring this out.
>>
>> The problem can be reproduced using Xolotl's "develop" branch built
>> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
>> add the relevant kokkos options to the "petscArgs=" line in the system test
>> parameter file(s), or just replace the system test parameter files with the
>> ones from the "feature-petsc-kokkos" branch. See here the files that
>> begin with "params_system_".
>>
>> Note that those files use the "kokkos" options, but the problem is
>> similar using the corresponding cuda/cusparse options. I've already tried
>> building kokkos-kernels with no TPLs and got slightly different results,
>> but the same problem.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>>
>>