[petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

Fackler, Philip facklerpw at ornl.gov
Fri Jan 20 11:00:51 CST 2023


Any progress on this? Any info/help needed?

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory
________________________________
From: Fackler, Philip <facklerpw at ornl.gov>
Sent: Thursday, December 8, 2022 09:07
To: Junchao Zhang <junchao.zhang at gmail.com>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Great! Thank you!

Philip Fackler
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Wednesday, December 7, 2022 18:47
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Hi, Philip,
 I could reproduce the error. I need to find a way to debug it. Thanks.

/home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in "System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds 1e-10
*** 1 failure is detected in the test module "Regression"
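
For readers unfamiliar with this kind of failure, here is a minimal sketch of a norm-of-difference regression check. Only the 1e-10 tolerance comes from the message above. The function names and the choice of a Euclidean norm are assumptions for illustration, and this is not Xolotl's actual SystemTestCase code.

```python
import math


def diff_norm(actual, expected):
    """Euclidean norm of the element-wise difference (assumed norm type)."""
    if len(actual) != len(expected):
        raise ValueError("outputs have different lengths")
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))


def passes(actual, expected, tolerance=1e-10):
    """True when the two outputs agree to within the tolerance."""
    return abs(diff_norm(actual, expected)) <= tolerance


# Hypothetical values standing in for a reference output and a new run.
reference = [1.0, 2.0, 3.0]
print(passes([1.0, 2.0, 3.0], reference))  # identical output
print(passes([1.0, 2.0, 3.2], reference))  # diverged output
```

A diffNorm near 0.197, as reported above, is many orders of magnitude beyond such a tolerance, so it points to a genuinely different solution rather than round-off noise.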

--Junchao Zhang


On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <facklerpw at ornl.gov> wrote:
I think it would be simpler to use the develop branch for this issue. But you can still just build the SystemTester. Then (if you changed the PSI_1 case) run:

 ./test/system/SystemTester -t System/PSI_1 -- -v

(No need for multiple MPI ranks)

Thanks,

Philip Fackler
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Monday, December 5, 2022 15:40
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

I configured with the xolotl branch feature-petsc-kokkos and ran `make` under ~/xolotl-build/. Although there were errors, many of the *Tester executables were built:
[ 62%] Built target xolotlViz
[ 63%] Linking CXX executable TemperatureProfileHandlerTester
[ 64%] Linking CXX executable TemperatureGradientHandlerTester
[ 64%] Built target TemperatureProfileHandlerTester
[ 64%] Built target TemperatureConstantHandlerTester
[ 64%] Built target TemperatureGradientHandlerTester
[ 65%] Linking CXX executable HeatEquationHandlerTester
[ 65%] Built target HeatEquationHandlerTester
[ 66%] Linking CXX executable FeFitFluxHandlerTester
[ 66%] Linking CXX executable W111FitFluxHandlerTester
[ 67%] Linking CXX executable FuelFitFluxHandlerTester
[ 67%] Linking CXX executable W211FitFluxHandlerTester
Which Tester should I use to run with the parameter file benchmarks/params_system_PSI_2.txt? And how many ranks should I use?  Could you give an example command line?
Thanks.

--Junchao Zhang


On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <junchao.zhang at gmail.com> wrote:
Hello, Philip,
   Do I still need to use the feature-petsc-kokkos branch?
--Junchao Zhang


On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <facklerpw at ornl.gov> wrote:
Junchao,

Thank you for working on this. If you open the parameter file for, say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or the corresponding cusparse/cuda option).
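
The edit described above can be sketched as a small script. It assumes the parameter file is a list of key=value lines with a single line starting with "petscArgs=". The sample file content and the helper name are placeholders, not the real contents of params_system_PSI_2.txt. The cuda entry uses the PETSc type names aijcusparse and cuda as the "corresponding" options.

```python
# Option strings to append; the "kokkos" entry is the one quoted above.
GPU_OPTIONS = {
    "kokkos": "-dm_mat_type aijkokkos -dm_vec_type kokkos",
    "cuda": "-dm_mat_type aijcusparse -dm_vec_type cuda",
}


def add_petsc_options(text, extra):
    """Append `extra` to the petscArgs= line of a parameter file's text."""
    out = []
    found = False
    for line in text.splitlines():
        if line.startswith("petscArgs="):
            found = True
            if extra not in line:  # avoid appending the options twice
                line = line.rstrip() + " " + extra
        out.append(line)
    if not found:
        raise ValueError("no petscArgs= line found")
    return "\n".join(out) + "\n"


# Placeholder file content, for illustration only.
example = "someKey=someValue\npetscArgs=-ts_max_steps 20\n"
print(add_petsc_options(example, GPU_OPTIONS["kokkos"]), end="")
```

To apply it to a real file, read the file's text, pass it through add_petsc_options, and write the result back. Keeping an unmodified copy makes it easy to rerun the CPU-only case for comparison.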

Thanks,

Philip Fackler
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Thursday, December 1, 2022 17:05
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Hi, Philip,
  Sorry for the long delay. I could not get anything useful from the -log_view output. Since I have already built xolotl, could you give me instructions on how to run a xolotl test that reproduces the divergence with the petsc GPU backends (but runs fine on CPU)?
  Thank you.
--Junchao Zhang


On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <facklerpw at ornl.gov> wrote:
------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------

Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           6.023e+00     1.000   6.023e+00
Objects:              1.020e+02     1.000   1.020e+02
Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
   GPU    - CpuToGpu -   - GpuToCpu - GPU

                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
 Mflop/s Count   Size   Count   Size  %F

------------------------------------------------------------------------------------------------------------------------
---------------------------------------


--- Event Stage 0: Main Stage

BuildTwoSided          3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

SFSetGraph             3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

SFSetUp                3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

SFPack              4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

SFUnpack            4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecDot               190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecMDot              775 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecNorm             1728 1.0   nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecScale            1983 1.0   nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecCopy              780 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecSet              4955 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecAXPY              190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecAYPX              597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecAXPBYCZ           643 1.0   nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecWAXPY             502 1.0   nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecMAXPY            1159 1.0   nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecScatterBegin     4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan
    -nan      2 5.14e-03    0 0.00e+00  0

VecScatterEnd       4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecReduceArith       380 1.0   nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

VecReduceComm        190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

VecNormalize         965 1.0   nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0   184
    -nan      2 5.14e-03    0 0.00e+00 54

TSFunctionEval       597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan
    -nan      1 3.36e-04    0 0.00e+00 100

TSJacobianEval       190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 97

MatMult             1930 1.0   nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

MatMultTranspose       1 1.0   nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

MatSolve             965 1.0   nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatSOR               965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatLUFactorSym         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatLUFactorNum       190 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatScale             190 1.0   nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

MatAssemblyBegin     761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatAssemblyEnd       761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatGetRowIJ            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatCreateSubMats     380 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatGetOrdering         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatZeroEntries       379 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatSetPreallCOO        1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

MatSetValuesCOO      190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

KSPSetUp             760 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602
    -nan      1 4.80e-03    0 0.00e+00 46

KSPGMRESOrthog       775 1.0   nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188
    -nan      1 4.80e-03    0 0.00e+00 53

SNESSetUp              1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

SNESFunctionEval     573 1.0   nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

SNESJacobianEval     190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 97

SNESLineSearch       190 1.0   nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00 100

PCSetUp              570 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

PCApply              965 1.0   nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan
    -nan      1 4.80e-03    0 0.00e+00 19

KSPSolve_FS_0        965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0

KSPSolve_FS_1        965 1.0   nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan
    -nan      0 0.00e+00    0 0.00e+00  0


--- Event Stage 1: Unknown

------------------------------------------------------------------------------------------------------------------------
---------------------------------------


Object Type          Creations   Destructions. Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container     5              5
    Distributed Mesh     2              2
           Index Set    11             11
   IS L to G Mapping     1              1
   Star Forest Graph     7              7
     Discrete System     2              2
           Weak Form     2              2
              Vector    49             49
             TSAdapt     1              1
                  TS     1              1
                DMTS     1              1
                SNES     1              1
              DMSNES     3              3
      SNESLineSearch     1              1
       Krylov Solver     4              4
     DMKSP interface     1              1
              Matrix     4              4
      Preconditioner     4              4
              Viewer     2              1

--- Event Stage 1: Unknown

========================================================================================================================
Average time to get PetscTime(): 3.14e-08
#PETSc Option Table entries:
-log_view
-log_view_gpu_times
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with 64 bit PetscInt
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
Configure options: PETSC_DIR=/home/4pf/repos/petsc PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3 --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install

-----------------------------------------
Libraries compiled on 2022-11-01 21:01:08 on PC0115427
Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
Using PETSc arch:
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -O3
-----------------------------------------

Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
-----------------------------------------

Using C linker: mpicc
Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib -L/home/4pf/build/kokkos/cuda/install/lib -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
-----------------------------------------
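
As a reading aid for the log above: the legend writes the "Total Mflop/s" scale factor as 10e-6, but the reported values are consistent with a factor of 1e-6 applied to (flop / max time). The sketch below redoes that arithmetic for the three events in the log that have real timings rather than nan. The helper name is an assumption, and the inputs are the rounded values printed in the log.

```python
def mflops(total_flop, max_time_sec):
    """Convert a flop count and a time in seconds to Mflop/s."""
    return 1e-6 * total_flop / max_time_sec


# (flop, time in seconds) copied from the TSStep, KSPSolve and SNESSolve rows.
events = {
    "TSStep": (1.08e9, 5.8699),
    "KSPSolve": (9.30e8, 5.8052e-1),
    "SNESSolve": (1.07e9, 5.7117),
}

for name, (flop, seconds) in events.items():
    print(f"{name:10s} {mflops(flop, seconds):8.1f} Mflop/s")
```

The TSStep and KSPSolve results round to the 184 and 1602 shown in the log. SNESSolve comes out slightly below the logged 188, presumably because the flop count printed in the log is itself rounded.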


Philip Fackler
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Tuesday, November 15, 2022 13:03
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth, Philip <rothpc at ornl.gov>
Subject: Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Can you paste the -log_view output so I can see which functions are used?

--Junchao Zhang


On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <facklerpw at ornl.gov> wrote:
Yes, most (but not all) of our system test cases fail with the kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos backend.

Philip Fackler
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Monday, November 14, 2022 19:34
To: Fackler, Philip <facklerpw at ornl.gov>
Cc: xolotl-psi-development at lists.sourceforge.net <xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Zhang, Junchao <jczhang at mcs.anl.gov>; Roth, Philip <rothpc at ornl.gov>
Subject: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec diverging when running on CUDA device.

Hi, Philip,
  Sorry to hear that. It seems you can run the same code on CPUs but not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend). Is that right?

--Junchao Zhang


On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <petsc-users at mcs.anl.gov> wrote:
This is an issue I've brought up before (and discussed in-person with Richard). I wanted to bring it up again because I'm hitting the limits of what I know to do, and I need help figuring this out.

The problem can be reproduced using Xolotl's "develop" branch built against a petsc build with kokkos and kokkos-kernels enabled. Then, either add the relevant kokkos options to the "petscArgs=" line in the system test parameter file(s), or just replace the system test parameter files with the ones from the "feature-petsc-kokkos" branch. See here the files that begin with "params_system_".

Note that those files use the "kokkos" options, but the problem is similar using the corresponding cuda/cusparse options. I've already tried building kokkos-kernels with no TPLs and got slightly different results, but the same problem.

Any help would be appreciated.

Thanks,

Philip Fackler