[petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.

Junchao Zhang junchao.zhang at gmail.com
Wed Dec 7 17:47:15 CST 2022


Hi, Philip,
 I could reproduce the error. I need to find a  way to debug it.  Thanks.

/home/jczhang/xolotl/test/system/SystemTestCase.cpp(317): fatal error: in
"System/PSI_1": absolute value of diffNorm{0.19704848134353209} exceeds
1e-10
*** 1 failure is detected in the test module "Regression"
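
For context, the failing assertion compares a difference norm against a tolerance of 1e-10. Below is a minimal sketch of that kind of check. It assumes an L2 norm of the element-wise difference between computed and reference values, which is not confirmed from the Xolotl source; the names `diff_norm` and `within_tolerance` are hypothetical.

```python
import math

TOLERANCE = 1e-10  # threshold quoted in the failure message above


def diff_norm(computed, reference):
    """L2 norm of the element-wise difference (assumed; Xolotl may use another norm)."""
    if len(computed) != len(reference):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(computed, reference)))


def within_tolerance(computed, reference, tol=TOLERANCE):
    """True when the absolute value of the difference norm does not exceed tol."""
    return abs(diff_norm(computed, reference)) <= tol
```

A diffNorm of about 0.197 against a 1e-10 tolerance is far outside round-off, so a check of this kind would flag a real divergence rather than a small floating-point reordering effect.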


--Junchao Zhang


On Tue, Dec 6, 2022 at 10:10 AM Fackler, Philip <facklerpw at ornl.gov> wrote:

> I think it would be simpler to use the develop branch for this issue. But
> you can still just build the SystemTester. Then (if you changed the PSI_1
> case) run:
>
>  ./test/system/SystemTester -t System/PSI_1 -- -v
>
> (No need for multiple MPI ranks)
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, December 5, 2022 15:40
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
> Philip <rothpc at ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> I configured with the xolotl branch feature-petsc-kokkos and ran `make`
> under ~/xolotl-build/. Although there were errors, many of the *Tester
> executables were built.
>
> [ 62%] Built target xolotlViz
> [ 63%] Linking CXX executable TemperatureProfileHandlerTester
> [ 64%] Linking CXX executable TemperatureGradientHandlerTester
> [ 64%] Built target TemperatureProfileHandlerTester
> [ 64%] Built target TemperatureConstantHandlerTester
> [ 64%] Built target TemperatureGradientHandlerTester
> [ 65%] Linking CXX executable HeatEquationHandlerTester
> [ 65%] Built target HeatEquationHandlerTester
> [ 66%] Linking CXX executable FeFitFluxHandlerTester
> [ 66%] Linking CXX executable W111FitFluxHandlerTester
> [ 67%] Linking CXX executable FuelFitFluxHandlerTester
> [ 67%] Linking CXX executable W211FitFluxHandlerTester
>
> Which Tester should I use to run with the parameter file
> benchmarks/params_system_PSI_2.txt? And how many ranks should I use?
> Could you give an example command line?
> Thanks.
>
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
> Hello, Philip,
>    Do I still need to use the feature-petsc-kokkos branch?
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
> Junchao,
>
> Thank you for working on this. If you open the parameter file for, say,
> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add
> `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or
> the corresponding cusparse/cuda option).
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Thursday, December 1, 2022 17:05
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
> Philip <rothpc at ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry for the long delay. I could not get anything useful from the
> -log_view output. Since I have already built xolotl, could you give me
> instructions on how to run a xolotl test that reproduces the divergence with
> the petsc GPU backends (while it runs fine on CPU)?
>   Thank you.
> --Junchao Zhang
>
>
> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>
> Unknown Name on a  named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>
>                          Max       Max/Min     Avg       Total
> Time (sec):           6.023e+00     1.000   6.023e+00
> Objects:              1.020e+02     1.000   1.020e+02
> Flops:                1.080e+09     1.000   1.080e+09  1.080e+09
> Flops/sec:            1.793e+08     1.000   1.793e+08  1.793e+08
> MPI Msg Count:        0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Msg Len (bytes):  0.000e+00     0.000   0.000e+00  0.000e+00
> MPI Reductions:       0.000e+00     0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>                             and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
>  0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    AvgLen: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flop in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>    CpuToGpu Count: total number of CPU to GPU copies per processor
>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>    GpuToCpu Count: total number of GPU to CPU copies per processor
>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>    GPU %F: percent flops on GPU in this event
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided          3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> DMCreateMat            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> SFSetGraph             3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> SFSetUp                3 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> SFPack              4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> SFUnpack            4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecDot               190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecMDot              775 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecNorm             1728 1.0   nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecScale            1983 1.0   nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecCopy              780 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecSet              4955 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecAXPY              190 1.0   nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecAYPX              597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecAXPBYCZ           643 1.0   nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecWAXPY             502 1.0   nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecMAXPY            1159 1.0   nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecScatterBegin     4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      2 5.14e-03    0 0.00e+00  0
> VecScatterEnd       4647 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecReduceArith       380 1.0   nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> VecReduceComm        190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> VecNormalize         965 1.0   nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97100  0  0  0  97100  0  0  0   184    -nan      2 5.14e-03    0 0.00e+00 54
> TSFunctionEval       597 1.0   nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan    -nan      1 3.36e-04    0 0.00e+00 100
> TSJacobianEval       190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 97
> MatMult             1930 1.0   nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> MatMultTranspose       1 1.0   nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> MatSolve             965 1.0   nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatSOR               965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatLUFactorSym         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatLUFactorNum       190 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatScale             190 1.0   nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> MatAssemblyBegin     761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatAssemblyEnd       761 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatGetRowIJ            1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatCreateSubMats     380 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatGetOrdering         1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatZeroEntries       379 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatSetPreallCOO        1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> MatSetValuesCOO      190 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> KSPSetUp             760 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602    -nan      1 4.80e-03    0 0.00e+00 46
> KSPGMRESOrthog       775 1.0   nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188    -nan      1 4.80e-03    0 0.00e+00 53
> SNESSetUp              1 1.0   nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> SNESFunctionEval     573 1.0   nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> SNESJacobianEval     190 1.0   nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 97
> SNESLineSearch       190 1.0   nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> PCSetUp              570 1.0   nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> PCApply              965 1.0   nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan    -nan      1 4.80e-03    0 0.00e+00 19
> KSPSolve_FS_0        965 1.0   nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
> KSPSolve_FS_1        965 1.0   nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  0
>
> --- Event Stage 1: Unknown
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> Object Type          Creations   Destructions. Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>            Container     5              5
>     Distributed Mesh     2              2
>            Index Set    11             11
>    IS L to G Mapping     1              1
>    Star Forest Graph     7              7
>      Discrete System     2              2
>            Weak Form     2              2
>               Vector    49             49
>              TSAdapt     1              1
>                   TS     1              1
>                 DMTS     1              1
>                 SNES     1              1
>               DMSNES     3              3
>       SNESLineSearch     1              1
>        Krylov Solver     4              4
>      DMKSP interface     1              1
>               Matrix     4              4
>       Preconditioner     4              4
>               Viewer     2              1
>
> --- Event Stage 1: Unknown
>
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.14e-08
> #PETSc Option Table entries:
> -log_view
> -log_view_gpu_times
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with 64 bit PetscInt
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: PETSC_DIR=/home/4pf/repos/petsc
> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx
> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries
> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices
> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3
> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install
> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
>
> -----------------------------------------
> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
> Using PETSc arch:
> -----------------------------------------
>
> Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas
> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector
> -fvisibility=hidden -O3
> -----------------------------------------
>
> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include
> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include
> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib
> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc
> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib
> -L/home/4pf/build/kokkos/cuda/install/lib
> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64
> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers
> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas
> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
> -----------------------------------------
>
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Tuesday, November 15, 2022 13:03
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
> Philip <rothpc at ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Can you paste -log_view result so I can see what functions are used?
>
> --Junchao Zhang
>
>
> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
> Yes, most (but not all) of our system test cases fail with the kokkos/cuda
> or cuda backends. All of them pass with the CPU-only kokkos backend.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, November 14, 2022 19:34
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Zhang,
> Junchao <jczhang at mcs.anl.gov>; Roth, Philip <rothpc at ornl.gov>
> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
> diverging when running on CUDA device.
>
> Hi, Philip,
>   Sorry to hear that. It seems you could run the same code on CPUs but
> not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda
> backend). Is that right?
>
> --Junchao Zhang
>
>
> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> This is an issue I've brought up before (and discussed in-person with
> Richard). I wanted to bring it up again because I'm hitting the limits of
> what I know to do, and I need help figuring this out.
>
> The problem can be reproduced using Xolotl's "develop" branch built
> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
> add the relevant kokkos options to the "petscArgs=" line in the system test
> parameter file(s), or just replace the system test parameter files with the
> ones from the "feature-petsc-kokkos" branch. See here the files that
> begin with "params_system_".
>
> Note that those files use the "kokkos" options, but the problem is similar
> using the corresponding cuda/cusparse options. I've already tried building
> kokkos-kernels with no TPLs and got slightly different results, but the
> same problem.
>
> Any help would be appreciated.
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
>