[petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
Junchao Zhang
junchao.zhang at gmail.com
Mon Dec 5 14:22:59 CST 2022
Hello, Philip,
Do I still need to use the feature-petsc-kokkos branch?
--Junchao Zhang
On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <facklerpw at ornl.gov> wrote:
> Junchao,
>
> Thank you for working on this. To reproduce, open the parameter file for,
> say, the PSI_2 system test case (benchmarks/params_system_PSI_2.txt) and
> simply add `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs="
> field (or the corresponding cusparse/cuda options).
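>
> For illustration only (a sketch, not the file's actual contents; "<existing
> options>" is a placeholder for whatever is already on that line), the edited
> field would look something like:
>
>     petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos
>
> or, for the cuda/cusparse backend:
>
>     petscArgs=<existing options> -dm_mat_type aijcusparse -dm_vec_type cuda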
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Thursday, December 1, 2022 17:05
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
> Philip <rothpc at ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Hi, Philip,
> Sorry for the long delay. I could not get anything useful from the
> -log_view output. Since I have already built xolotl, could you give me
> instructions on how to run a xolotl test that reproduces the divergence with
> the petsc GPU backends (but runs fine on CPU)?
> Thank you.
> --Junchao Zhang
>
>
> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>
> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>
> Max Max/Min Avg Total
> Time (sec): 6.023e+00 1.000 6.023e+00
> Objects: 1.020e+02 1.000 1.020e+02
> Flops: 1.080e+09 1.000 1.080e+09 1.080e+09
> Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08
> MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00
> MPI Reductions: 0.000e+00 0.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
>  0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
> Count: number of times phase was executed
> Time and Flop: Max - maximum over all processors
> Ratio - ratio of maximum to minimum over all processors
> Mess: number of messages sent
> AvgLen: average message length (bytes)
> Reduct: number of global reductions
> Global: entire computation
> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase         %F - percent flop in this phase
>       %M - percent messages in this phase     %L - percent message lengths in this phase
>       %R - percent reductions in this phase
> Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
> GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
> CpuToGpu Count: total number of CPU to GPU copies per processor
> CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
> GpuToCpu Count: total number of GPU to CPU copies per processor
> GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
> GPU %F: percent flops on GPU in this event
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                            --- Global ---  --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu -  GPU
>                         Max Ratio  Max    Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s  Mflop/s  Count   Size   Count   Size  %F
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided         3 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> DMCreateMat           1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFSetGraph            3 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFSetUp               3 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFPack             4647 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SFUnpack           4647 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecDot              190 1.0    nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecMDot             775 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecNorm            1728 1.0    nan nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecScale           1983 1.0    nan nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecCopy             780 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecSet             4955 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecAXPY             190 1.0    nan nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecAYPX             597 1.0    nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecAXPBYCZ          643 1.0    nan nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecWAXPY            502 1.0    nan nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecMAXPY           1159 1.0    nan nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecScatterBegin    4647 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan    2 5.14e-03    0 0.00e+00   0
> VecScatterEnd      4647 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecReduceArith      380 1.0    nan nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> VecReduceComm       190 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> VecNormalize        965 1.0    nan nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> TSStep               20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0   184     -nan    2 5.14e-03    0 0.00e+00  54
> TSFunctionEval      597 1.0    nan nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan     -nan    1 3.36e-04    0 0.00e+00 100
> TSJacobianEval      190 1.0    nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00  97
> MatMult            1930 1.0    nan nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatMultTranspose      1 1.0    nan nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatSolve            965 1.0    nan nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSOR              965 1.0    nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatLUFactorSym        1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatLUFactorNum      190 1.0    nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatScale            190 1.0    nan nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> MatAssemblyBegin    761 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatAssemblyEnd      761 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatGetRowIJ           1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatCreateSubMats    380 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatGetOrdering        1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatZeroEntries      379 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSetPreallCOO       1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> MatSetValuesCOO     190 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSetUp            760 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSolve            190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602     -nan    1 4.80e-03    0 0.00e+00  46
> KSPGMRESOrthog      775 1.0    nan nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> SNESSolve            71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188     -nan    1 4.80e-03    0 0.00e+00  53
> SNESSetUp             1 1.0    nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> SNESFunctionEval    573 1.0    nan nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> SNESJacobianEval    190 1.0    nan nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00  97
> SNESLineSearch      190 1.0    nan nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00 100
> PCSetUp             570 1.0    nan nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> PCApply             965 1.0    nan nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan     -nan    1 4.80e-03    0 0.00e+00  19
> KSPSolve_FS_0       965 1.0    nan nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
> KSPSolve_FS_1       965 1.0    nan nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan     -nan    0 0.00e+00    0 0.00e+00   0
>
>
> --- Event Stage 1: Unknown
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> Object Type          Creations   Destructions. Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
> Container 5 5
> Distributed Mesh 2 2
> Index Set 11 11
> IS L to G Mapping 1 1
> Star Forest Graph 7 7
> Discrete System 2 2
> Weak Form 2 2
> Vector 49 49
> TSAdapt 1 1
> TS 1 1
> DMTS 1 1
> SNES 1 1
> DMSNES 3 3
> SNESLineSearch 1 1
> Krylov Solver 4 4
> DMKSP interface 1 1
> Matrix 4 4
> Preconditioner 4 4
> Viewer 2 1
>
> --- Event Stage 1: Unknown
>
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.14e-08
> #PETSc Option Table entries:
> -log_view
> -log_view_gpu_times
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with 64 bit PetscInt
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: PETSC_DIR=/home/4pf/repos/petsc
> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx
> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries
> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices
> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3
> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install
> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
>
> -----------------------------------------
> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
> Using PETSc arch:
> -----------------------------------------
>
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas
> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector
> -fvisibility=hidden -O3
> -----------------------------------------
>
> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include
> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include
> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
> -----------------------------------------
>
> Using C linker: mpicc
> Using libraries: -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib
> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc
> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib
> -L/home/4pf/build/kokkos/cuda/install/lib
> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64
> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers
> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas
> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
> -----------------------------------------
>
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Tuesday, November 15, 2022 13:03
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
> Philip <rothpc at ornl.gov>
> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
> Vec diverging when running on CUDA device.
>
> Can you paste -log_view result so I can see what functions are used?
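>
> (For reference, the option table in the -log_view output quoted above in
> this thread shows the options used for that run:
>
>     -log_view -log_view_gpu_times
>
> both of which are standard PETSc command-line options.)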
>
> --Junchao Zhang
>
>
> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
> Yes, most (but not all) of our system test cases fail with the kokkos/cuda
> or cuda backends. All of them pass with the CPU-only kokkos backend.
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, November 14, 2022 19:34
> *To:* Fackler, Philip <facklerpw at ornl.gov>
> *Cc:* xolotl-psi-development at lists.sourceforge.net <
> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Zhang,
> Junchao <jczhang at mcs.anl.gov>; Roth, Philip <rothpc at ornl.gov>
> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
> diverging when running on CUDA device.
>
> Hi, Philip,
> Sorry to hear that. It seems you could run the same code on CPUs but not
> on GPUs (with either the petsc/Kokkos backend or the petsc/cuda backend), is
> that right?
>
> --Junchao Zhang
>
>
> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> This is an issue I've brought up before (and discussed in-person with
> Richard). I wanted to bring it up again because I'm hitting the limits of
> what I know to do, and I need help figuring this out.
>
> The problem can be reproduced using Xolotl's "develop" branch built
> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
> add the relevant kokkos options to the "petscArgs=" line in the system test
> parameter file(s), or just replace the system test parameter files with the
> ones from the "feature-petsc-kokkos" branch. See, in that branch, the files
> that begin with "params_system_".
>
> Note that those files use the "kokkos" options, but the problem is similar
> using the corresponding cuda/cusparse options. I've already tried building
> kokkos-kernels with no TPLs and got slightly different results, but the
> same problem.
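>
> As a rough sketch of the reproduction path (the PETSc configure flags below
> are abridged from the configure options reported elsewhere in this thread;
> the placeholder paths and the Xolotl build/test step are assumptions to
> adapt to your own setup):
>
>     # PETSc with Kokkos + Kokkos Kernels on CUDA (abridged)
>     ./configure --with-cuda --with-debugging=0 --with-64-bit-indices \
>         --with-kokkos-dir=<kokkos-install> \
>         --with-kokkos-kernels-dir=<kokkos-kernels-install>
>     make all
>
>     # then build Xolotl's "develop" branch against that PETSc and run a
>     # system test case (e.g. PSI_2) after adding the kokkos options to its
>     # petscArgs= line as described above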
>
> Any help would be appreciated.
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
>