[petsc-users] [EXTERNAL] Re: Kokkos backend for Mat and Vec diverging when running on CUDA device.
Junchao Zhang
junchao.zhang at gmail.com
Mon Dec 5 14:40:11 CST 2022
I configured xolotl with the feature-petsc-kokkos branch and ran `make`
under ~/xolotl-build/. Though there were errors, many of the *Tester
executables were built.
[ 62%] Built target xolotlViz
[ 63%] Linking CXX executable TemperatureProfileHandlerTester
[ 64%] Linking CXX executable TemperatureGradientHandlerTester
[ 64%] Built target TemperatureProfileHandlerTester
[ 64%] Built target TemperatureConstantHandlerTester
[ 64%] Built target TemperatureGradientHandlerTester
[ 65%] Linking CXX executable HeatEquationHandlerTester
[ 65%] Built target HeatEquationHandlerTester
[ 66%] Linking CXX executable FeFitFluxHandlerTester
[ 66%] Linking CXX executable W111FitFluxHandlerTester
[ 67%] Linking CXX executable FuelFitFluxHandlerTester
[ 67%] Linking CXX executable W211FitFluxHandlerTester
Which Tester should I run with the parameter file
benchmarks/params_system_PSI_2.txt, and how many ranks should I use? Could
you give an example command line?
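Just so you can correct me concretely, this is the shape of command I would
guess at (the executable path and the way the parameter file is passed are
guesses on my part, not from your instructions):

  cd ~/xolotl-build
  mpiexec -n 1 ./xolotl/xolotl benchmarks/params_system_PSI_2.txt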
Thanks.
--Junchao Zhang
On Mon, Dec 5, 2022 at 2:22 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:
> Hello, Philip,
> Do I still need to use the feature-petsc-kokkos branch?
> --Junchao Zhang
>
>
> On Mon, Dec 5, 2022 at 11:08 AM Fackler, Philip <facklerpw at ornl.gov>
> wrote:
>
>> Junchao,
>>
>> Thank you for working on this. If you open the parameter file for, say,
>> the PSI_2 system test case (benchmarks/params_system_PSI_2.txt), simply add
>> `-dm_mat_type aijkokkos -dm_vec_type kokkos` to the "petscArgs=" field (or
>> the corresponding cusparse/cuda options).
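>>
>> For reference, a sketch of what that line would end up looking like (the
>> placeholder below stands for whatever options are already in the file):
>>
>>   petscArgs=<existing options> -dm_mat_type aijkokkos -dm_vec_type kokkos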
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Thursday, December 1, 2022 17:05
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Hi, Philip,
>> Sorry for the long delay. I could not get anything useful from the
>> -log_view output. Since I have already built xolotl, could you give me
>> instructions on how to run a xolotl test that reproduces the divergence
>> with the petsc GPU backends (but runs fine on CPU)?
>> Thank you.
>> --Junchao Zhang
>>
>>
>> On Wed, Nov 16, 2022 at 1:38 PM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>>
>> Unknown Name on a named PC0115427 with 1 processor, by 4pf Wed Nov 16 14:36:46 2022
>> Using Petsc Development GIT revision: v3.18.1-115-gdca010e0e9a  GIT Date: 2022-10-28 14:39:41 +0000
>>
>> Max Max/Min Avg Total
>> Time (sec): 6.023e+00 1.000 6.023e+00
>> Objects: 1.020e+02 1.000 1.020e+02
>> Flops: 1.080e+09 1.000 1.080e+09 1.080e+09
>> Flops/sec: 1.793e+08 1.000 1.793e+08 1.793e+08
>> MPI Msg Count: 0.000e+00 0.000 0.000e+00 0.000e+00
>> MPI Msg Len (bytes): 0.000e+00 0.000 0.000e+00 0.000e+00
>> MPI Reductions: 0.000e+00 0.000
>>
>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>                             e.g., VecAXPY() for real vectors of length N --> 2N flops
>>                             and VecAXPY() for complex vectors of length N --> 8N flops
>>
>> Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
>>                         Avg     %Total     Avg     %Total    Count   %Total     Avg        %Total    Count   %Total
>>  0:      Main Stage: 6.0226e+00 100.0%  1.0799e+09 100.0%  0.000e+00   0.0%  0.000e+00      0.0%  0.000e+00   0.0%
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>> Phase summary info:
>>    Count: number of times phase was executed
>>    Time and Flop: Max - maximum over all processors
>>                   Ratio - ratio of maximum to minimum over all processors
>>    Mess: number of messages sent
>>    AvgLen: average message length (bytes)
>>    Reduct: number of global reductions
>>    Global: entire computation
>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>       %T - percent time in this phase          %F - percent flop in this phase
>>       %M - percent messages in this phase      %L - percent message lengths in this phase
>>       %R - percent reductions in this phase
>>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>>    CpuToGpu Count: total number of CPU to GPU copies per processor
>>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>>    GpuToCpu Count: total number of GPU to CPU copies per processor
>>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>>    GPU %F: percent flops on GPU in this event
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
>>                        Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> --- Event Stage 0: Main Stage
>>
>> BuildTwoSided          3 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> DMCreateMat            1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> SFSetGraph             3 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> SFSetUp                3 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> SFPack              4647 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> SFUnpack            4647 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecDot               190 1.0   nan      nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecMDot              775 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecNorm             1728 1.0   nan      nan 1.92e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecScale            1983 1.0   nan      nan 6.24e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecCopy              780 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecSet              4955 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecAXPY              190 1.0   nan      nan 2.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecAYPX              597 1.0   nan      nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecAXPBYCZ           643 1.0   nan      nan 1.79e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecWAXPY             502 1.0   nan      nan 5.58e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecMAXPY            1159 1.0   nan      nan 3.68e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecScatterBegin     4647 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      2 5.14e-03    0 0.00e+00   0
>> VecScatterEnd       4647 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecReduceArith       380 1.0   nan      nan 4.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> VecReduceComm        190 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> VecNormalize         965 1.0   nan      nan 1.61e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> TSStep                20 1.0 5.8699e+00 1.0 1.08e+09 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0   184    -nan      2 5.14e-03    0 0.00e+00  54
>> TSFunctionEval       597 1.0   nan      nan 6.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 63  1  0  0  0  63  1  0  0  0  -nan    -nan      1 3.36e-04    0 0.00e+00 100
>> TSJacobianEval       190 1.0   nan      nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  97
>> MatMult             1930 1.0   nan      nan 4.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 41  0  0  0   1 41  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> MatMultTranspose       1 1.0   nan      nan 3.44e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> MatSolve             965 1.0   nan      nan 5.04e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatSOR               965 1.0   nan      nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatLUFactorSym         1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatLUFactorNum       190 1.0   nan      nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatScale             190 1.0   nan      nan 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> MatAssemblyBegin     761 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatAssemblyEnd       761 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatGetRowIJ            1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatCreateSubMats     380 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatGetOrdering         1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatZeroEntries       379 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatSetPreallCOO        1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> MatSetValuesCOO      190 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> KSPSetUp             760 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> KSPSolve             190 1.0 5.8052e-01 1.0 9.30e+08 1.0 0.0e+00 0.0e+00 0.0e+00 10 86  0  0  0  10 86  0  0  0  1602    -nan      1 4.80e-03    0 0.00e+00  46
>> KSPGMRESOrthog       775 1.0   nan      nan 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> SNESSolve             71 1.0 5.7117e+00 1.0 1.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 95 99  0  0  0  95 99  0  0  0   188    -nan      1 4.80e-03    0 0.00e+00  53
>> SNESSetUp              1 1.0   nan      nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> SNESFunctionEval     573 1.0   nan      nan 2.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00 60  2  0  0  0  60  2  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> SNESJacobianEval     190 1.0   nan      nan 3.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 24  3  0  0  0  24  3  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00  97
>> SNESLineSearch       190 1.0   nan      nan 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00 53 10  0  0  0  53 10  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
>> PCSetUp              570 1.0   nan      nan 1.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 11  0  0  0   2 11  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> PCApply              965 1.0   nan      nan 6.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00  8 57  0  0  0   8 57  0  0  0  -nan    -nan      1 4.80e-03    0 0.00e+00  19
>> KSPSolve_FS_0        965 1.0   nan      nan 3.33e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4 31  0  0  0   4 31  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>> KSPSolve_FS_1        965 1.0   nan      nan 1.66e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2 15  0  0  0   2 15  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00   0
>>
>>
>> --- Event Stage 1: Unknown
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------
>>
>> Object Type          Creations   Destructions. Reports information only for process 0.
>>
>> --- Event Stage 0: Main Stage
>>
>> Container 5 5
>> Distributed Mesh 2 2
>> Index Set 11 11
>> IS L to G Mapping 1 1
>> Star Forest Graph 7 7
>> Discrete System 2 2
>> Weak Form 2 2
>> Vector 49 49
>> TSAdapt 1 1
>> TS 1 1
>> DMTS 1 1
>> SNES 1 1
>> DMSNES 3 3
>> SNESLineSearch 1 1
>> Krylov Solver 4 4
>> DMKSP interface 1 1
>> Matrix 4 4
>> Preconditioner 4 4
>> Viewer 2 1
>>
>> --- Event Stage 1: Unknown
>>
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 3.14e-08
>> #PETSc Option Table entries:
>> -log_view
>> -log_view_gpu_times
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with 64 bit PetscInt
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 8
>> Configure options: PETSC_DIR=/home/4pf/repos/petsc
>> PETSC_ARCH=arch-kokkos-cuda-no-tpls --with-cc=mpicc --with-cxx=mpicxx
>> --with-fc=0 --with-cuda --with-debugging=0 --with-shared-libraries
>> --prefix=/home/4pf/build/petsc/cuda-no-tpls/install --with-64-bit-indices
>> --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --CUDAOPTFLAGS=-O3
>> --with-kokkos-dir=/home/4pf/build/kokkos/cuda/install
>> --with-kokkos-kernels-dir=/home/4pf/build/kokkos-kernels/cuda-no-tpls/install
>>
>> -----------------------------------------
>> Libraries compiled on 2022-11-01 21:01:08 on PC0115427
>> Machine characteristics: Linux-5.15.0-52-generic-x86_64-with-glibc2.35
>> Using PETSc directory: /home/4pf/build/petsc/cuda-no-tpls/install
>> Using PETSc arch:
>> -----------------------------------------
>>
>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas
>> -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector
>> -fvisibility=hidden -O3
>> -----------------------------------------
>>
>> Using include paths: -I/home/4pf/build/petsc/cuda-no-tpls/install/include
>> -I/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/include
>> -I/home/4pf/build/kokkos/cuda/install/include -I/usr/local/cuda-11.8/include
>> -----------------------------------------
>>
>> Using C linker: mpicc
>> Using libraries:
>> -Wl,-rpath,/home/4pf/build/petsc/cuda-no-tpls/install/lib
>> -L/home/4pf/build/petsc/cuda-no-tpls/install/lib -lpetsc
>> -Wl,-rpath,/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
>> -L/home/4pf/build/kokkos-kernels/cuda-no-tpls/install/lib
>> -Wl,-rpath,/home/4pf/build/kokkos/cuda/install/lib
>> -L/home/4pf/build/kokkos/cuda/install/lib
>> -Wl,-rpath,/usr/local/cuda-11.8/lib64 -L/usr/local/cuda-11.8/lib64
>> -L/usr/local/cuda-11.8/lib64/stubs -lkokkoskernels -lkokkoscontainers
>> -lkokkoscore -llapack -lblas -lm -lcudart -lnvToolsExt -lcufft -lcublas
>> -lcusparse -lcusolver -lcurand -lcuda -lquadmath -lstdc++ -ldl
>> -----------------------------------------
>>
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Tuesday, November 15, 2022 13:03
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Roth,
>> Philip <rothpc at ornl.gov>
>> *Subject:* Re: [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and
>> Vec diverging when running on CUDA device.
>>
>> Can you paste the -log_view result so I can see which functions are used?
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Nov 15, 2022 at 10:24 AM Fackler, Philip <facklerpw at ornl.gov>
>> wrote:
>>
>> Yes, most (but not all) of our system test cases fail with the
>> kokkos/cuda or cuda backends. All of them pass with the CPU-only kokkos
>> backend.
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>> ------------------------------
>> *From:* Junchao Zhang <junchao.zhang at gmail.com>
>> *Sent:* Monday, November 14, 2022 19:34
>> *To:* Fackler, Philip <facklerpw at ornl.gov>
>> *Cc:* xolotl-psi-development at lists.sourceforge.net <
>> xolotl-psi-development at lists.sourceforge.net>; petsc-users at mcs.anl.gov <
>> petsc-users at mcs.anl.gov>; Blondel, Sophie <sblondel at utk.edu>; Zhang,
>> Junchao <jczhang at mcs.anl.gov>; Roth, Philip <rothpc at ornl.gov>
>> *Subject:* [EXTERNAL] Re: [petsc-users] Kokkos backend for Mat and Vec
>> diverging when running on CUDA device.
>>
>> Hi, Philip,
>> Sorry to hear that. It seems you can run the same code on CPUs but
>> not on GPUs (with either the petsc/Kokkos backend or the petsc/cuda
>> backend), is that right?
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Nov 14, 2022 at 12:13 PM Fackler, Philip via petsc-users <
>> petsc-users at mcs.anl.gov> wrote:
>>
>> This is an issue I've brought up before (and discussed in-person with
>> Richard). I wanted to bring it up again because I'm hitting the limits of
>> what I know to do, and I need help figuring this out.
>>
>> The problem can be reproduced using Xolotl's "develop" branch built
>> against a petsc build with kokkos and kokkos-kernels enabled. Then, either
>> add the relevant kokkos options to the "petscArgs=" line in the system test
>> parameter file(s), or just replace the system test parameter files with the
>> ones from the "feature-petsc-kokkos" branch. See, in that branch, the
>> files that begin with "params_system_".
>>
>> Note that those files use the "kokkos" options, but the problem is
>> similar using the corresponding cuda/cusparse options. I've already tried
>> building kokkos-kernels with no TPLs and got slightly different results,
>> but the same problem.
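>>
>> For concreteness, the two option pairs I mean are (these are the standard
>> PETSc type names; everything else on the petscArgs line stays the same):
>>
>>   -dm_mat_type aijkokkos   -dm_vec_type kokkos
>>   -dm_mat_type aijcusparse -dm_vec_type cuda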
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>>
>> *Philip Fackler *
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> *Oak Ridge National Laboratory*
>>
>>