[petsc-dev] cuda with kokkos-cuda build fail

Barry Smith bsmith at petsc.dev
Fri Jan 7 10:23:25 CST 2022


  This could easily be an out-of-resources issue. Does a slightly smaller problem run?

> On Jan 6, 2022, at 10:32 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> I seem to have a regression when using aijcusparse in a Kokkos build. It works fine with a plain CUDA build.
> 
> # [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> # [0]PETSC ERROR: GPU error 
> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> # [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-511-g96172674f3  GIT Date: 2022-01-06 23:44:32 +0000
> # [0]PETSC ERROR: /global/u2/m/madams/petsc_install/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1 on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003188 by madams Thu Jan  6 19:29:06 2022
> # [0]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3" --with-debugging=0 --download-metis --download-parmetis --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_DIR=/global/homes/m/madams/petsc_install/petsc PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at /global/u2/m/madams/petsc_install/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:994
> # [0]PETSC ERROR: #2 VecNorm() at /global/u2/m/madams/petsc_install/petsc/src/vec/vec/interface/rvector.c:228
> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at /global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c:179
> # [0]PETSC ERROR: #4 SNESSolve() at /global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c:4810
> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at /global/u2/m/madams/petsc_install/petsc/src/ts/impls/arkimex/arkimex.c:845
> # [0]PETSC ERROR: #6 TSStep() at /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3572
> # [0]PETSC ERROR: #7 TSSolve() at /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3971
> # [0]PETSC ERROR: #8 main() at /global/u2/m/madams/petsc_install/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
> # [0]PETSC ERROR: PETSc Option Table entries:
> # [0]PETSC ERROR: -check_pointer_intensity 0
> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
> # [0]PETSC ERROR: -dm_landau_device_type cuda
> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
> # [0]PETSC ERROR: -dm_landau_ion_masses 2,4
> # [0]PETSC ERROR: -dm_landau_n 1.00018,1,1e-5
> # [0]PETSC ERROR: -dm_landau_n_0 1e20
> # [0]PETSC ERROR: -dm_landau_num_species_grid 1,2
> # [0]PETSC ERROR: -dm_landau_thermal_temps 5,5,.5
> # [0]PETSC ERROR: -dm_landau_type p4est
> # [0]PETSC ERROR: -dm_mat_type aijcusparse
> # [0]PETSC ERROR: -dm_preallocate_only false
> # [0]PETSC ERROR: -dm_vec_type cuda
> # [0]PETSC ERROR: -error_output_stdout
> # [0]PETSC ERROR: -ksp_type preonly
> # [0]PETSC ERROR: -malloc_dump
> # [0]PETSC ERROR: -mat_cusparse_use_cpu_solve
> # [0]PETSC ERROR: -nox
> # [0]PETSC ERROR: -nox_warning
> # [0]PETSC ERROR: -pc_type lu
> # [0]PETSC ERROR: -petscspace_degree 3
> # [0]PETSC ERROR: -petscspace_poly_tensor 1
> # [0]PETSC ERROR: -snes_converged_reason
> # [0]PETSC ERROR: -snes_monitor
> # [0]PETSC ERROR: -snes_rtol 1.e-14
> # [0]PETSC ERROR: -snes_stol 1.e-14
> # [0]PETSC ERROR: -ts_adapt_clip .5,1.25
> # [0]PETSC ERROR: -ts_adapt_scale_solve_failed 0.75
> # [0]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
> # [0]PETSC ERROR: -ts_arkimex_type 1bee
> # [0]PETSC ERROR: -ts_dt 1.e-1
> # [0]PETSC ERROR: -ts_max_snes_failures -1
> # [0]PETSC ERROR: -ts_max_steps 1
> # [0]PETSC ERROR: -ts_max_time 1
> # [0]PETSC ERROR: -ts_monitor
> # [0]PETSC ERROR: -ts_rtol 1e-1
> # [0]PETSC ERROR: -ts_type arkimex
> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
> # [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
