[petsc-dev] Hypre error on updated Perlmutter

Mark Adams mfadams at lbl.gov
Fri Dec 10 09:41:01 CST 2021


On Fri, Dec 10, 2021 at 10:39 AM Paul Lin <paullin at lbl.gov> wrote:

> Hi Mark,
>
> Regarding the error:
> "PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all
> CUDA-capable devices are busy or unavailable"
>
> how are you requesting a Perlmutter compute node?
>

I can run CUDA jobs, so I think it is a problem with the (hypre) build.
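A quick first check for error 46 is each GPU's compute mode as seen from inside the job. The CUDA runtime documentation gives a Prohibited or Exclusive_Process compute mode, or GPUs already busy with other work, as the usual causes of cudaErrorDevicesUnavailable. Below is a minimal Python sketch. It assumes nvidia-smi is on the compute node's PATH and that its CSV output uses the mode names Default, Exclusive_Process and Prohibited. The helper names are hypothetical.

```python
"""Hypothetical helper: report each GPU's compute mode as seen from inside a job.

Only nvidia-smi is queried; CUDA and PETSc are not touched.
"""
import shutil
import subprocess

# Compute modes under which a new process may be refused a CUDA context.
RESTRICTIVE_MODES = {"Prohibited", "Exclusive_Process"}


def parse_compute_modes(csv_text: str) -> dict[int, str]:
    """Parse 'nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader' output."""
    modes: dict[int, str] = {}
    for line in csv_text.strip().splitlines():
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def restrictive_gpus(modes: dict[int, str]) -> list[int]:
    """Return the indices of GPUs whose compute mode can block new contexts."""
    return sorted(i for i, mode in modes.items() if mode in RESTRICTIVE_MODES)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on a GPU compute node")
    else:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,compute_mode", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print("nvidia-smi failed:", result.stderr.strip())
        else:
            modes = parse_compute_modes(result.stdout)
            for index, mode in sorted(modes.items()):
                print(f"GPU {index}: {mode}")
            print("restrictive:", restrictive_gpus(modes) or "none")
```

A restrictive mode would be consistent with error 46. It would not, however, explain why only the hypre build fails, so this is a first check rather than a diagnosis.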


>
> thanks
> -paul
>
>
>
> On Fri, Dec 10, 2021 at 6:54 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> And more Perlmutter weirdness.
>>
>> If I configure with CRAY_ACCEL_TARGET=nvidia80 set, as above, I get the
>> error in configure.log (some CUDA-aware-MPI-related errors).
>>
>> But if I configure with CRAY_ACCEL_TARGET="", configure gets as far as
>> Kokkos and fails with this error in configure2.log:
>>
>>   #error -- unsupported pgc++ configuration! Only pgc++ 18, 19 and 20 are supported!
>>
>> I have not seen this before.
>>
>> As for the first problem: if I load the cudatoolkit module (they say you
>> can do that *or* set CRAY_ACCEL_TARGET=nvidia80), the problems go away, or
>> maybe configure fails before it reaches the first error, but it still fails.
>> I get the configure3 error, which has these old warnings, but I'm not sure
>> exactly why it failed.
>>
>> This was more or less working yesterday. I did rebase today, but even when
>> it was working, this setup has been fragile.
>>
>> Any suggestions?
>> Thanks,
>> Mark
>>
>>
>> On Thu, Dec 9, 2021 at 1:59 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Well, I found by accident that turning on CUDA-aware MPI with
>>> export CRAY_ACCEL_TARGET=nvidia80 seems to have fixed this.
>>> I'm not sure what is going on.
>>>
>>> On Thu, Dec 9, 2021 at 11:21 AM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> I am getting the error below. I have built this without hypre and the
>>>> tests pass, including the CUDA tests.
>>>> Any ideas?
>>>>
>>>> I notice that the tests use -dm_mat_type aijcusparse with hypre.
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>> 08:13 nid003929 adams/fix_mat_ex5k= perlmutter:~/petsc$ make PETSC_DIR=/global/homes/m/madams/petsc PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda check
>>>> Running check examples to verify correct installation
>>>> Using PETSC_DIR=/global/homes/m/madams/petsc and PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
>>>> 1,5c1,70
>>>> < lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>>>> <   0 SNES Function norm 0.0406612
>>>> <   1 SNES Function norm 4.12227e-06
>>>> <   2 SNES Function norm 6.098e-11
>>>> < Number of SNES iterations = 2
>>>> ---
>>>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [0]PETSC ERROR: GPU error
>>>> > [0]PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all CUDA-capable devices are busy or unavailable
>>>> > [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.16.1-442-gebb4a459f5  GIT Date: 2021-12-08 08:59:23 -0500
>>>> > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tutorials/./ex19 on a  named nid003929 by madams Thu Dec  9 08:13:49 2021
>>>> > [0]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --FFLAGS="   -g -mp=gpu" --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvcc --with-debugging=0 --download-hypre=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --with-make-np=8 --prefix=/global/cfs/projectdirs/m3904/petsc/current/perlmutter-opt-nvidia21.9 PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> > [0]PETSC ERROR: #1 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:70
>>>> > [0]PETSC ERROR: #2 getDevice() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:360
>>>> > [0]PETSC ERROR: #3 PetscDeviceCreate() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:102
>>>> > [0]PETSC ERROR: #4 PetscDeviceInitializeDefaultDevice_Internal() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:266
>>>> > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> > [1]PETSC ERROR: GPU error
>>>> > [1]PETSC ERROR: cuda error 46 (cudaErrorDevicesUnavailable) : all CUDA-capable devices are busy or unavailable
>>>> > [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> > [1]PETSC ERROR: Petsc Development GIT revision: v3.16.1-442-gebb4a459f5  GIT Date: 2021-12-08 08:59:23 -0500
>>>> > [1]PETSC ERROR: /global/u2/m/madams/petsc/src/snes/tutorials/./ex19 on a  named nid003929 by madams Thu Dec  9 08:13:49 2021
>>>> > [1]PETSC ERROR: Configure options --CFLAGS="   -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4 -mp=gpu" --FFLAGS="   -g -mp=gpu" --with-cc=cc --with-cxx=CC --with-fc=ftn --with-cudac=/opt/nvidia/hpc_sdk/Linux_x86_64/21.9/compilers/bin/nvcc --with-debugging=0 --download-hypre=1 --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --with-make-np=8 --prefix=/global/cfs/projectdirs/m3904/petsc/current/perlmutter-opt-nvidia21.9 PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
>>>> > [1]PETSC ERROR: #1 initialize() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:70
>>>> > [1]PETSC ERROR: #2 getDevice() at /global/u2/m/madams/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:360
>>>> > [1]PETSC ERROR: #3 PetscDeviceCreate() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:102
>>>> > [1]PETSC ERROR: #4 PetscDeviceInitializeDefaultDevice_Internal() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:266
>>>> > [1]PETSC ERROR: #5 PetscDeviceInitialize() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:227
>>>> > [1]PETSC ERROR: #6 PCCreate_HYPRE() at /global/u2/m/madams/petsc/src/ksp/pc/impls/hypre/hypre.c:2224
>>>> > [1]PETSC ERROR: #7 PCSetType() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:84
>>>> > [1]PETSC ERROR: #8 PCSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:154
>>>> > [1]PETSC ERROR: #9 KSPSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/ksp/interface/itcl.c:356
>>>> > [1]PETSC ERROR: #10 SNESSetFromOptions() at /global/u2/m/madams/petsc/src/snes/interface/snes.c:1113
>>>> > [1]PETSC ERROR: #11 main() at ex19.c:150
>>>> > [1]PETSC ERROR: PETSc Option Table entries:
>>>> > [1]PETSC ERROR: -da_refine 3
>>>> > [1]PETSC ERROR: -dm_mat_type aijcusparse
>>>> > [1]PETSC ERROR: -dm_vec_type cuda
>>>> > [0]PETSC ERROR: #5 PetscDeviceInitialize() at /global/u2/m/madams/petsc/src/sys/objects/device/interface/device.cxx:227
>>>> > [0]PETSC ERROR: #6 PCCreate_HYPRE() at /global/u2/m/madams/petsc/src/ksp/pc/impls/hypre/hypre.c:2224
>>>> > [0]PETSC ERROR: #7 PCSetType() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:84
>>>> > [0]PETSC ERROR: #8 PCSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/pc/interface/pcset.c:154
>>>> > [0]PETSC ERROR: #9 KSPSetFromOptions() at /global/u2/m/madams/petsc/src/ksp/ksp/interface/itcl.c:356
>>>> > [0]PETSC ERROR: #10 SNESSetFromOptions() at /global/u2/m/madams/petsc/src/snes/interface/snes.c:1113
>>>> > [0]PETSC ERROR: #11 main() at ex19.c:150
>>>> > [0]PETSC ERROR: PETSc Option Table entries:
>>>> > [0]PETSC ERROR: -da_refine 3
>>>> > [0]PETSC ERROR: -dm_mat_type aijcusparse
>>>> > [0]PETSC ERROR: -dm_vec_type cuda
>>>> > [0]PETSC ERROR: -ksp_norm_type unpreconditioned
>>>> > [0]PETSC ERROR: -nox
>>>> > [0]PETSC ERROR: -nox_warning
>>>> > [0]PETSC ERROR: -pc_type hypre
>>>> > [0]PETSC ERROR: -snes_monitor_short
>>>> > [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>>> > [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>>>> > [1]PETSC ERROR: -ksp_norm_type unpreconditioned
>>>> > [1]PETSC ERROR: -nox
>>>> > [1]PETSC ERROR: -nox_warning
>>>> > [1]PETSC ERROR: -pc_type hypre
>>>> > [1]PETSC ERROR: -snes_monitor_short
>>>> > [1]PETSC ERROR: -use_gpu_aware_mpi 0
>>>> > [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>>>> > MPICH Notice [Rank 0] [job id 832277.2] [Thu Dec  9 08:13:50 2021] [nid003929] - Abort(97) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 97) - process 0
>>>> >
>>>> > aborting job:
>>>> > application called MPI_Abort(MPI_COMM_WORLD, 97) - process 0
>>>> > MPICH Notice [Rank 1] [job id 832277.2] [Thu Dec  9 08:13:50 2021] [nid003929] - Abort(97) (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 97) - process 1
>>>> >
>>>> > aborting job:
>>>> > application called MPI_Abort(MPI_COMM_WORLD, 97) - process 1
>>>> > srun: error: nid003929: task 1: Exited with exit code 255
>>>> > srun: launch/slurm: _step_signal: Terminating StepId=832277.2
>>>> > slurmstepd: error: *** STEP 832277.2 ON nid003929 CANCELLED AT 2021-12-09T16:13:50 ***
>>>> > srun: error: nid003929: task 0: Exited with exit code 255
>>>> /global/homes/m/madams/petsc/src/snes/tutorials
>>>> Possible problem with ex19 running with hypre, diffs above
>>>> =========================================
>>>> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
>>>> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
>>>>
>>>
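Rerunning ex19 by hand is often quicker than rerunning make check while iterating on the hypre failure. In the log above, each rank prints its "PETSc Option Table entries" block interleaved with the other rank's stack trace. The sketch below therefore collects each rank's entries by its [r] prefix and joins them into a command line. It is a hypothetical Python helper, not part of PETSc. The srun launcher and the ./ex19 path are assumptions to adjust.

```python
"""Hypothetical helper: rebuild a failing PETSc test's command line from its error log."""
import re
import shlex
import sys

# "[r]PETSC ERROR: <body>" once mail-quote and diff markers are stripped.
ERROR_LINE = re.compile(r"^\[(\d+)\]PETSC ERROR: (.*)$")
# Option entries look like "-name" or "-name value"; the end-of-message
# banner starts with "--" and is deliberately not matched.
OPTION = re.compile(r"^-[A-Za-z_]")


def option_tables(log_text: str) -> dict[int, list[str]]:
    """Map each MPI rank to the option entries it printed, in order."""
    tables: dict[int, list[str]] = {}
    in_table: set[int] = set()
    for raw in log_text.splitlines():
        # Drop leading quote/diff markers such as ">>>> > ".
        line = raw.lstrip("> ").rstrip()
        match = ERROR_LINE.match(line)
        if not match:
            continue  # wrapped continuation lines, shell output, etc.
        rank, body = int(match.group(1)), match.group(2)
        if body == "PETSc Option Table entries:":
            in_table.add(rank)
            tables[rank] = []
        elif rank in in_table and OPTION.match(body):
            tables[rank].append(body)
        else:
            in_table.discard(rank)  # any other line from this rank ends its table
    return tables


def repro_command(options: list[str], nprocs: int, exe: str = "./ex19") -> str:
    """Join option entries into one shell command (srun is an assumed launcher)."""
    args = ["srun", "-n", str(nprocs), exe]
    for entry in options:
        args.extend(entry.split(" ", 1))  # "-da_refine 3" -> "-da_refine", "3"
    return shlex.join(args)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python <this script> saved_log.txt
    with open(sys.argv[1]) as f:
        tables = option_tables(f.read())
    for rank, options in sorted(tables.items()):
        print(f"rank {rank}: {repro_command(options, nprocs=len(tables))}")
```

On the log above, both ranks should yield the same nine entries. The resulting command would be run from src/snes/tutorials inside a compute-node allocation.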