[petsc-users] CUDA running out of memory in PtAP

Mark Adams mfadams at lbl.gov
Wed Jul 7 09:24:24 CDT 2021


I think that is a good idea. I am trying to do it myself but it is getting
messy.
Thanks,

On Wed, Jul 7, 2021 at 9:50 AM Stefano Zampini <stefano.zampini at gmail.com>
wrote:

> Do you want me to open an MR to handle the sequential case?
>
> On Jul 7, 2021, at 3:39 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
> OK, I found where it's not protected in the sequential case.
>
> On Wed, Jul 7, 2021 at 9:25 AM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Thanks, but that did not work.
>>
>> It looks like this is just in MPIAIJ, but I am using SeqAIJ. ex2 (below)
>> uses PETSC_COMM_SELF everywhere.
>>
>> + srun -G 1 -n 16 -c 1 --cpu-bind=cores --ntasks-per-core=2
>> /global/homes/m/madams/mps-wrapper.sh ../ex2 -dm_landau_device_type cuda
>> -dm_mat_type aijcusparse -dm_vec_type cuda -log_view -pc_type gamg
>> -ksp_type gmres -pc_gamg_reuse_interpolation -matmatmult_backend_cpu
>> -matptap_backend_cpu -dm_landau_ion_masses .0005,1,1,1,1,1,1,1,1
>> -dm_landau_ion_charges 1,2,3,4,5,6,7,8,9 -dm_landau_thermal_temps
>> 1,1,1,1,1,1,1,1,1,1 -dm_landau_n
>> 1.000003,.5,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7
>> 0 starting nvidia-cuda-mps-control on cgpu17
>> mps ready: 2021-07-07T06:17:36-07:00
>> masses:        e= 9.109e-31; ions in proton mass units:    5.000e-04
>>  1.000e+00 ...
>> charges:       e=-1.602e-19; charges in elementary units:  1.000e+00
>>  2.000e+00
>> thermal T (K): e= 1.160e+07 i= 1.160e+07 imp= 1.160e+07. v_0= 1.326e+07
>> n_0= 1.000e+20 t_0= 5.787e-06 domain= 5.000e+00
>> CalculateE j0=0. Ec = 0.050991
>> 0 TS dt 1. time 0.
>>   0) species-0: charge density= -1.6054532569865e+01 z-momentum=
>> -1.9059929215360e-19 energy=  2.4178543516210e+04
>>   0) species-1: charge density=  8.0258396545108e+00 z-momentum=
>>  7.0660527288120e-20 energy=  1.2082380663859e+04
>>   0) species-2: charge density=  6.3912608577597e-05 z-momentum=
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>>   0) species-3: charge density=  9.5868912866395e-05 z-momentum=
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>>   0) species-4: charge density=  1.2782521715519e-04 z-momentum=
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>> [7]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [7]PETSC ERROR: GPU resources unavailable
>> [7]PETSC ERROR: CUDA error 2 (cudaErrorMemoryAllocation) : out of memory.
>> Reports alloc failed; this indicates the GPU has run out resources
>> [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [7]PETSC ERROR: Petsc Development GIT revision: v3.15.1-569-g270a066c1e
>>  GIT Date: 2021-07-06 03:22:54 -0700
>> [7]PETSC ERROR: ../ex2 on a arch-cori-gpu-opt-gcc named cgpu17 by madams
>> Wed Jul  7 06:17:38 2021
>> [7]PETSC ERROR: Configure options
>> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc
>> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 --CFLAGS="
>> -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g
>> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g
>> -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10
>> -DLANDAU_MAX_Q=4" --FFLAGS="   -g " --COPTFLAGS="   -O3" --CXXOPTFLAGS="
>> -O3" --FOPTFLAGS="   -O3" --download-fblaslapack=1 --with-debugging=0
>> --with-mpiexec="srun -G 1" --with-cuda-gencodearch=70 --with-batch=0
>> --with-cuda=1 --download-p4est=1 --download-hypre=1 --with-zlib=1
>> PETSC_ARCH=arch-cori-gpu-opt-gcc
>> [7]PETSC ERROR: #1 MatProductSymbolic_SeqAIJCUSPARSE_SeqAIJCUSPARSE() at
>> /global/u2/m/madams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2622
>> [7]PETSC ERROR: #2 MatProductSymbolic_ABC_Basic() at
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:1146
>> [7]PETSC ERROR: #3 MatProductSymbolic() at
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:799
>> [7]PETSC ERROR: #4 MatPtAP() at
>> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9626
>> [7]PETSC ERROR: #5 PCGAMGCreateLevel_GAMG() at
>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>> [7]PETSC ERROR: #6 PCSetUp_GAMG() at
>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>> [7]PETSC ERROR: #7 PCSetUp() at
>> /global/u2/m/madams/petsc/src/ksp/pc/interface/precon.c:1014
>> [7]PETSC ERROR: #8 KSPSetUp() at
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:406
>> [7]PETSC ERROR: #9 KSPSolve_Private() at
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:850
>> [7]PETSC ERROR: #10 KSPSolve() at
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:1084
>> [7]PETSC ERROR: #11 SNESSolve_NEWTONLS() at
>> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c:225
>> [7]PETSC ERROR: #12 SNESSolve() at
>> /global/u2/m/madams/petsc/src/snes/interface/snes.c:4769
>> [7]PETSC ERROR: #13 TSTheta_SNESSolve() at
>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:185
>> [7]PETSC ERROR: #14 TSStep_Theta() at
>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:223
>> [7]PETSC ERROR: #15 TSStep() at
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3571
>> [7]PETSC ERROR: #16 TSSolve() at
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3968
>> [7]PETSC ERROR: #17 main() at ex2.c:699
>> [7]PETSC ERROR: PETSc Option Table entries:
>> [7]PETSC ERROR: -dm_landau_amr_levels_max 0
>> [7]PETSC ERROR: -dm_landau_amr_post_refine 5
>> [7]PETSC ERROR: -dm_landau_device_type cuda
>> [7]PETSC ERROR: -dm_landau_domain_radius 5
>> [7]PETSC ERROR: -dm_landau_Ez 0
>> [7]PETSC ERROR: -dm_landau_ion_charges 1,2,3,4,5,6,7,8,9
>> [7]PETSC ERROR: -dm_landau_ion_masses .0005,1,1,1,1,1,1,1,1
>> [7]PETSC ERROR: -dm_landau_n
>> 1.000003,.5,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7
>> [7]PETSC ERROR: -dm_landau_thermal_temps 1,1,1,1,1,1,1,1,1,1
>> [7]PETSC ERROR: -dm_landau_type p4est
>> [7]PETSC ERROR: -dm_mat_type aijcusparse
>> [7]PETSC ERROR: -dm_preallocate_only
>> [7]PETSC ERROR: -dm_vec_type cuda
>> [7]PETSC ERROR: -ex2_connor_e_field_units
>> [7]PETSC ERROR: -ex2_impurity_index 1
>> [7]PETSC ERROR: -ex2_plot_dt 200
>> [7]PETSC ERROR: -ex2_test_type none
>> [7]PETSC ERROR: -ksp_type gmres
>> [7]PETSC ERROR: -log_view
>> [7]PETSC ERROR: -matmatmult_backend_cpu
>> [7]PETSC ERROR: -matptap_backend_cpu
>> [7]PETSC ERROR: -pc_gamg_reuse_interpolation
>> [7]PETSC ERROR: -pc_type gamg
>> [7]PETSC ERROR: -petscspace_degree 1
>> [7]PETSC ERROR: -snes_max_it 15
>> [7]PETSC ERROR: -snes_rtol 1.e-6
>> [7]PETSC ERROR: -snes_stol 1.e-6
>> [7]PETSC ERROR: -ts_adapt_scale_solve_failed 0.5
>> [7]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
>> [7]PETSC ERROR: -ts_dt 1
>> [7]PETSC ERROR: -ts_exact_final_time stepover
>> [7]PETSC ERROR: -ts_max_snes_failures -1
>> [7]PETSC ERROR: -ts_max_steps 10
>> [7]PETSC ERROR: -ts_max_time 300
>> [7]PETSC ERROR: -ts_rtol 1e-2
>> [7]PETSC ERROR: -ts_type beuler
>>
>> On Wed, Jul 7, 2021 at 4:07 AM Stefano Zampini <stefano.zampini at gmail.com>
>> wrote:
>>
>>> This will select the CPU path
>>>
>>> -matmatmult_backend_cpu -matptap_backend_cpu
>>>
>>> On Jul 7, 2021, at 2:43 AM, Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>> Can I turn off the use of cuSparse for RAP?
>>>
>>> On Tue, Jul 6, 2021 at 6:25 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>
>>>>
>>>>   Stefano has mentioned this before. He reported that cuSparse
>>>> matrix-matrix products use a very large amount of memory.
>>>>
>>>> On Jul 6, 2021, at 4:33 PM, Mark Adams <mfadams at lbl.gov> wrote:
>>>>
>>>> I am running out of memory in GAMG. It looks like this is from the new
>>>> cuSparse RAP.
>>>> I was able to run Hypre with twice as much work on the GPU as this run.
>>>> Are there perhaps parameters to tweak for this, or can I disable it?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>    0 SNES Function norm 5.442539952302e-04
>>>> [2]PETSC ERROR: --------------------- Error Message
>>>> --------------------------------------------------------------
>>>> [2]PETSC ERROR: GPU resources unavailable
>>>> [2]PETSC ERROR: CUDA error 2 (cudaErrorMemoryAllocation) : out of
>>>> memory. Reports alloc failed; this indicates the GPU has run out resources
>>>> [2]PETSC ERROR: See
>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>>>> shooting.
>>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.15.1-569-g270a066c1e
>>>>  GIT Date: 2021-07-06 03:22:54 -0700
>>>> [2]PETSC ERROR: ../ex2 on a arch-cori-gpu-opt-gcc named cgpu11 by
>>>> madams Tue Jul  6 13:37:43 2021
>>>> [2]PETSC ERROR: Configure options
>>>> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc
>>>> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 --CFLAGS="
>>>> -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g
>>>> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g
>>>> -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4"
>>>> --FFLAGS="   -g " --COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3"
>>>> --download-fblaslapack=1 --with-debugging=0 --with-mpiexec="srun -G 1"
>>>> --with-cuda-gencodearch=70 --with-batch=0 --with-cuda=1 --download-p4est=1
>>>> --download-hypre=1 --with-zlib=1 PETSC_ARCH=arch-cori-gpu-opt-gcc
>>>> [2]PETSC ERROR: #1 MatProductSymbolic_SeqAIJCUSPARSE_SeqAIJCUSPARSE() at
>>>> /global/u2/m/madams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2622
>>>> [2]PETSC ERROR: #2 MatProductSymbolic_ABC_Basic() at
>>>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:1159
>>>> [2]PETSC ERROR: #3 MatProductSymbolic() at
>>>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:799
>>>> [2]PETSC ERROR: #4 MatPtAP() at
>>>> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9626
>>>> [2]PETSC ERROR: #5 PCGAMGCreateLevel_GAMG() at
>>>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>>>> [2]PETSC ERROR: #6 PCSetUp_GAMG() at
>>>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>>>> [2]PETSC ERROR: #7 PCSetUp() at
>>>> /global/u2/m/madams/petsc/src/ksp/pc/interface/precon.c:1014
>>>> [2]PETSC ERROR: #8 KSPSetUp() at
>>>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:406
>>>> [2]PETSC ERROR: #9 KSPSolve_Private() at
>>>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:850
>>>> [2]PETSC ERROR: #10 KSPSolve() at
>>>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:1084
>>>> [2]PETSC ERROR: #11 SNESSolve_NEWTONLS() at
>>>> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c:225
>>>> [2]PETSC ERROR: #12 SNESSolve() at
>>>> /global/u2/m/madams/petsc/src/snes/interface/snes.c:4769
>>>> [2]PETSC ERROR: #13 TSTheta_SNESSolve() at
>>>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:185
>>>> [2]PETSC ERROR: #14 TSStep_Theta() at
>>>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:223
>>>> [2]PETSC ERROR: #15 TSStep() at
>>>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3571
>>>> [2]PETSC ERROR: #16 TSSolve() at
>>>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3968
>>>> [2]PETSC ERROR: #17 main() at ex2.c:699
>>>>
>>>>
>>>>
>>>
>