[petsc-users] CUDA error trying to run a job with two mpi processes and 1 GPU

Junchao Zhang junchao.zhang at gmail.com
Mon Aug 14 15:37:30 CDT 2023


I don't see a problem in the matrix assembly.
If you point me to your repo and show me how to build it, I can try to
reproduce.

--Junchao Zhang


On Mon, Aug 14, 2023 at 2:53 PM Vanella, Marcos (Fed) <
marcos.vanella at nist.gov> wrote:

> Hi Junchao, I've tried my case using -ksp_type gmres and -pc_type asm with
> -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse, as (I
> understand) is done in ex60. The error is always the same, so it seems it is
> not related to the KSP or PC. Indeed, it seems to happen when trying to
> offload the matrix to the GPU:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error (both MPI ranks abort with the same message and trace):
> #0  0x2000397fcd8f in ???
> ...
> #8  0x20003935fc6b in ???
> #9  0x11ec769b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10  0x11ec769b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11  0x11efd6a3 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12  0x11efd6a3 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13  0x11efd6a3 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14  0x11efd6a3 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15  0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16  0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17  0x11efd6a3 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18  0x11edb287 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19  0x11edb287 in *MatSeqAIJCUSPARSECopyToGPU*
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488
> #20  0x11edfd1b in *MatSeqAIJCUSPARSEGetIJ*
> ...
>
> This is the piece of Fortran code that does this within my Poisson solver:
>
> ! Create parallel PETSc sparse matrix for this ZSL: set diag/off-diag block nonzeros per row to 7.
> CALL MATCREATEAIJ(MPI_COMM_WORLD,ZSL%NUNKH_LOCAL,ZSL%NUNKH_LOCAL,ZSL%NUNKH_TOTAL,ZSL%NUNKH_TOTAL,&
>                   7,PETSC_NULL_INTEGER,7,PETSC_NULL_INTEGER,ZSL%PETSC_ZS%A_H,PETSC_IERR)
> CALL MATSETFROMOPTIONS(ZSL%PETSC_ZS%A_H,PETSC_IERR)
> DO IROW=1,ZSL%NUNKH_LOCAL
>    DO JCOL=1,ZSL%NNZ_D_MAT_H(IROW)
>       ! PETSc expects zero-based indexes: 1, global I position (zero based), 1, global J position (zero based).
>       CALL MATSETVALUES(ZSL%PETSC_ZS%A_H,1,ZSL%UNKH_IND(NM_START)+IROW-1,1,ZSL%JD_MAT_H(JCOL,IROW)-1,&
>                         ZSL%D_MAT_H(JCOL,IROW),INSERT_VALUES,PETSC_IERR)
>    ENDDO
> ENDDO
> CALL MATASSEMBLYBEGIN(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
> CALL MATASSEMBLYEND(ZSL%PETSC_ZS%A_H, MAT_FINAL_ASSEMBLY, PETSC_IERR)
>
> Note that I allocate d_nz=7 and o_nz=7 per row (more than enough room), and
> add the nonzero values one by one. I wonder if there is something about this
> that the copy to the GPU does not like.
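>
> In case it is useful for reproducing outside my solver, a minimal standalone
> C sketch of the same assembly pattern would look roughly like this
> (hypothetical sizes, band width, and values, not my actual code):
>
> #include <petscmat.h>
>
> int main(int argc, char **argv)
> {
>   Mat         A;
>   PetscInt    nlocal = 1000, N, istart, iend, i, j;
>   PetscScalar v;
>
>   PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
>   /* Same preallocation as above: at most 7 nonzeros per row in both the
>      diagonal and off-diagonal blocks */
>   PetscCall(MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE,
>                          7, NULL, 7, NULL, &A));
>   PetscCall(MatSetFromOptions(A));
>   PetscCall(MatGetSize(A, &N, NULL));
>   PetscCall(MatGetOwnershipRange(A, &istart, &iend));
>   /* Insert entries one at a time, as in the Fortran loop above */
>   for (i = istart; i < iend; i++) {
>     for (j = PetscMax(i - 3, 0); j <= PetscMin(i + 3, N - 1); j++) {
>       v = (i == j) ? 6.0 : -1.0;
>       PetscCall(MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES));
>     }
>   }
>   PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>   PetscCall(MatDestroy(&A));
>   PetscCall(PetscFinalize());
>   return 0;
> }
>
> Run with something like mpirun -n 2 ./repro -mat_type mpiaijcusparse, it
> should go through the same MatSetValues/assembly path as my code.
>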
> Thanks,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Monday, August 14, 2023 3:24 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Satish Balay <
> balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Yeah, it looks like ex60 ran correctly.
> Double-check your code again, and if you still run into errors, we can try
> to reproduce on our end.
>
> Thanks.
> --Junchao Zhang
>
>
> On Mon, Aug 14, 2023 at 1:05 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, I compiled and ran ex60 through slurm on our Enki system. The
> batch script for the slurm submission, the ex60.log, and the GPU stats files
> are attached.
> Nothing stands out as wrong to me, but please have a look.
> I'll revisit running the original 2 MPI process + 1 GPU Poisson problem.
> Thanks!
> Marcos
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 5:52 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* PETSc users list <petsc-users at mcs.anl.gov>; Satish Balay <
> balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Before digging into the details, could you try to run
> src/ksp/ksp/tests/ex60.c to make sure the environment is OK?
>
> The comment at the end of the file shows how to run it:
>    test:
>       requires: cuda
>       suffix: 1_cuda
>       nsize: 4
>       args: -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type
> cusparse
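>
>    For example, from the PETSc source tree (with PETSC_DIR and PETSC_ARCH
> set), something along these lines should do it (adjust to however you
> normally build PETSc examples):
>
>      cd $PETSC_DIR/src/ksp/ksp/tests
>      make ex60
>      mpiexec -n 4 ./ex60 -ksp_view -mat_type aijcusparse -sub_pc_factor_mat_solver_type cusparse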
>
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 4:36 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thank you for the info. I compiled the main branch of PETSc on
> another machine that has the openmpi/4.1.4/gcc-11.2.1-cuda-11.7 toolchain
> and don't see the Fortran compilation error there. It might have been
> related to gcc-9.3.
> I tried the case again, 2 CPUs and one GPU, and now get this error:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  parallel_for failed: cudaErrorInvalidConfiguration: invalid
> configuration argument
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error (both MPI ranks abort with the same message and trace):
> #0  0x2000397fcd8f in ???
> #1  0x2000397fb657 in ???
> #2  0x2000000604d7 in ???
> #3  0x200039cb9628 in ???
> #4  0x200039c93eb3 in ???
> #5  0x200039364a97 in ???
> #6  0x20003935f6d3 in ???
> #7  0x20003935f78f in ???
> #8  0x20003935fc6b in ???
> #9  0x11ec425b in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/util.h:225
> #10  0x11ec425b in _ZN6thrust8cuda_cub20uninitialized_fill_nINS0_3tagENS_10device_ptrIiEEmiEET0_RNS0_16execution_policyIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/system/cuda/detail/uninitialized_fill.h:88
> #11  0x11efa263 in _ZN6thrust20uninitialized_fill_nINS_8cuda_cub3tagENS_10device_ptrIiEEmiEET0_RKNS_6detail21execution_policy_baseIT_EES5_T1_RKT2_
> at /usr/local/cuda-11.7/include/thrust/detail/uninitialized_fill.inl:55
> #12  0x11efa263 in _ZN6thrust6detail23allocator_traits_detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEENS0_10disable_ifIXsrNS1_37needs_default_construct_via_allocatorIT_NS0_15pointer_elementIT0_E4typeEEE5valueEvE4typeERS9_SB_T1_
> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:93
> #13  0x11efa263 in _ZN6thrust6detail23default_construct_rangeINS_16device_allocatorIiEENS_10device_ptrIiEEmEEvRT_T0_T1_
> at /usr/local/cuda-11.7/include/thrust/detail/allocator/default_construct_range.inl:104
> #14  0x11efa263 in _ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE19default_construct_nENS0_15normal_iteratorINS_10device_ptrIiEEEEm
> at /usr/local/cuda-11.7/include/thrust/detail/contiguous_storage.inl:254
> #15  0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:220
> #16  0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE12default_initEm
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:213
> #17  0x11efa263 in _ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC2Em
> at /usr/local/cuda-11.7/include/thrust/detail/vector_base.inl:65
> #18  0x11ed7e47 in _ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC4Em
> at /usr/local/cuda-11.7/include/thrust/device_vector.h:88
> #19  0x11ed7e47 in MatSeqAIJCUSPARSECopyToGPU
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2488
> #20  0x11eef623 in MatSeqAIJCUSPARSEMergeMats
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:4696
> #21  0x11f0682b in MatMPIAIJGetLocalMatMerge_MPIAIJCUSPARSE
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:251
> #22  0x133f141f in MatMPIAIJGetLocalMatMerge
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:5342
> #23  0x133fe9cb in MatProductSymbolic_MPIAIJBACKEND
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7368
> #24  0x1377e1df in MatProductSymbolic
> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:795
> #25  0x11e4dd1f in MatPtAP
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9934
> #26  0x130d792f in MatCoarsenApply_MISK_private
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #27  0x130db89b in MatCoarsenApply_MISK
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #28  0x130bf5a3 in MatCoarsenApply
> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #29  0x141518ff in PCGAMGCoarsen_AGG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #30  0x13b3a43f in PCSetUp_GAMG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #31  0x1276845b in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:1069
> #32  0x127d6cbb in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:415
> #33  0x127dddbf in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:836
> #34  0x127e4987 in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1082
> #35  0x1280b18b in kspsolve_
> at /home/mnv/Software/petsc/arch-linux-c-dbg/src/ksp/ksp/interface/ftn-auto/itfuncf.c:335
> #36  0x1140945f in __globmat_solver_MOD_glmat_solver
> at ../../Source/pres.f90:3128
> #37  0x119f8853 in pressure_iteration_scheme
> at ../../Source/main.f90:1449
> #38  0x11969bd3 in fds
> at ../../Source/main.f90:688
> #39  0x11a10167 in main
> at ../../Source/main.f90:6
> srun: error: enki12: tasks 0-1: Aborted (core dumped)
>
>
> This was the slurm submission script in this case:
>
> #!/bin/bash
> # ../../Utilities/Scripts/qfds.sh -p 2  -T db -d test.fds
> #SBATCH -J test
> #SBATCH -e /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.err
> #SBATCH -o /home/mnv/Firemodels_fork/fds/Issues/PETSc/test.log
> #SBATCH --partition=debug
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
>
> # PETSc dir and arch:
> export PETSC_DIR=/home/mnv/Software/petsc
> export PETSC_ARCH=arch-linux-c-dbg
>
> # SYSTEM name:
> export MYSYSTEM=enki
>
> # modules
> module load cuda/11.7
> module load gcc/11.2.1/toolset
> module load openmpi/4.1.4/gcc-11.2.1-cuda-11.7
>
> cd /home/mnv/Firemodels_fork/fds/Issues/PETSc
> srun -N 1 -n 2 --ntasks-per-node 2 --mpi=pmi2 /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db test.fds -vec_type mpicuda -mat_type mpiaijcusparse -pc_type gamg
>
> The configure.log for the PETSc build is attached. Another clue to what is
> happening is that even when I set the matrices/vectors to be MPI (-vec_type
> mpi -mat_type mpiaij) and do not request a GPU, I get GPU errors:
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: GPU error
> [1]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100
> (cudaErrorNoDevice) : no CUDA-capable device is detected
> [1]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: Cannot lazily initialize PetscDevice: cuda error 100
> (cudaErrorNoDevice) : no CUDA-capable device is detected
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR:   Option left: name:-pc_type value: gamg source: command
> line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR:   Option left: name:-pc_type value: gamg source: command
> line
> [1]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [1]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad
>  GIT Date: 2023-08-11 15:13:02 +0000
> [0]PETSC ERROR: Petsc Development GIT revision: v3.19.4-946-g590ad0f52ad
>  GIT Date: 2023-08-11 15:13:02 +0000
> [0]PETSC ERROR:
> /home/mnv/Firemodels_fork/fds/Build/ompi_gnu_linux_db/fds_ompi_gnu_linux_db
> on a arch-linux-c-dbg named enki11.adlp by mnv Fri Aug 11 17:04:55 2023
> [0]PETSC ERROR: Configure options COPTFLAGS="-g -O2" CXXOPTFLAGS="-g -O2"
> FOPTFLAGS="-g -O2" FCOPTFLAGS="-g -O2" CUDAOPTFLAGS="-g -O2"
> --with-debugging=yes --with-shared-libraries=0 --download-suitesparse
> --download-hypre --download-fblaslapack --with-cuda
> ...
>
> I would not have expected GPU errors to be printed out, given that I did not
> request CUDA matrices/vectors. The case runs anyway; I assume it defaulted
> to the CPU solver.
> Let me know if you have any ideas as to what is happening. Thanks,
> Marcos
>
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 3:35 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>; PETSc users list <
> petsc-users at mcs.anl.gov>; Satish Balay <balay at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Marcos,
>   We do not have good petsc/gpu documentation, but see
> https://petsc.org/main/faq/#doc-faq-gpuhowto, and also search "requires:
> cuda" in petsc tests and you will find examples using GPU.
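>   For example, something along the lines of
>     grep -rl 'requires: cuda' $PETSC_DIR/src
> will list candidate examples.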
>   For the Fortran compile errors, attach your configure.log and Satish
> (Cc'ed) or others should know how to fix them.
>
>   Thanks.
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 2:22 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thanks for the explanation. Is there some development
> documentation on the GPU work? I'm interested in learning about it.
> I checked out the main branch and configured petsc. When compiling with
> gcc/gfortran I come across this error:
>
> ....
>       CUDAC
> arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>   CUDAC.dep
> arch-linux-c-opt/obj/src/mat/impls/aij/seq/seqcusparse/aijcusparse.o
>          FC arch-linux-c-opt/obj/src/ksp/f90-mod/petsckspdefmod.o
>          FC arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:37:61:
>
>    37 |       subroutine PCASMCreateSubdomains2D(a,b,c,d,e,f,g,h,i,z)
>       |                                                             1
> *Error: Symbol ‘pcasmcreatesubdomains2d’ at (1) already has an explicit
> interface*
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:38:13:
>
>    38 |        import tIS
>       |             1
> Error: IMPORT statement at (1) only permitted in an INTERFACE body
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:39:80:
>
>    39 |        PetscInt a ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:40:80:
>
>    40 |        PetscInt b ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:41:80:
>
>    41 |        PetscInt c ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:42:80:
>
>    42 |        PetscInt d ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:43:80:
>
>    43 |        PetscInt e ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:44:80:
>
>    44 |        PetscInt f ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:45:80:
>
>    45 |        PetscInt g ! PetscInt
>       |
>              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:46:30:
>
>    46 |        IS h ! IS
>       |                              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:47:30:
>
>    47 |        IS i ! IS
>       |                              1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:48:43:
>
>    48 |        PetscErrorCode z
>       |                                           1
> Error: Unexpected data declaration statement in INTERFACE block at (1)
>
> /home/mnv/Software/petsc/include/../src/ksp/f90-mod/ftn-auto-interfaces/petscpc.h90:49:10:
>
>    49 |        end subroutine PCASMCreateSubdomains2D
>       |          1
> Error: Expecting END INTERFACE statement at (1)
> make[3]: *** [gmakefile:225:
> arch-linux-c-opt/obj/src/ksp/f90-mod/petscpcmod.o] Error 1
> make[3]: *** Waiting for unfinished jobs....
>          CC
> arch-linux-c-opt/obj/src/tao/leastsquares/impls/pounders/pounders.o
>          CC arch-linux-c-opt/obj/src/ksp/pc/impls/bddc/bddcprivate.o
>       CUDAC
> arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o
>   CUDAC.dep
> arch-linux-c-opt/obj/src/vec/vec/impls/seq/cupm/cuda/vecseqcupm.o
> make[3]: Leaving directory '/home/mnv/Software/petsc'
> make[2]: *** [/home/mnv/Software/petsc/lib/petsc/conf/rules.doc:28: libs]
> Error 2
> make[2]: Leaving directory '/home/mnv/Software/petsc'
> **************************ERROR*************************************
>   Error during compile, check arch-linux-c-opt/lib/petsc/conf/make.log
>   Send it and arch-linux-c-opt/lib/petsc/conf/configure.log to
> petsc-maint at mcs.anl.gov
> ********************************************************************
> make[1]: *** [makefile:45: all] Error 1
> make: *** [GNUmakefile:9: all] Error 2
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 3:04 PM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Hi, Marcos,
>   I saw MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic() in the error stack.
> We recently refactored the COO code and got rid of that function.  So could
> you try petsc/main?
>   We map MPI processes to GPUs in a round-robin fashion. We query the
> number of visible CUDA devices (g) and assign device (rank%g) to the MPI
> process (rank). In that sense, the work distribution is entirely determined
> by your MPI work partition (i.e., by yourself).
>   On clusters, this MPI-process-to-GPU binding is usually done by the job
> scheduler, like slurm. You need to check your cluster's user guide to see
> how to bind MPI processes to GPUs. If the job scheduler has done that, the
> number of CUDA devices visible to a process might just appear to be 1,
> making petsc's own mapping moot.
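>
>   In code form, the default mapping is roughly the following (a sketch of
> the logic, not the actual PETSc source):
>
>   int rank, ngpus;
>   MPI_Comm_rank(PETSC_COMM_WORLD, &rank);  /* this process's MPI rank */
>   cudaGetDeviceCount(&ngpus);              /* CUDA devices visible to this process */
>   cudaSetDevice(rank % ngpus);             /* round-robin assignment */
>
>   So with 2 ranks and a single visible device, both ranks end up on device 0.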
>
>    Thanks.
> --Junchao Zhang
>
>
> On Fri, Aug 11, 2023 at 12:43 PM Vanella, Marcos (Fed) <
> marcos.vanella at nist.gov> wrote:
>
> Hi Junchao, thank you for replying. I compiled petsc in debug mode and
> this is what I get for the case:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress: an
> illegal memory access was encountered
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> #0  0x15264731ead0 in ???
> #1  0x15264731dc35 in ???
> #2  0x15264711551f in ???
> #3  0x152647169a7c in ???
> #4  0x152647115475 in ???
> #5  0x1526470fb7f2 in ???
> #6  0x152647678bbd in ???
> #7  0x15264768424b in ???
> #8  0x1526476842b6 in ???
> #9  0x152647684517 in ???
> #10  0x55bb46342ebb in _ZN6thrust8cuda_cub14throw_on_errorE9cudaErrorPKc
> at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:224
> #11  0x55bb46342ebb in
> _ZN6thrust8cuda_cub12__merge_sort10merge_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESB_NS_9null_typeESC_SC_SC_SC_SC_SC_SC_EEEENS3_15normal_iteratorISB_EE9IJCompareEEvRNS0_16execution_policyIT1_EET2_SM_T3_T4_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1316
> #12  0x55bb46342ebb in
> _ZN6thrust8cuda_cub12__smart_sort10smart_sortINS_6detail17integral_constantIbLb1EEENS4_IbLb0EEENS0_16execution_policyINS0_3tagEEENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEENS3_15normal_iteratorISD_EE9IJCompareEENS1_25enable_if_comparison_sortIT2_T4_E4typeERT1_SL_SL_T3_SM_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1544
> #13  0x55bb46342ebb in
> _ZN6thrust8cuda_cub11sort_by_keyINS0_3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRNS0_16execution_policyIT_EET0_SI_T1_T2_
> at /usr/local/cuda/include/thrust/system/cuda/detail/sort.h:1669
> #14  0x55bb46317bc5 in
> _ZN6thrust11sort_by_keyINS_8cuda_cub3tagENS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES6_NS_9null_typeES7_S7_S7_S7_S7_S7_S7_EEEENS_6detail15normal_iteratorIS6_EE9IJCompareEEvRKNSA_21execution_policy_baseIT_EET0_SJ_T1_T2_
> at /usr/local/cuda/include/thrust/detail/sort.inl:115
> #15  0x55bb46317bc5 in
> _ZN6thrust11sort_by_keyINS_12zip_iteratorINS_5tupleINS_10device_ptrIiEES4_NS_9null_typeES5_S5_S5_S5_S5_S5_S5_EEEENS_6detail15normal_iteratorIS4_EE9IJCompareEEvT_SC_T0_T1_
> at /usr/local/cuda/include/thrust/detail/sort.inl:305
> #16  0x55bb46317bc5 in MatSetPreallocationCOO_SeqAIJCUSPARSE_Basic
> at /home/mnv/Software/petsc/src/mat/impls/aij/seq/seqcusparse/
> aijcusparse.cu:4452
> #17  0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE_Basic
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:173
> #18  0x55bb46c5b27c in MatSetPreallocationCOO_MPIAIJCUSPARSE
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpicusparse/
> mpiaijcusparse.cu:222
> #19  0x55bb468e01cf in MatSetPreallocationCOO
> at /home/mnv/Software/petsc/src/mat/utils/gcreate.c:606
> #20  0x55bb46b39c9b in MatProductSymbolic_MPIAIJBACKEND
> at /home/mnv/Software/petsc/src/mat/impls/aij/mpi/mpiaij.c:7547
> #21  0x55bb469015e5 in MatProductSymbolic
> at /home/mnv/Software/petsc/src/mat/interface/matproduct.c:803
> #22  0x55bb4694ade2 in MatPtAP
> at /home/mnv/Software/petsc/src/mat/interface/matrix.c:9897
> #23  0x55bb4696d3ec in MatCoarsenApply_MISK_private
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:283
> #24  0x55bb4696eb67 in MatCoarsenApply_MISK
> at /home/mnv/Software/petsc/src/mat/coarsen/impls/misk/misk.c:368
> #25  0x55bb4695bd91 in MatCoarsenApply
> at /home/mnv/Software/petsc/src/mat/coarsen/coarsen.c:97
> #26  0x55bb478294d8 in PCGAMGCoarsen_AGG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/agg.c:524
> #27  0x55bb471d1cb4 in PCSetUp_GAMG
> at /home/mnv/Software/petsc/src/ksp/pc/impls/gamg/gamg.c:631
> #28  0x55bb464022cf in PCSetUp
> at /home/mnv/Software/petsc/src/ksp/pc/interface/precon.c:994
> #29  0x55bb4718b8a7 in KSPSetUp
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:406
> #30  0x55bb4718f22e in KSPSolve_Private
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:824
> #31  0x55bb47192c0c in KSPSolve
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/itfunc.c:1070
> #32  0x55bb463efd35 in kspsolve_
> at /home/mnv/Software/petsc/src/ksp/ksp/interface/ftn-auto/itfuncf.c:320
> #33  0x55bb45e94b32 in ???
> #34  0x55bb46048044 in ???
> #35  0x55bb46052ea1 in ???
> #36  0x55bb45ac5f8e in ???
> #37  0x1526470fcd8f in ???
> #38  0x1526470fce3f in ???
> #39  0x55bb45aef55d in ???
> #40  0xffffffffffffffff in ???
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 1771753 on node dgx02 exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
> BTW, I'm curious: if I set n MPI processes, each of them building a part
> of the linear system, and g GPUs, how does PETSc distribute those n pieces
> of the system matrix and RHS among the g GPUs? Does it use some
> load-balancing algorithm? Where can I read about this?
> Thank you and best regards. I can also point you to my code repo on GitHub
> if you want to take a closer look.
>
> Best Regards,
> Marcos
>
> ------------------------------
> *From:* Junchao Zhang <junchao.zhang at gmail.com>
> *Sent:* Friday, August 11, 2023 10:52 AM
> *To:* Vanella, Marcos (Fed) <marcos.vanella at nist.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] CUDA error trying to run a job with two mpi
> processes and 1 GPU
>
> Hi, Marcos,
>   Could you build petsc in debug mode and then copy and paste the whole
> error stack message?
>
>    Thanks
> --Junchao Zhang
>
>
> On Thu, Aug 10, 2023 at 5:51 PM Vanella, Marcos (Fed) via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
> Hi, I'm trying to run a parallel matrix/vector build and linear solve with
> PETSc on 2 MPI processes + one V100 GPU. I tested that the matrix build and
> solve are successful on CPUs only. I'm using cuda 11.5, cuda-enabled openmpi,
> and gcc 9.3. When I run the job with the GPU enabled I get the following
> error:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   *what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress:
> an illegal memory access was encountered*
>
> Program received signal SIGABRT: Process abort signal.
>
> Backtrace for this error:
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>   what():  merge_sort: failed to synchronize: cudaErrorIllegalAddress: an
> illegal memory access was encountered
>
> Program received signal SIGABRT: Process abort signal.
>
> I'm new to submitting jobs in slurm that also use GPU resources, so I
> might be doing something wrong in my submission script. This is it:
>
> #!/bin/bash
> #SBATCH -J test
> #SBATCH -e /home/Issues/PETSc/test.err
> #SBATCH -o /home/Issues/PETSc/test.log
> #SBATCH --partition=batch
> #SBATCH --ntasks=2
> #SBATCH --nodes=1
> #SBATCH --cpus-per-task=1
> #SBATCH --ntasks-per-node=2
> #SBATCH --time=01:00:00
> #SBATCH --gres=gpu:1
>
> export OMP_NUM_THREADS=1
> module load cuda/11.5
> module load openmpi/4.1.1
>
> cd /home/Issues/PETSc
> *mpirun -n 2 */home/fds/Build/ompi_gnu_linux/fds_ompi_gnu_linux test.fds *-vec_type
> mpicuda -mat_type mpiaijcusparse -pc_type gamg*
>
> If anyone has any suggestions on how to troubleshoot this, please let me
> know.
> Thanks!
> Marcos
>
>
>
>