[petsc-dev] Question on PETSc + CUDA configuration with MPI on cluster

Satish Balay balay.anl at fastmail.org
Tue Sep 23 00:13:34 CDT 2025


The orte-info output does suggest your OpenMPI is built with CUDA enabled - the "smcuda" BTL is listed.

Are you able to run PETSc examples? What do you get for:

>>>>
balay at petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ make ex19
/scratch/balay/petsc/arch-linux-c-debug/bin/mpicc -fPIC -Wall
-Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch
-Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g3 -O0 
-I/scratch/balay/petsc/include
-I/scratch/balay/petsc/arch-linux-c-debug/include
-I/nfs/gce/projects/petsc/soft/u22.04/spack-2024-11-27-cuda/opt/spack/linux-ubuntu22.04-x86_64/gcc-11.4.0/cuda-12.0.1-gy7foq57oi6wzltombtsdy5eqz5gkjgc/include
    -Wl,-export-dynamic ex19.c 
-Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib
-L/scratch/balay/petsc/arch-linux-c-debug/lib
-Wl,-rpath,/nfs/gce/projects/petsc/soft/u22.04/spack-2024-11-27-cuda/opt/spack/linux-ubuntu22.04-x86_64/gcc-11.4.0/cuda-12.0.1-gy7foq57oi6wzltombtsdy5eqz5gkjgc/lib64
-L/nfs/gce/projects/petsc/soft/u22.04/spack-2024-11-27-cuda/opt/spack/linux-ubuntu22.04-x86_64/gcc-11.4.0/cuda-12.0.1-gy7foq57oi6wzltombtsdy5eqz5gkjgc/lib64
-L/nfs/gce/projects/petsc/soft/u22.04/spack-2024-11-27-cuda/opt/spack/linux-ubuntu22.04-x86_64/gcc-11.4.0/cuda-12.0.1-gy7foq57oi6wzltombtsdy5eqz5gkjgc/lib64/stubs
-Wl,-rpath,/scratch/balay/petsc/arch-linux-c-debug/lib
-L/scratch/balay/petsc/arch-linux-c-debug/lib
-Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11
-L/usr/lib/gcc/x86_64-linux-gnu/11 -lpetsc -llapack -lblas -lm -lcudart
-lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda
-lX11 -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
-lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -o ex19
balay at petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ ./ex19 -snes_monitor -dm_mat_type seqaijcusparse -dm_vec_type seqcuda -pc_type gamg -pc_gamg_esteig_ksp_max_it 10 -ksp_monitor -mg_levels_ksp_max_it 3 
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 2.391552133017e-01 
    0 KSP Residual norm 2.013462697105e-01 
    1 KSP Residual norm 5.027022294231e-02 
    2 KSP Residual norm 7.248258907839e-03 
    3 KSP Residual norm 8.590847505363e-04 
    4 KSP Residual norm 1.511762118013e-05 
    5 KSP Residual norm 1.410585959219e-06 
  1 SNES Function norm 6.812362089434e-05 
    0 KSP Residual norm 2.315252918142e-05 
    1 KSP Residual norm 2.351994603807e-06 
    2 KSP Residual norm 3.882072626158e-07 
    3 KSP Residual norm 2.227447016095e-08 
    4 KSP Residual norm 2.200353394658e-09 
    5 KSP Residual norm 1.147903850265e-10 
  2 SNES Function norm 3.411489611752e-10 
Number of SNES iterations = 2
balay at petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ 
<<<<

So what issue are you seeing with your code? And does it go away with the option "-use_gpu_aware_mpi 0"? For example:

>>>>
balay at petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ ./ex19 -snes_monitor -dm_mat_type seqaijcusparse -dm_vec_type seqcuda -pc_type gamg -pc_gamg_esteig_ksp_max_it 10 -ksp_monitor -mg_levels_ksp_max_it 3 -use_gpu_aware_mpi 0
lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
  0 SNES Function norm 2.391552133017e-01 
    0 KSP Residual norm 2.013462697105e-01 
    1 KSP Residual norm 5.027022294231e-02 
    2 KSP Residual norm 7.248258907839e-03 
    3 KSP Residual norm 8.590847505363e-04 
    4 KSP Residual norm 1.511762118013e-05 
    5 KSP Residual norm 1.410585959219e-06 
  1 SNES Function norm 6.812362089434e-05 
    0 KSP Residual norm 2.315252918142e-05 
    1 KSP Residual norm 2.351994603807e-06 
    2 KSP Residual norm 3.882072626158e-07 
    3 KSP Residual norm 2.227447016095e-08 
    4 KSP Residual norm 2.200353394658e-09 
    5 KSP Residual norm 1.147903850265e-10 
  2 SNES Function norm 3.411489611752e-10 
Number of SNES iterations = 2
balay at petsc-gpu-01:/scratch/balay/petsc/src/snes/tutorials$ 
<<<<
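
Independent of PETSc, you can also check the Open MPI build directly. Open MPI ships an extension header (mpi-ext.h) that advertises CUDA awareness. Below is a minimal sketch (the file name is just for illustration) - compile it with the same mpicc used for PETSc and run it under that mpiexec:

>>>>
/* check_cuda_aware.c: query whether this Open MPI has CUDA-aware support. */
#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>   /* Open MPI extension header: defines MPIX_CUDA_AWARE_SUPPORT */
#endif

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* Compile-time check: was this Open MPI configured --with-cuda? */
#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
  printf("Compile time: CUDA-aware support is advertised.\n");
#else
  printf("Compile time: CUDA-aware support is NOT advertised.\n");
#endif

  /* Run-time check: does the library loaded at run time actually enable it? */
#if defined(MPIX_CUDA_AWARE_SUPPORT)
  printf("Run time: MPIX_Query_cuda_support() = %d\n", MPIX_Query_cuda_support());
#endif

  MPI_Finalize();
  return 0;
}
<<<<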

Satish

On Tue, 23 Sep 2025, 岳新海 wrote:

> I get:
> [mae_yuexh at login01 ~]$ orte-info |grep 'MCA btl'
>                  MCA btl: smcuda (MCA v2.1, API v3.1, Component v4.1.5)
>                  MCA btl: tcp (MCA v2.1, API v3.1, Component v4.1.5)
>                  MCA btl: self (MCA v2.1, API v3.1, Component v4.1.5)
>                  MCA btl: vader (MCA v2.1, API v3.1, Component v4.1.5)
> 
> Xinhai
> 
> 岳新海
> Southern University of Science and Technology / Graduate student, class of 2023
> No. 1088 Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong
> 
> ------------------ Original ------------------
> From: "Satish Balay" <balay.anl at fastmail.org>
> Date: Tue, Sep 23, 2025 03:25 AM
> To: "岳新海" <12332508 at mail.sustech.edu.cn>
> Cc: "petsc-dev" <petsc-dev at mcs.anl.gov>
> Subject: Re: [petsc-dev] Question on PETSc + CUDA configuration with MPI on cluster
> 
> What do you get for (with your OpenMPI install): orte-info | grep 'MCA btl'
> 
> With a CUDA-enabled OpenMPI build, I get:
> balay at petsc-gpu-01:/scratch/balay/petsc$ ./arch-linux-c-debug/bin/orte-info |grep 'MCA btl'
>                  MCA btl: smcuda (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: openib (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: self (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: tcp (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: vader (MCA v2.1, API v3.1, Component v4.1.6)
> 
> And without CUDA:
> balay at petsc-gpu-01:/scratch/balay/petsc.x$ ./arch-test/bin/orte-info  | grep 'MCA btl'
>                  MCA btl: openib (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: self (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: tcp (MCA v2.1, API v3.1, Component v4.1.6)
>                  MCA btl: vader (MCA v2.1, API v3.1, Component v4.1.6)
> 
> i.e., "smcuda" should be listed for a CUDA-enabled OpenMPI.
> 
> It's not clear if GPU-aware MPI makes a difference for all MPI implementations (or versions) - so it's good to verify. [It's a performance issue anyway - so primarily relevant when doing timing measurements.]
> 
> Satish
> 
> On Mon, 22 Sep 2025, 岳新海 wrote:
> 
> > Dear PETSc Team,
> >  
> > I am encountering an issue when running PETSc with CUDA support on a cluster. When I set the vector type to VECCUDA, PETSc reports that my MPI is not GPU-aware. However, the MPI library (OpenMPI 4.1.5) I used to configure PETSc was built with the --with-cuda option enabled.
> > Here are some details:
> > PETSc version: 3.20.6
> > MPI: OpenMPI 4.1.5, configured with --with-cuda
> > GPU: RTX3090
> > CUDA version: 12.1 
> > I have attached both my PETSc configure command and OpenMPI configure command for reference.
> > My questions are:
> > 
> > 1. Even though I enabled --with-cuda in OpenMPI, why does PETSc still report that MPI is not GPU-aware?
> > 
> > 2. Are there additional steps or specific configuration flags required (either in OpenMPI or PETSc) to ensure GPU-aware MPI is correctly detected?
> > 
> > Any guidance or suggestions would be greatly appreciated.
> > 
> > Best regards,
> > Xinhai Yue
> > 
> > 岳新海
> > Southern University of Science and Technology / Graduate student, class of 2023
> > No. 1088 Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong

