[petsc-users] errors with hypre with MPI and multiple GPUs on a node

Junchao Zhang junchao.zhang at gmail.com
Thu Feb 1 21:07:58 CST 2024


Hi, Anna,
  Do you have other CUDA machines to try?  If you can share your test, then
I will run it on Polaris at Argonne to see whether it is a petsc/hypre issue.
If it is not, then it is likely a GPU-MPI binding problem on TACC.

  Thanks
--Junchao Zhang


On Thu, Feb 1, 2024 at 5:31 PM Yesypenko, Anna <anna at oden.utexas.edu> wrote:

> Hi Victor, Junchao,
>
> Thank you for providing the script, it is very useful!
> There are still issues with hypre not binding correctly, and I'm getting
> the error message occasionally (but much less often).
> I added some additional environment variables to the script that seem to
> make the behavior more consistent.
>
> export CUDA_DEVICE_ORDER=PCI_BUS_ID
> export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK    ## as Victor suggested
> export HYPRE_MEMORY_DEVICE=$MV2_COMM_WORLD_LOCAL_RANK
>
> The last environment variable is from hypre's documentation on GPUs.
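>
> For completeness, a minimal launcher sketch combining all three settings (a
> hypothetical helper, assuming mvapich2 exports MV2_COMM_WORLD_LOCAL_RANK on
> every rank; the echo line is only a sanity check of the rank-to-GPU mapping):
>
> #!/bin/bash
> # bind.sh -- hypothetical wrapper: one GPU per local rank.
> export CUDA_DEVICE_ORDER=PCI_BUS_ID
> export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK
> export HYPRE_MEMORY_DEVICE=$MV2_COMM_WORLD_LOCAL_RANK
> echo "host=$(hostname) local_rank=$MV2_COMM_WORLD_LOCAL_RANK gpu=$CUDA_VISIBLE_DEVICES"
> exec "$@"   # launch the real binary with the binding in place
>
> Invoked as, e.g., mpirun -n 3 ./bind.sh ./my_app.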
> In 30 runs for a small problem size, 4 fail with a hypre-related error. Do
> you have any other thoughts or suggestions?
>
> Best,
> Anna
>
> ------------------------------
> *From:* Victor Eijkhout <eijkhout at tacc.utexas.edu>
> *Sent:* Thursday, February 1, 2024 11:26 AM
> *To:* Junchao Zhang <junchao.zhang at gmail.com>; Yesypenko, Anna <
> anna at oden.utexas.edu>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
> *Subject:* Re: [petsc-users] errors with hypre with MPI and multiple GPUs
> on a node
>
>
> This launcher script is for mvapich2-gdr only:
>
>
>
> #!/bin/bash
>
> # Usage: mpirun -n <num_proc> MV2_USE_AFFINITY=0 MV2_ENABLE_AFFINITY=0 ./launch ./bin
>
> export CUDA_VISIBLE_DEVICES=$MV2_COMM_WORLD_LOCAL_RANK
>
> case $MV2_COMM_WORLD_LOCAL_RANK in
>         0) cpus=0-3 ;;
>         1) cpus=64-67 ;;
>         2) cpus=72-75 ;;
> esac
>
> numactl --physcpubind=$cpus "$@"
>
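> A more portable variant of the same idea (a sketch, not tested here: it picks
> up the local rank from whichever MPI implementation set it; the CPU ranges
> remain specific to this node's core layout):
>
> #!/bin/bash
> # Hypothetical generalization of the launcher above.
> rank=${MV2_COMM_WORLD_LOCAL_RANK:-${OMPI_COMM_WORLD_LOCAL_RANK:-$SLURM_LOCALID}}
> export CUDA_VISIBLE_DEVICES=$rank
> case $rank in
>         0) cpus=0-3 ;;      # node-specific core ranges; adjust to your topology
>         1) cpus=64-67 ;;
>         2) cpus=72-75 ;;
> esac
> exec numactl --physcpubind=$cpus "$@"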