[petsc-users] Status of PETScSF failures with GPU-aware MPI on Perlmutter
Jed Brown
jed at jedbrown.org
Thu Nov 2 16:02:01 CDT 2023
What modules do you have loaded? I don't know if it currently works with cuda-11.7. I assume you're following these instructions carefully:
https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/#cuda-aware-mpi
In our experience, GPU-aware MPI continues to be brittle on these machines. Maybe you can ask NERSC exactly which CUDA versions are tested with GPU-aware MPI.
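To make the "what modules do you have loaded" question easy to answer in a reply, here is a minimal sketch that collects the relevant settings from the environment. It assumes the module system exports the colon-separated `LOADEDMODULES` variable (as Environment Modules and Lmod do) and that Cray MPICH's GPU support is controlled by `MPICH_GPU_SUPPORT_ENABLED`. The helper name `gpu_mpi_env_report` is hypothetical and is not part of PETSc or any NERSC tooling.

```python
import os


def gpu_mpi_env_report(env=None):
    """Summarize environment settings relevant to GPU-aware Cray MPICH.

    `env` is any mapping of environment variables; defaults to os.environ.
    """
    env = os.environ if env is None else env
    # LOADEDMODULES is assumed to be the colon-separated list of loaded modules.
    modules = [m for m in env.get("LOADEDMODULES", "").split(":") if m]
    return {
        "modules": modules,
        # Module names starting with "cuda" (e.g. cudatoolkit/<version>).
        "cuda_modules": [m for m in modules if m.startswith("cuda")],
        # Cray MPICH only enables GPU-aware transfers when this is set to 1.
        "gpu_support_enabled": env.get("MPICH_GPU_SUPPORT_ENABLED") == "1",
        # Set by the craype-accel-* modules when an accelerator target is loaded.
        "accel_target": env.get("CRAY_ACCEL_TARGET"),
    }


if __name__ == "__main__":
    for key, value in gpu_mpi_env_report().items():
        print(f"{key}: {value}")
```

Running this inside the same job script that launches the PETSc application, and pasting its output alongside the crash trace, shows which CUDA module was actually in effect. To check whether GPU-aware MPI is the cause at all, PETSc's `-use_gpu_aware_mpi 0` runtime option can be tried for comparison.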
Sajid Ali <sajidsyed2021 at u.northwestern.edu> writes:
> Hi PETSc-developers,
>
> I had posted about crashes within PETScSF when using GPU-aware MPI on
> Perlmutter a while ago (
> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2022-February/045585.html).
> Now that the software stacks have stabilized, I was wondering if there was
> a fix for the same as I am still observing similar crashes.
>
> I am attaching the trace of the latest crash (with PETSc-3.20.0) for
> reference.
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Data Science, Simulation, and Learning Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io