[petsc-dev] https://developer.nvidia.com/nccl
Karl Rupp
rupp at iue.tuwien.ac.at
Tue Jun 16 22:19:51 CDT 2020
From a practical standpoint, it seems to me that NCCL is an offering to
a community that isn't used to MPI. It's categorized as 'Deep Learning
Software' on the NVIDIA page ;-)
The section 'NCCL and MPI' has some interesting bits:
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html
At the bottom of the page there is
"Using NCCL to perform inter-GPU communication concurrently with
CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to
perform transfers between the same sets of CUDA devices concurrently is
therefore not guaranteed to be safe."
While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to
serve the deep learning community, I don't think NCCL provides enough
beyond MPI for PETSc.
Best regards,
Karli
On 6/17/20 4:13 AM, Junchao Zhang wrote:
> It should be renamed as NCL (NVIDIA Communications Library) as it adds
> point-to-point, in addition to collectives. I am not sure whether to
> implement it in PETSc, as no exascale machine uses NVIDIA GPUs.
>
> --Junchao Zhang
>
>
> On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley <knepley at gmail.com
> <mailto:knepley at gmail.com>> wrote:
>
> It would seem to make more sense to just reverse-engineer this as
> another MPI impl.
>
> Matt
>
> On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <bsmith at petsc.dev
> <mailto:bsmith at petsc.dev>> wrote:
>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>