[petsc-dev] https://developer.nvidia.com/nccl
Scott Kruger
kruger at txcorp.com
Wed Jun 17 11:47:49 CDT 2020
Here's a paper from a few years ago that uses NCCL to give a better
MPI_Bcast:
https://arxiv.org/pdf/1707.09414.pdf
But what's interesting is that they have this statement:
"In general, NCCL integration with MPI runtimes might lead to very
complicated designs. Thus, the proposed work is a step towards achieving
similar or better performance without utilizing NCCL."
Scott
On 6/16/20 9:19 PM, Karl Rupp wrote:
> From a practical standpoint it seems to me that NCCL is an offering to
> a community that isn't used to MPI. It's categorized as 'Deep Learning
> Software' on the NVIDIA page ;-)
>
> The section 'NCCL and MPI' has some interesting bits:
> https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html
>
> At the bottom of the page there is
> "Using NCCL to perform inter-GPU communication concurrently with
> CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to
> perform transfers between the same sets of CUDA devices concurrently is
> therefore not guaranteed to be safe."
>
> While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to
> serve the deep learning community, I don't think NCCL provides enough
> beyond MPI for PETSc.
>
> Best regards,
> Karli
>
> On 6/17/20 4:13 AM, Junchao Zhang wrote:
>> It should be renamed NCL (NVIDIA Communications Library), as it adds
>> point-to-point in addition to collectives. I am not sure whether to
>> implement it in PETSc, as no exascale machine uses NVIDIA GPUs.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley <knepley at gmail.com> wrote:
>>
>> It would seem to make more sense to just reverse-engineer this as
>> another MPI implementation.
>>
>> Matt
>>
>> On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>> -- What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>>
--
Tech-X Corporation kruger at txcorp.com
5621 Arapahoe Ave, Suite A Phone: (720) 974-1841
Boulder, CO 80303 Fax: (303) 448-7756