[petsc-dev] https://developer.nvidia.com/nccl

Scott Kruger kruger at txcorp.com
Wed Jun 17 11:47:49 CDT 2020




Here's a paper from a few years ago that uses NCCL to give a better 
MPI_Bcast:

https://arxiv.org/pdf/1707.09414.pdf

But what's interesting is that they have this statement:

"In general, NCCL integration with MPI runtimes might lead to very 
complicated designs. Thus, the proposed work is a step towards achieving 
similar or better performance without utilizing NCCL."

Scott

On 6/16/20 9:19 PM, Karl Rupp wrote:
> From a practical standpoint it seems to me that NCCL is an offering to 
> a community that isn't used to MPI. It's categorized as 'Deep Learning 
> Software' on the NVIDIA page ;-)
> 
> The section 'NCCL and MPI' has some interesting bits:
>   https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html
> 
> At the bottom of the page there is
>   "Using NCCL to perform inter-GPU communication concurrently with 
> CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to 
> perform transfers between the same sets of CUDA devices concurrently is 
> therefore not guaranteed to be safe."
> 
> While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to 
> serve the deep learning community, I don't think NCCL provides enough 
> beyond MPI for PETSc.
> 
> Best regards,
> Karli
> 
> On 6/17/20 4:13 AM, Junchao Zhang wrote:
>> It should be renamed as NCL (NVIDIA Communications Library) as it adds 
>> point-to-point, in addition to collectives. I am not sure whether to 
>> implement it in PETSc as no exascale machine uses NVIDIA GPUs.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley <knepley at gmail.com> wrote:
>>
>>     It would seem to make more sense to just reverse-engineer this as
>>     another MPI impl.
>>
>>         Matt
>>
>>     On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>
>>
>>
>>     --
>>     What most experimenters take for granted before they begin their
>>     experiments is infinitely more interesting than any results to which
>>     their experiments lead.
>>     -- Norbert Wiener
>>
>>     https://www.cse.buffalo.edu/~knepley/
>>

-- 
Tech-X Corporation               kruger at txcorp.com
5621 Arapahoe Ave, Suite A       Phone: (720) 974-1841
Boulder, CO 80303                Fax:   (303) 448-7756

