[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory

Matthew Knepley knepley at gmail.com
Mon Oct 8 17:34:59 CDT 2012


On Oct 8, 2012 12:35 PM, "Karl Rupp" <rupp at mcs.anl.gov> wrote:
>
>
>>         Okay, the matrix will have to partition itself. What is the
>>         advantage of having a single CPU process addressing multiple
>>         GPUs? Why not use different MPI processes? (We can have the MPI
>>         processes sharing a node create a subcomm so they can decide
>>         which process is driving which device.)
>>
>>
>>     Making MPI a prerequisite for multi-GPU usage would be an unnecessary
>>     restriction, wouldn't it?
>>
>>
>> Small point: I don't believe this; in fact, the opposite. There are many
>> equivalent ways of doing these things, and we should use the simplest and
>> most structured one that can accomplish our goal. We have already bought
>> into MPI and should never fall into the trap of trying to support another
>> paradigm at the expense of simplicity.
>
>
> From the simplicity point of view, you're absolutely right. However, MPI
> is only one implementation of the model used for distributed
> computing/memory (including 'faking' distributed memory on a shared-memory
> system). With all the complex memory hierarchies introduced in recent
> years, we may have to adapt our programming approach in order to get
> reasonable performance, even though MPI would be able to accomplish this
> (albeit at a higher cost - even on shared-memory systems, MPI messaging is
> not a free lunch).
>

MPI is shared-nothing, not distributed, so it's not 'faking' but rather
adopting a programming model. Code correctness is much easier in this model,
and the only penalty I have seen is in local memory usage.

  Matt
> Best regards,
> Karli
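For reference, a minimal sketch of the node-local subcommunicator idea quoted
above, assuming MPI-3's MPI_Comm_split_type with MPI_COMM_TYPE_SHARED and the
CUDA runtime API (cudaGetDeviceCount, cudaSetDevice). The round-robin
rank-to-device mapping is an assumption for illustration, not PETSc's actual
policy:

/* Sketch: ranks sharing a node form a subcommunicator and use their
 * node-local rank to pick a GPU. Assumes MPI-3 and the CUDA runtime;
 * the round-robin mapping policy below is an assumption. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      noderank, nodesize, ndevices, mydevice;

  MPI_Init(&argc, &argv);

  /* Group the ranks that share a node (shared-memory domain) */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);
  MPI_Comm_size(nodecomm, &nodesize);

  /* Each node-local rank drives one device, round-robin if oversubscribed */
  cudaGetDeviceCount(&ndevices);
  mydevice = noderank % ndevices;
  cudaSetDevice(mydevice);

  printf("node-local rank %d of %d drives device %d of %d\n",
         noderank, nodesize, mydevice, ndevices);

  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}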