[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory
Karl Rupp
rupp at mcs.anl.gov
Sun Oct 7 07:58:53 CDT 2012
> Okay, the matrix will have to partition itself. What is the advantage of
> having a single CPU process addressing multiple GPUs? Why not use
> different MPI processes? (We can have the MPI processes sharing a node
> create a subcomm so they can decide which process is driving which device.)
>
> Making MPI a prerequisite for multi-GPU usage would be an unnecessary
> restriction, wouldn't it?
>
> Small point: I don't believe this, in fact the opposite. There are many
> equivalent ways of doing these things, and we should use the simplest and
> most structured approach that can accomplish our goal. We have already
> bought into MPI and should never fall into the trap of trying to support
> another paradigm at the expense of simplicity.
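
For concreteness, the node-local subcommunicator idea quoted above could look
roughly like the following sketch. This is not PETSc code and the names are
illustrative; it assumes MPI-3's MPI_Comm_split_type and the CUDA runtime are
available.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm;
  int      local_rank, local_size, ndevices = 0, mydevice = -1;

  MPI_Init(&argc, &argv);

  /* Group the ranks sharing a node into one subcommunicator (MPI-3) */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &nodecomm);
  MPI_Comm_rank(nodecomm, &local_rank);
  MPI_Comm_size(nodecomm, &local_size);

  /* Let each node-local rank drive one of the devices on that node */
  cudaGetDeviceCount(&ndevices);
  if (ndevices > 0) {
    mydevice = local_rank % ndevices;
    cudaSetDevice(mydevice);
  }
  printf("node-local rank %d of %d drives device %d (of %d)\n",
         local_rank, local_size, mydevice, ndevices);

  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}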
From the simplicity point of view, you're absolutely right. However, MPI is
only one implementation of the model used for distributed computing/memory
(including 'faking' distributed memory on a shared-memory system). With the
complex memory hierarchies introduced in recent years, we may have to adapt
our programming approach in order to get reasonable performance, even though
MPI could accomplish the same thing, albeit at higher cost: even on
shared-memory systems, MPI messaging is not a free lunch.
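
Conversely, a single process can drive several GPUs without MPI by switching
the active device with cudaSetDevice() and issuing asynchronous work on a
per-device stream. A rough sketch, again not PETSc code and with illustrative
names only (it assumes the problem size divides evenly across the devices):

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

int main(void)
{
  int ndevices = 0;
  cudaGetDeviceCount(&ndevices);
  if (ndevices < 1) { fprintf(stderr, "no CUDA device found\n"); return 1; }

  /* One stream and one buffer per device; each device gets a slice of the data */
  cudaStream_t *streams = (cudaStream_t *)malloc(ndevices * sizeof(cudaStream_t));
  double      **dbuf    = (double **)malloc(ndevices * sizeof(double *));
  double       *hbuf    = (double *)calloc(N, sizeof(double));
  size_t        chunk   = N / ndevices;

  for (int d = 0; d < ndevices; ++d) {
    cudaSetDevice(d);                    /* subsequent calls target device d */
    cudaStreamCreate(&streams[d]);
    cudaMalloc((void **)&dbuf[d], chunk * sizeof(double));
    cudaMemcpyAsync(dbuf[d], hbuf + d * chunk, chunk * sizeof(double),
                    cudaMemcpyHostToDevice, streams[d]);
    /* kernels for device d would be launched on streams[d] here */
  }

  /* Wait for all devices, then clean up */
  for (int d = 0; d < ndevices; ++d) {
    cudaSetDevice(d);
    cudaStreamSynchronize(streams[d]);
    cudaFree(dbuf[d]);
    cudaStreamDestroy(streams[d]);
  }
  free(streams); free(dbuf); free(hbuf);
  return 0;
}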
Best regards,
Karli