[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory
Karl Rupp
rupp at mcs.anl.gov
Sat Oct 6 08:31:41 CDT 2012
Hi Matt,
> In a purely CPU-driven execution, there is a pointer to the data
> (*data), which is assumed to reside in a single linear piece of
> memory (please correct me if I'm wrong), yet may be managed by some
> external routines (VecOps).
>
>
> No, the 'data' is actually a pointer to the implementation class (it is
> helpful to compare this to other class headers, which all have
> the data pointer). In this case, it would be Vec_Seq or Vec_MPI
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/src/vec/vec/impls/dvecimpl.h#l14
>
> In fact, it is VECHEADER that has the array:
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/include/petsc-private/vecimpl.h#l435
>
> Jed started the practice of linking to code, and I think it's the bee's
> knees. You are correct that all these implementations
> assume a piece of linear memory on the CPU. On the GPU, we synchronize
> some linear memory with Cusp vectors.
I'm aware of the redirection to Vec_Seq and Vec_MPI (see Section 3); my
sentence just took a slight shortcut here. Anyway, thanks for pointing
that out :-)
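As a side note for anyone reading along, here is a condensed sketch of
the layout Matt describes (simplified from the vecimpl.h and dvecimpl.h
links above; I'm omitting the remaining VECHEADER members):

#define VECHEADER                                                     \
  PetscScalar *array;            /* linear memory with the entries */ \
  PetscScalar *array_allocated;  /* what was allocated, for freeing */

typedef struct {
  VECHEADER
} Vec_Seq;   /* vec->data points to this for sequential vectors */

So the actual entries still sit in one linear buffer; 'data' merely adds
one level of indirection through the implementation class.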
> As accelerators enter the game (indicated by PETSC_HAVE_CUSP), the
> concept of a vector having one pointer to its data is undermined.
> Now, a Vec can have data in CPU RAM and on one (or, with txpetscgpu,
> multiple) CUDA accelerators. 'valid_GPU_array' indicates which of the
> two memory domains holds the most recent data, possibly both.
>
>
> There is an implementation of PETSc Vecs with non-contiguous memory for
> SAMRAI.
>
Thanks, I'll have a look at this.
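For reference, the 'valid_GPU_array' flag mentioned above is the
PetscCUSPFlag enum in vecimpl.h (quoting from memory, so the exact names
may be slightly off):

typedef enum {PETSC_CUSP_UNALLOCATED,  /* no GPU buffer allocated yet    */
              PETSC_CUSP_GPU,          /* GPU copy holds the newest data */
              PETSC_CUSP_CPU,          /* CPU copy holds the newest data */
              PETSC_CUSP_BOTH          /* both copies are in sync        */
             } PetscCUSPFlag;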
> (...)
> -- 4. Concluding remarks --
>
> Even though the mere question of how to hold memory handles is
> certainly less complex than a full unification of actual operations
> at runtime, this first step needs to be done right in order to have
> a solid foundation to build on. Thus, if you guys spot any
> weaknesses in the proposed modifications, please let me know. I
> tried to align everything such that it integrates nicely into PETSc,
> yet I don't know many of the implementation details yet...
>
>
> I can't tell from the above how we would synchronize memory. Perhaps it
> would be easy to show with an example how this would work, as opposed
> to the current system.
Memory synchronization ties into the actual runtime behavior (data
manipulation), so I focused on the data structure for now. Basically,
the synchronizations would be accomplished in essentially the same way
as now (VecCUSPCopyToGPU(), VecCUSPCopyFromGPU(), etc., I can't look up
the exact names right now), but possibly with finer granularity (cf.
VecCUSPCopyFromGPUSome()). The important point here, however, is the
independence from the implementation libraries; otherwise, we would have
to maintain a separate memory management implementation for each GPU
library we possibly interface with.
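To sketch what I mean (all names carrying 'My' are made up for
illustration and are not actual PETSc API), a flag-driven host-to-device
copy along the lines of VecCUSPCopyToGPU() would look roughly like this:

typedef enum {MY_UNALLOCATED, MY_VALID_GPU, MY_VALID_CPU, MY_VALID_BOTH} MyMemFlag;

typedef struct {
  void      *device_handle;  /* opaque buffer handle of the GPU library   */
  MyMemFlag  valid_array;    /* which memory domain holds the newest data */
} MyDeviceData;

static PetscErrorCode MyVecCopyToDevice(Vec v)
{
  MyDeviceData *d = (MyDeviceData*)v->spptr;

  if (d->valid_array == MY_VALID_CPU) {              /* CPU copy is newer */
    PetscScalar *host = ((Vec_Seq*)v->data)->array;
    my_memcpy_host_to_device(d->device_handle, host, /* hypothetical wrapper */
                             v->map->n*sizeof(PetscScalar));
    d->valid_array = MY_VALID_BOTH;                  /* copies now in sync */
  }
  return 0;
}

Only the memcpy-wrapper needs to know which GPU library sits underneath;
the flag logic stays identical whether the backend is CUDA, OpenCL, or
something else.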
Thanks and best regards,
Karli