[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory
Karl Rupp
rupp at mcs.anl.gov
Sat Oct 6 08:31:41 CDT 2012
Hi Matt,
> In a purely CPU-driven execution, there is a pointer to the data
> (*data), which is assumed to reside in a single linear piece of
> memory (please correct me if I'm wrong), yet may be managed by some
> external routines (VecOps).
>
>
> No, the 'data' is actually a pointer to the implementation class (it is
> helpful to compare this to other class headers, which all have
> the data pointer). In this case, it would be Vec_Seq or Vec_MPI
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/src/vec/vec/impls/dvecimpl.h#l14
>
> In fact, it is VECHEADER that has the array:
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/include/petsc-private/vecimpl.h#l435
>
> Jed started the practice of linking to code, and I think it's the bee's
> knees. You are correct that all these implementations
> assume a piece of linear memory on the CPU. On the GPU, we synchronize
> some linear memory with Cusp vectors.
I'm aware of the redirection to Vec_Seq and Vec_MPI (see Section 3); my
sentence just took a slight shortcut here. Anyway, thanks for pointing
that out :-)
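As a side note for anyone reading along, here is a condensed sketch of
the layout Matt describes (simplified from the vecimpl.h and dvecimpl.h
links above; I'm omitting the remaining VECHEADER members):

#define VECHEADER                                                     \
  PetscScalar *array;            /* linear memory with the entries */ \
  PetscScalar *array_allocated;  /* what was allocated, for freeing */

typedef struct {
  VECHEADER
} Vec_Seq;   /* vec->data points to this for sequential vectors */

So the actual entries still sit in one linear buffer; 'data' merely adds
one level of indirection through the implementation class.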
> As accelerators enter the game (indicated by PETSC_HAVE_CUSP), the
> concept of a vector having one pointer to its data is undermined.
> Now, a Vec can have data in CPU RAM and on one (or, with txpetscgpu,
> multiple) CUDA accelerators. 'valid_GPU_array' indicates which of the
> two memory domains holds the most recent data, possibly both.
>
>
> There is an implementation of PETSc Vecs with non-contiguous memory for
> SAMRAI.
>
Thanks, I'll have a look at this.
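For reference, the 'valid_GPU_array' flag mentioned above is the
PetscCUSPFlag enum in vecimpl.h (quoting from memory, so the exact names
may be slightly off):

typedef enum {PETSC_CUSP_UNALLOCATED,  /* no GPU buffer allocated yet    */
              PETSC_CUSP_GPU,          /* GPU copy holds the newest data */
              PETSC_CUSP_CPU,          /* CPU copy holds the newest data */
              PETSC_CUSP_BOTH          /* both copies are in sync        */
             } PetscCUSPFlag;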
> (...)
> -- 4. Concluding remarks --
>
> Even though the mere question of how to hold memory handles is
> certainly less complex than a full unification of actual operations
> at runtime, this first step needs to be done right in order to have
> a solid foundation to build on. Thus, if you guys spot any
> weaknesses in the proposed modifications, please let me know. I
> tried to align everything such that it integrates nicely into PETSc,
> yet I don't know many of the implementation details yet...
>
>
> I can't tell from the above how we would synchronize memory. Perhaps it
> would be easy to show with an example how this would work, as opposed
> to the current system.
Memory synchronization ties into the actual runtime behavior (data
manipulation), so I focused on the data structure for now. Basically,
the synchronizations would be accomplished in essentially the same way
as now (VecCUSPCopyToGPU(), VecCUSPCopyFromGPU(), etc., I can't look up
the exact names right now), but possibly with finer granularity (cf.
VecCUSPCopyFromGPUSome()). The important point here, however, is the
independence from the implementation libraries; otherwise, we would have
to maintain a separate memory management implementation for each GPU
library we possibly interface with.
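To sketch what I mean (all names carrying 'My' are made up for
illustration and are not actual PETSc API), a flag-driven host-to-device
copy along the lines of VecCUSPCopyToGPU() would look roughly like this:

typedef enum {MY_UNALLOCATED, MY_VALID_GPU, MY_VALID_CPU, MY_VALID_BOTH} MyMemFlag;

typedef struct {
  void      *device_handle;  /* opaque buffer handle of the GPU library   */
  MyMemFlag  valid_array;    /* which memory domain holds the newest data */
} MyDeviceData;

static PetscErrorCode MyVecCopyToDevice(Vec v)
{
  MyDeviceData *d = (MyDeviceData*)v->spptr;

  if (d->valid_array == MY_VALID_CPU) {              /* CPU copy is newer */
    PetscScalar *host = ((Vec_Seq*)v->data)->array;
    my_memcpy_host_to_device(d->device_handle, host, /* hypothetical wrapper */
                             v->map->n*sizeof(PetscScalar));
    d->valid_array = MY_VALID_BOTH;                  /* copies now in sync */
  }
  return 0;
}

Only the memcpy-wrapper needs to know which GPU library sits underneath;
the flag logic stays identical whether the backend is CUDA, OpenCL, or
something else.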
Thanks and best regards,
Karli