[petsc-dev] code review request : txpetscgpu package removal
Karl Rupp
rupp at mcs.anl.gov
Tue Jun 25 16:46:25 CDT 2013
Hi Paul,
> Not too heavy. I've already converted much of this code to remove this
> package while supporting existing features, though I haven't pushed it
> into the fork. The real question is whether we want to go down this path
> or not.
I see two options: either txpetscgpu is a self-contained package and
brings its own set of implementation files along, or it gets integrated
into PETSc. The current model of injected PETSC_HAVE_TXPETSCGPU
preprocessor switches will not be able to compete in any code beauty
contest... ;-) Either way, there is presumably also a licensing question
involved, so you guys need to agree to have txpetscgpu integrated (or
not).
> Right now, I think CUSP does not support SpMVs in streams. Thus, in
> order to get an effective multi-GPU SpMV, one has to rewrite all the
> SpMV kernels (for all the different storage formats) to use streams.
> This adds a lot of additional code to maintain. I would prefer to just
> call some CUSP API with a stream as an input argument, but I don't
> think that exists at the moment. I'm not sure what to do here. Once
> the other code is accepted, perhaps we can address this problem then?
The CUSP API needs to provide streams for that, yes.
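Just to illustrate what I mean by providing streams: CUSPARSE already
lets one attach a stream to its handle, so an SpMV launched through it
runs asynchronously on that stream. A minimal sketch (not PETSc code;
names and arguments are placeholders):

  #include <cuda_runtime.h>
  #include <cusparse_v2.h>

  /* Run a double-precision CSR SpMV y = A*x on a user-supplied stream. */
  void spmv_on_stream(cusparseHandle_t handle, cudaStream_t stream,
                      int m, int n, int nnz, cusparseMatDescr_t descr,
                      const double *val, const int *rowptr,
                      const int *colind, const double *x, double *y)
  {
    const double one = 1.0, zero = 0.0;
    /* all subsequent CUSPARSE calls on this handle use 'stream' */
    cusparseSetStream(handle, stream);
    cusparseDcsrmv(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                   m, n, nnz, &one, descr, val, rowptr, colind,
                   x, &zero, y);
  }

This is effectively what a stream-aware CUSP multiply would have to
call into anyway.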
As I mentioned in my comments on your commits on Bitbucket, I'd prefer
to see CUSP separated from CUSPARSE and to use a CUSPARSE-native matrix
data structure instead (a simple collection of handles). This way one
can already use the CUSPARSE interface if only the CUDA SDK is
installed, and hook in CUSP later for preconditioners, etc.
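By "collection of handles" I have something like the following in mind;
again just a rough sketch, not a proposal for the actual struct layout
(the name is hypothetical):

  #include <cuda_runtime.h>
  #include <cusparse_v2.h>

  /* Sketch of a CUSPARSE-native CSR matrix: nothing but the handles
   * and device arrays needed to drive the CUSPARSE API directly. */
  typedef struct {
    cusparseHandle_t   handle;   /* CUSPARSE library context         */
    cusparseMatDescr_t descr;    /* matrix type, index base, etc.    */
    cudaStream_t       stream;   /* stream the SpMV should run on    */
    int                m, n, nnz;
    int               *rowptr;   /* device pointer, CSR row offsets  */
    int               *colind;   /* device pointer, CSR column index */
    double            *val;      /* device pointer, CSR values       */
  } MatCUSPARSENative;

Anything CUSP-specific (preconditioners, format conversions) could then
be layered on top whenever CUSP is available.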
> It works across nodes, but you have to know what you're doing. This is
> a tough problem to solve universally because it's (almost) impossible
> to determine the number of MPI ranks per node in an MPI run. I've
> never seen an MPI function that returns this information.
>
> Right now, a 1:1 pairing between CPU cores and GPUs will work on any
> system with any number of nodes. I've tested this on a system with 2
> nodes and 4 GPUs per node (so "mpirun -n 8 -npernode 4" works).
Thanks, I see. Apparently I'm not the only one struggling with this
abstraction issue...
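For reference, the workaround I keep seeing is to derive a node-local
rank by comparing processor names and then bind one GPU per rank. A
rough sketch (the helper name and the modulo binding are purely
illustrative):

  #include <mpi.h>
  #include <stdlib.h>
  #include <string.h>
  #include <cuda_runtime.h>

  /* Count how many lower-numbered ranks share this rank's node. */
  static int local_rank_on_node(MPI_Comm comm)
  {
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all;
    int  len, rank, size, i, local = 0;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    MPI_Get_processor_name(name, &len);
    all = (char*)malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR, comm);
    for (i = 0; i < rank; i++)
      if (!strcmp(all + (size_t)i * MPI_MAX_PROCESSOR_NAME, name)) local++;
    free(all);
    return local;
  }

  /* e.g. with "mpirun -n 8 -npernode 4": each rank grabs its own GPU */
  /*   int ngpu;  cudaGetDeviceCount(&ngpu);                          */
  /*   cudaSetDevice(local_rank_on_node(MPI_COMM_WORLD) % ngpu);      */

Not pretty, but it avoids relying on anything beyond plain MPI.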
Best regards,
Karli