[petsc-dev] a few issues with current CUDA code for Mat.

Fri Aug 20 14:56:54 CDT 2010

On Aug 19, 2010, at 6:59 PM, Lisandro Dalcin wrote:

> I bet everyone involved is aware of most of these issues, but just in case.
> 
> * MatAssemblyEnd_SeqAIJCUDA: What about mode=MAT_FLUSH_ASSEMBLY?
> What's the point of copying to the GPU?

  Fixed.
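
  For readers of the archive, the idea is to skip the host-to-GPU copy on a flush assembly, since more values are still going to be set. Below is a minimal, self-contained sketch of that idea in C. It is not the committed change: FlushDemoMat, DemoAssemblyType, DEMO_FLUSH_ASSEMBLY, DEMO_FINAL_ASSEMBLY and flush_demo_assembly_end are hypothetical stand-ins, and plain host memory plays the role of the device copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not PETSc's MatAssemblyType or Mat. */
typedef enum { DEMO_FLUSH_ASSEMBLY, DEMO_FINAL_ASSEMBLY } DemoAssemblyType;

#define FLUSH_DEMO_NZ 4

typedef struct {
  double host_vals[FLUSH_DEMO_NZ]; /* values assembled on the CPU */
  double gpu_vals[FLUSH_DEMO_NZ];  /* stand-in for the device copy (host memory here) */
  int    ncopies;                  /* number of host-to-"GPU" copies performed */
} FlushDemoMat;

/* Called at the end of each assembly phase. */
void flush_demo_assembly_end(FlushDemoMat *A, DemoAssemblyType mode)
{
  assert(A != NULL);
  /* A flush assembly means more values will be set later, so the
     matrix is not final and copying it now would be wasted work. */
  if (mode == DEMO_FLUSH_ASSEMBLY) return;
  memcpy(A->gpu_vals, A->host_vals, sizeof(A->host_vals));
  A->ncopies++;
}
```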

> 
> * MatAssemblyEnd_SeqAIJCUDA: the 'tempvec' cusp array is always
> allocated, but it is not used for MatMult when compressed rows are
> not in use. Of course, this issue is very low priority.

  Removed the seemingly unneeded allocation.
> 
> * MatAssemblyEnd_SeqAIJCUDA: Perhaps memory allocation on the GPU is
> cheap, but if nonzeros do not change, we could avoid re-creating the
> GPU mat from scratch.

  I have noted this in a comment in the code. I want to keep things simple for now and think the performance hit is likely tiny.
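
  The deferred optimization could look roughly like the sketch below: rebuild the GPU-side structure only when the nonzero pattern has changed, and otherwise refresh the values in place. This is an illustration under stated assumptions, not code from the repository. ReuseDemoMat, reuse_demo_copy_to_gpu and the pattern_changed flag are hypothetical names, and ordinary host allocations stand in for device memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a sparse matrix with a mirrored "GPU" copy. */
typedef struct {
  int     nz;              /* number of stored nonzeros */
  int    *host_cols;       /* column indices on the CPU */
  double *host_vals;       /* values on the CPU */
  int    *gpu_cols;        /* stand-in for device indices, NULL until first built */
  double *gpu_vals;        /* stand-in for device values, NULL until first built */
  int     pattern_changed; /* set by the caller when the nonzero structure changes */
  int     nbuilds;         /* how many times the "GPU" structure was rebuilt */
} ReuseDemoMat;

void reuse_demo_copy_to_gpu(ReuseDemoMat *A)
{
  assert(A != NULL && A->nz > 0);
  if (A->gpu_vals == NULL || A->pattern_changed) {
    /* First copy, or the structure changed: rebuild from scratch. */
    free(A->gpu_cols);
    free(A->gpu_vals);
    A->gpu_cols = malloc((size_t)A->nz * sizeof(int));
    A->gpu_vals = malloc((size_t)A->nz * sizeof(double));
    assert(A->gpu_cols != NULL && A->gpu_vals != NULL);
    memcpy(A->gpu_cols, A->host_cols, (size_t)A->nz * sizeof(int));
    A->pattern_changed = 0;
    A->nbuilds++;
  }
  /* Values are refreshed on every call; the structure is reused when possible. */
  memcpy(A->gpu_vals, A->host_vals, (size_t)A->nz * sizeof(double));
}
```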

> 
> * There are some calls that operate on assembled matrices (MatScale,
> MatZeroRows, MatDiagonalScale, etc.). These operations need GPU
> syncing. Am I missing something?
> 
> * MatShift: seqaij does not implement MatShift, so MatSetValues is
> used in a loop and then the matrix is re-assembled. This will cause an
> extra copy to the GPU (take into account that user code already
> assembled the matrix before the MatShift call). Other calls will
> suffer from the same issue, e.g. MatDiagonalSet.
> 
> * MatGetArray: if the user updates values, we are in trouble.
> 
> All that being said, I'm still unsure why the GPU copying was
> implemented at MatAssemblyEnd_SeqAIJ. What about using a
> valid_GPU_data flag for Mat, setting it to false in MatAssembly_Begin,
> and doing the GPU copying at the time MatMult_SeqAIJ is called? Of
> course, such an approach would not solve all the previous issues... I'm
> just asking about the rationale for the current approach.
> 

   You could be right. We may have to change to this model. One reason we adopted the current model was its simplicity (of course, if it is wrong, then being simple is not much help). Also, we might have to set the valid_GPU_data flag in a bunch of routines that currently don't know about GPUs.

   I suspect that we will have to change this. Thanks for pointing out the flaws with the current code.
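
   To make the proposed model concrete, here is a minimal sketch in C of a lazy copy driven by a valid_GPU_data flag: every routine that may touch host values clears the flag, and the multiply copies to the GPU only if the flag is clear. Only the flag name comes from the discussion above. LazyDemoMat and the lazy_demo_* functions are hypothetical, a diagonal matrix keeps the sketch short, and host memory stands in for the device copy.

```c
#include <assert.h>
#include <string.h>

#define LAZY_DEMO_N 3

/* Hypothetical stand-in for a diagonal matrix; not the PETSc Mat. */
typedef struct {
  double host_vals[LAZY_DEMO_N]; /* diagonal entries on the CPU */
  double gpu_vals[LAZY_DEMO_N];  /* stand-in for the device copy (host memory here) */
  int    valid_GPU_data;         /* 1 if gpu_vals mirrors host_vals */
  int    ncopies;                /* number of host-to-"GPU" copies performed */
} LazyDemoMat;

/* Starting an assembly means host values may change: invalidate the copy. */
void lazy_demo_assembly_begin(LazyDemoMat *A)
{
  assert(A != NULL);
  A->valid_GPU_data = 0;
}

/* A host-side operation on an assembled matrix must also invalidate the copy. */
void lazy_demo_scale(LazyDemoMat *A, double alpha)
{
  assert(A != NULL);
  for (int i = 0; i < LAZY_DEMO_N; i++) A->host_vals[i] *= alpha;
  A->valid_GPU_data = 0;
}

/* y = A x, copying to the "GPU" only when the copy is stale. */
void lazy_demo_mult(LazyDemoMat *A, const double *x, double *y)
{
  assert(A != NULL && x != NULL && y != NULL);
  if (!A->valid_GPU_data) {
    memcpy(A->gpu_vals, A->host_vals, sizeof(A->host_vals));
    A->valid_GPU_data = 1;
    A->ncopies++;
  }
  for (int i = 0; i < LAZY_DEMO_N; i++) y[i] = A->gpu_vals[i] * x[i];
}
```

   With this arrangement, several host-side updates in a row (two scalings, or a shift followed by a re-assembly) cost a single copy at the next multiply. The price is the one already noted: each such routine has to remember to clear the flag.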

   Barry

> 
> -- 
> Lisandro Dalcin
> ---------------
> CIMEC (INTEC/CONICET-UNL)
> Predio CONICET-Santa Fe
> Colectora RN 168 Km 472, Paraje El Pozo
> Tel: +54-342-4511594 (ext 1011)
> Tel/Fax: +54-342-4511169