[petsc-dev] refactoring petsccusp.h needed

Paul Mullowney paulm at txcorp.com
Sat Mar 16 09:25:32 CDT 2013


Here's a trivial version that gives substantially better performance. 
I'm not sure if I have the conjugation right for the complex case.

ierr = VecCUSPGetArrayRead(xin,&xarray);CHKERRQ(ierr);
for (int i=0; i<nv; ++i) {
  ierr = VecCUSPGetArrayRead(yin[i],&yy);CHKERRQ(ierr);
#if defined(PETSC_USE_COMPLEX)
  /* dotc conjugates its first argument, so this computes y_i^H x,
     which should match PETSc's VecDot convention (x,y) = y^H x */
  z[i] = cusp::blas::dotc(*yy,*xarray);
#else
  z[i] = cusp::blas::dot(*yy,*xarray);
#endif
  ierr = VecCUSPRestoreArrayRead(yin[i],&yy);CHKERRQ(ierr);
}
ierr = VecCUSPRestoreArrayRead(xin,&xarray);CHKERRQ(ierr);
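For reference, a minimal host-side sketch of what the complex branch above is meant to compute: z[i] = y_i^H x, with each component of y_i conjugated, per PETSc's VecDot convention (x,y) = y^H x. The function name here is hypothetical and purely illustrative, not part of PETSc or CUSP.

```cpp
#include <complex>
#include <vector>

// Hypothetical reference implementation: z[i] = y_i^H x.
// Note that y (not x) is conjugated, matching PETSc's (x,y) = y^H x.
std::vector<std::complex<double>>
mdot_reference(const std::vector<std::complex<double>> &x,
               const std::vector<std::vector<std::complex<double>>> &y)
{
  std::vector<std::complex<double>> z(y.size(), 0.0);
  for (std::size_t i = 0; i < y.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      z[i] += std::conj(y[i][j]) * x[j];   // conjugate y, not x
  return z;
}
```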

-Paul
> Hi Jose,
>
>>> Since I just stumbled over VecMDot_SeqCUSP() when interfacing 
>>> ViennaCL: Do you know why the 'old' version was replaced by this 
>>> expensive call to gemv(), including the creation of temporaries, 
>>> etc.? Just writing a custom kernel with one work group per 
>>> dot-product should do the job perfectly, shouldn't it?
>>
>> My fault: 
>> https://bitbucket.org/petsc/petsc-hg/commits/ec7a7de2acd477e5edd24cc5a3af441ce7a68a36
>>
>> The motivation was that the previous version was even worse for me 
>> (VecMDot is used a lot in SLEPc and GPU performance was really bad). 
>> At that time I did not have the time to write a custom kernel. If you 
>> write one, I could help in testing and measuring performance.
>
> Thanks for providing the context. It makes sense to me now: for 
> eigenvalue computations you typically have many more vectors taking 
> part in mdot than in GMRES. This looks like an archetypal case for 
> two different kernels: the first suitable for 'small' numbers of 
> vectors (GMRES), the second more gemv-like and better for larger 
> vector counts (SLEPc). I'll let you know as soon as it's ready for 
> testing.
>
> Thanks and best regards,
> Karli
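The two-kernel dispatch described above could be sketched on the host as follows. The crossover value and all function names here are hypothetical illustrations, not anything from PETSc, SLEPc, or CUSP; a real implementation would tune the threshold empirically and run both paths on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical crossover point between the two strategies.
static const std::size_t MDOT_GEMV_THRESHOLD = 8;

// Small vector counts (GMRES-like): one dot product per vector.
static void mdot_loop(const std::vector<double> &x,
                      const std::vector<std::vector<double>> &y,
                      std::vector<double> &z)
{
  for (std::size_t i = 0; i < y.size(); ++i) {
    z[i] = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) z[i] += y[i][j] * x[j];
  }
}

// Large vector counts (SLEPc-like): view the y vectors as rows of a
// matrix Y and compute z = Y x in a single gemv-like sweep over x.
static void mdot_gemv(const std::vector<double> &x,
                      const std::vector<std::vector<double>> &y,
                      std::vector<double> &z)
{
  for (std::size_t i = 0; i < y.size(); ++i) z[i] = 0.0;
  for (std::size_t j = 0; j < x.size(); ++j)
    for (std::size_t i = 0; i < y.size(); ++i) z[i] += y[i][j] * x[j];
}

// Dispatch on the number of vectors, as discussed in the thread.
void mdot(const std::vector<double> &x,
          const std::vector<std::vector<double>> &y,
          std::vector<double> &z)
{
  if (y.size() < MDOT_GEMV_THRESHOLD) mdot_loop(x, y, z);
  else                                mdot_gemv(x, y, z);
}
```

Both paths produce identical results; only the memory-access pattern differs, which is what matters on the GPU.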



