[petsc-dev] refactoring petsccusp.h needed

Karl Rupp rupp at mcs.anl.gov
Sat Mar 16 09:40:56 CDT 2013


Hi Paul,

> Here's a trivial version that gives substantially better performance.
> I'm not sure if I have the conjugation right for the complex case.
>
> /* xarray, yy are CUSPARRAY* device views; z is the host-side result
>  * array of length nv */
> ierr = VecCUSPGetArrayRead(xin,&xarray);CHKERRQ(ierr);
> for (int i=0; i<nv; ++i) {
>      ierr = VecCUSPGetArrayRead(yin[i],&yy);CHKERRQ(ierr);
> #if defined(PETSC_USE_COMPLEX)
>      z[i] = cusp::blas::dotc(*yy,*xarray);  /* conjugates the first argument */
> #else
>      z[i] = cusp::blas::dot(*yy,*xarray);
> #endif
>      ierr = VecCUSPRestoreArrayRead(yin[i],&yy);CHKERRQ(ierr);
> }
> ierr = VecCUSPRestoreArrayRead(xin,&xarray);CHKERRQ(ierr);
>

Thanks, this should work well for nv = 1, 2, maybe 3. However, it won't 
help Jose much: for nv >> 1, each pass through the loop reloads xarray 
from global memory, so most of those loads are redundant. Thus, a 
two-kernel approach is needed to cover both extremes (nv \approx 1 and 
nv >> 1); a rough sketch of the fused variant follows.
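
To make that concrete, here is a rough, untested sketch of the fused 
kernel for small-to-moderate nv. Everything in it is a placeholder 
rather than existing PETSc/CUSP API: the kernel name, NT, MAX_NV, and 
the assumption that y is a device-side array of device pointers. Real 
scalars only; launch with NT threads per block and nv <= MAX_NV.

#define NT     128   /* threads per block; must equal blockDim.x,
                        power of two for the tree reduction */
#define MAX_NV 8     /* upper bound on nv handled by one launch */

__global__ void vmdot_fused(const double *x, const double * const *y,
                            double *partial, int n, int nv)
{
  __shared__ double s[MAX_NV][NT];
  double acc[MAX_NV] = {0.0};          /* remaining entries zero-init'd */

  /* grid-stride loop: each x[j] is loaded ONCE and reused for all
     nv dot products */
  for (int j = blockIdx.x*blockDim.x + threadIdx.x; j < n;
       j += gridDim.x*blockDim.x) {
    double xj = x[j];
    for (int i = 0; i < nv; ++i) acc[i] += xj*y[i][j];
  }

  for (int i = 0; i < nv; ++i) s[i][threadIdx.x] = acc[i];
  __syncthreads();

  /* per-vector tree reduction within the block */
  for (int w = blockDim.x/2; w > 0; w >>= 1) {
    if (threadIdx.x < w)
      for (int i = 0; i < nv; ++i)
        s[i][threadIdx.x] += s[i][threadIdx.x + w];
    __syncthreads();
  }

  if (threadIdx.x == 0)
    for (int i = 0; i < nv; ++i)
      partial[i*gridDim.x + blockIdx.x] = s[i][0];  /* per-block sums */
}

A second, tiny reduction (on the host or in a follow-up kernel) then 
sums the per-block partials into z[0..nv-1]. The complex case 
additionally needs the conjugation from dotc(), and for nv beyond 
MAX_NV one would loop over chunks of vectors; for nv >> 1 a gemv-like 
kernel remains the better fit.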

Best regards,
Karli




>> Hi Jose,
>>
>>>> Since I just stumbled over VecMDot_SeqCUSP() when interfacing
>>>> ViennaCL: do you know why the 'old' version was replaced by this
>>>> expensive call to gemv(), including the creation of temporaries,
>>>> etc.? Just writing a custom kernel with one work group per
>>>> dot-product should do the job perfectly, shouldn't it?
>>>
>>> My fault:
>>> https://bitbucket.org/petsc/petsc-hg/commits/ec7a7de2acd477e5edd24cc5a3af441ce7a68a36
>>>
>>>
>>> The motivation was that the previous version was even worse for me
>>> (VecMDot is used a lot in SLEPc and GPU performance was really bad).
>>> At that time I did not have the time to write a custom kernel. If you
>>> write one, I could help in testing and measuring performance.
>>
>> Thanks for providing the context. It makes sense to me now: in
>> eigenvalue computations, many more vectors take part in mdot than in
>> GMRES. This looks like an archetypal case for two different kernels:
>> the first suited to 'small' numbers of vectors (GMRES), the second
>> more gemv-like and good for larger vector counts (SLEPc). I'll let
>> you know as soon as it's ready for testing.
>>
>> Thanks and best regards,
>> Karli
>
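
PS: For completeness, the 'one work group per dot-product' kernel 
mentioned in the quoted discussion could look roughly like this. Again 
an untested sketch with placeholder names: real scalars only, y is 
assumed to be a device-side array of device pointers, and blockDim.x a 
power of two.

__global__ void dot_per_block(const double *x, const double * const *y,
                              double *z, int n)
{
  extern __shared__ double sdata[];
  const double *yi = y[blockIdx.x];      /* block i computes z[i] */
  double sum = 0.0;

  for (int j = threadIdx.x; j < n; j += blockDim.x)
    sum += x[j]*yi[j];                   /* x is reread by every block */

  sdata[threadIdx.x] = sum;
  __syncthreads();
  for (int w = blockDim.x/2; w > 0; w >>= 1) {  /* tree reduction */
    if (threadIdx.x < w) sdata[threadIdx.x] += sdata[threadIdx.x + w];
    __syncthreads();
  }
  if (threadIdx.x == 0) z[blockIdx.x] = sdata[0];
}

/* hypothetical launch, one block per vector, 128 threads each:
   dot_per_block<<<nv, 128, 128*sizeof(double)>>>(d_x, d_y, d_z, n); */

Since every block rereads x from global memory, this variant only pays 
off for small nv; beyond that, the fused or gemv-like kernel should 
take over.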



