[petsc-dev] Performance of VecMDot_SeqCUSP

Jose E. Roman jroman at dsic.upv.es
Tue Apr 24 11:55:16 CDT 2012


VecMDot_SeqCUSP seems to perform rather poorly. This has a large impact on SLEPc, where it is the main kernel used in the orthogonalization of vectors.

Is this due to the version of Thrust? I am using CUDA Toolkit 4.0.

I tried a naive replacement that copies the contents of the vectors into a matrix and calls CUBLAS dgemv. The improvement is significant, despite the data movement overhead. In some tests the time (VecReduceArith) drops from 24.5 seconds to 9.6 seconds (with up to 200 vectors of length 10000) on a Fermi GPU.
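
To make the idea concrete, here is a minimal CPU-side sketch in plain C++ (not the actual code, and not PETSc's implementation). It assumes real scalars and shows that k separate dot products against x give the same result as packing the k vectors as columns of a column-major n x k matrix A and computing A^T x, which is the operation a transposed dgemv performs. The names mdot_loop, pack_columns and gemv_transposed are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Reference version: one dot product per vector, m[j] = ys[j] . x
// (real scalars only; the complex case would need a conjugate).
std::vector<double> mdot_loop(const std::vector<double>& x,
                              const std::vector<std::vector<double>>& ys) {
    std::vector<double> m(ys.size(), 0.0);
    for (std::size_t j = 0; j < ys.size(); ++j)
        for (std::size_t i = 0; i < x.size(); ++i)
            m[j] += ys[j][i] * x[i];
    return m;
}

// Copy k vectors of length n into a column-major n x k matrix
// (column j holds ys[j]). This is the "data movement overhead".
std::vector<double> pack_columns(const std::vector<std::vector<double>>& ys,
                                 std::size_t n) {
    std::vector<double> A(n * ys.size());
    for (std::size_t j = 0; j < ys.size(); ++j)
        for (std::size_t i = 0; i < n; ++i)
            A[j * n + i] = ys[j][i];
    return A;
}

// m = A^T x for a column-major n x k matrix A. On the GPU this loop
// would presumably be replaced by a single transposed dgemv call.
std::vector<double> gemv_transposed(const std::vector<double>& A,
                                    std::size_t n, std::size_t k,
                                    const std::vector<double>& x) {
    std::vector<double> m(k, 0.0);
    for (std::size_t j = 0; j < k; ++j)
        for (std::size_t i = 0; i < n; ++i)
            m[j] += A[j * n + i] * x[i];
    return m;
}
```

Calling mdot_loop(x, ys) and gemv_transposed(pack_columns(ys, n), n, ys.size(), x) on the same inputs should return the same k values. The point of the packed form is that all k reductions are done in one library call instead of k separate ones.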

I can send the code for you to try.

Jose



