[petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not working with GPU and default configurations

Karl Rupp rupp at iue.tuwien.ac.at
Mon Aug 10 08:13:20 CDT 2015


Hi,

>> The use of aijcusp instead of a dense matrix type certainly adds to
>> the issue.
> I know, but I couldn't find a dense GPU type in the PETSc manual; please correct me if there is one.

There is indeed no dense GPU matrix type in PETSc (yet).
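For illustration, a minimal sketch of how to leave the matrix type to
the command line, so you can switch between the sparse GPU type and the
CPU dense type without recompiling (the 100x100 size is just a
placeholder, and -mat_type aijcusp requires a CUSP-enabled PETSc build):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 100, 100);CHKERRQ(ierr);
  /* Pick the type at runtime: e.g. -mat_type aijcusp (sparse, on the GPU)
     or -mat_type dense (CPU only, since no dense GPU type exists) */
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  /* ... fill with MatSetValues(), then assemble and use ... */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}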


>> Please send the output of -log_summary so that we can see where most
>> time is spent.
> I can't, because somehow I get no output when I use that option. I also tried calling PetscLogView() explicitly, but still nothing is printed.
> If I try with one of the SLEPc examples, I get the output.
> Why is this happening? If I run my code with -info or -log_trace I see their output; only -log_summary is shy!

Maybe you forgot to call SlepcFinalize()?
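For reference, a bare-bones sketch (solver setup elided) of where that
output is produced: the -log_summary table is written during
SlepcFinalize(), so if that call is missing or never reached, you get
the -info and -log_trace output but no summary.

#include <slepcsvd.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* ... create the matrix and the SVD solver, solve, clean up ... */
  ierr = SlepcFinalize();  /* -log_summary is printed from here */
  return ierr;
}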


>> If you have good (recent) CPUs in a dual-socket configuration, it's
>> very unlikely that you will gain anything beyond ~2x with an
>> optimized GPU setup. Even that ~2x may only be possible with heavy
>> tweaking of the current SVD implementation in SLEPc, of which I
>> don't know the details.
> I used Xeon processors from 2010, just like the GPUs.

OK, this is actually a relatively GPU-friendly setup: CPUs have closed
much of the FLOPs gap since then, so a 2010-era GPU compares more
favorably against a 2010-era CPU than current GPUs do against current
CPUs (see for example
http://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/)

> This is not good news, as my supervisor is really optimistic about using GPUs and getting high speed-ups!
> Anyway, at the moment my GPU version is several times slower than the CPU version, so even a 2x would be a win now :D

I'd suggest convincing your supervisor to buy or use a cluster with
current hardware; you will enjoy a higher speedup than anything you
could get, even in an ideal setting, with a GPU from 2010 anyway ;-)

(Having said that, I cautiously estimate that you can get some
performance gains for the SVD if you dive deep into the existing SVD
implementation, carefully redesign it to minimize CPU<->GPU
communication, and use optimized library routines for the BLAS 3
operations. Currently there is not enough GPU infrastructure in PETSc
to achieve this via command-line parameters alone.)
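To make the communication point concrete, here is a minimal sketch of
the pattern I mean, using plain cuBLAS rather than PETSc/SLEPc (the
matrix size and contents are arbitrary): copy the data to the GPU once,
keep the BLAS 3 work (here a DGEMM) entirely on the device, and copy
the result back once.

/* compile e.g. with: nvcc gemm_sketch.c -lcublas */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  const int n = 512;
  const double alpha = 1.0, beta = 0.0;
  size_t bytes = (size_t)n * n * sizeof(double);
  double *hA = malloc(bytes), *hB = malloc(bytes), *hC = malloc(bytes);
  double *dA, *dB, *dC;
  cublasHandle_t handle;
  int i;

  for (i = 0; i < n * n; ++i) { hA[i] = 1.0; hB[i] = 2.0; }

  cudaMalloc((void**)&dA, bytes);
  cudaMalloc((void**)&dB, bytes);
  cudaMalloc((void**)&dC, bytes);

  /* One transfer in ... */
  cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

  cublasCreate(&handle);
  /* BLAS 3 entirely on the device: C = alpha*A*B + beta*C */
  cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
              &alpha, dA, n, dB, n, &beta, dC, n);

  /* ... one transfer out */
  cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
  printf("C[0] = %g (expected %g)\n", hC[0], 2.0 * n);

  cublasDestroy(handle);
  cudaFree(dA); cudaFree(dB); cudaFree(dC);
  free(hA); free(hB); free(hC);
  return 0;
}

The same principle applies inside an SVD solver: the more of the dense
BLAS 3 work that stays on the device between transfers, the better.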

Best regards,
Karli



