[petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not working with GPU and default configurations
Karl Rupp
rupp at iue.tuwien.ac.at
Mon Aug 10 08:13:20 CDT 2015
Hi,
>> The use of aijcusp instead of a dense matrix type certainly adds to
>> the issue.
> I know, but I couldn't find a dense GPU type in the PETSc manual; please correct me if there is one.
There is indeed no dense GPU matrix type in PETSc (yet).
>> Please send the output of -log_summary so that we can see where most
>> time is spent.
> I am unable to do that: somehow there is no output when I use that option. I also tried calling PetscLogView() explicitly, but still nothing is printed.
> If I try with one of the slepc examples, I get the output.
> Why is this happening? If I run my code with -info or -log_trace I see their output, only -log_summary is shy!
Maybe you forgot to call SlepcFinalize()?
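For reference, the -log_summary output is produced during finalization, so a program that exits before reaching SlepcFinalize() will print nothing. A minimal skeleton illustrating this (a sketch only, assuming a working SLEPc installation; the actual solver calls are elided):

```c
/* Minimal SLEPc program skeleton. The -log_summary report is emitted
 * during SlepcFinalize(), so omitting that call (or exiting early)
 * suppresses it entirely. Sketch only; requires a SLEPc installation. */
#include <slepcsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* ... set up and run the solver here ... */

  ierr = SlepcFinalize();  /* <-- -log_summary output is printed here */
  return ierr;
}
```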
>> If you have good (recent) CPUs in a dual-socket configuration, it's rather
>> unlikely that you will gain anything beyond ~2x with an optimized GPU setup.
>> Even that ~2x may only be possible by heavily tweaking the current SVD
>> implementation in SLEPc, of which I don't know the details.
> I used Xeon processors from 2010, just like the GPUs.
Ok, this is actually a relatively GPU-friendly setup, because CPUs have
since reduced the gap in terms of FLOPs quite a bit (see for example
http://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/
)
> This is not good news, as my supervisor is really optimistic about using GPUs and getting high speed-ups!
> Anyway, at the moment my gpu version is several times slower than the cpu version, so even a 2x would be a win now :D
I'd suggest convincing your supervisor to buy/use a cluster with
current hardware: that will give a higher speedup than what you could
get in an ideal setting with a GPU from 2010 anyway ;-)
(Having said that, I cautiously estimate that you can get some
performance gains for SVD if you dive deep into the existing SVD
implementation, carefully redesign it to minimize CPU<->GPU
communication, and use optimized library routines for the BLAS
level-3 operations. Currently there is not enough GPU infrastructure
in PETSc to achieve this via command-line parameters only.)
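To illustrate the kind of device-resident BLAS-3 call such a redesign would lean on, here is a minimal cuBLAS sketch (an assumption-laden illustration, not part of PETSc/SLEPc; it assumes CUDA and cuBLAS are installed, and error checking is elided). The point is that the operands stay on the GPU between calls, so only the initial upload and final download cross the PCIe bus:

```c
/* Sketch: keep matrices resident on the GPU and run an optimized
 * BLAS-3 kernel (GEMM) there, limiting host<->device traffic to one
 * upload and one download. Assumes CUDA + cuBLAS; error checks elided. */
#include <cuda_runtime.h>
#include <cublas_v2.h>

void gemm_on_gpu(int n, const double *hA, const double *hB, double *hC)
{
  double *dA, *dB, *dC;
  const double alpha = 1.0, beta = 0.0;
  size_t bytes = (size_t)n * (size_t)n * sizeof(double);

  cudaMalloc((void **)&dA, bytes);
  cudaMalloc((void **)&dB, bytes);
  cudaMalloc((void **)&dC, bytes);

  cublasHandle_t handle;
  cublasCreate(&handle);

  /* one upload ... */
  cublasSetMatrix(n, n, sizeof(double), hA, n, dA, n);
  cublasSetMatrix(n, n, sizeof(double), hB, n, dB, n);

  /* ... arbitrarily many device-resident BLAS-3 calls: C = A*B ... */
  cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
              &alpha, dA, n, dB, n, &beta, dC, n);

  /* ... and one download at the end. */
  cublasGetMatrix(n, n, sizeof(double), dC, n, hC, n);

  cublasDestroy(handle);
  cudaFree(dA); cudaFree(dB); cudaFree(dC);
}
```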
Best regards,
Karli