[petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not working with GPU and default configurations

Leoni, Massimiliano Massimiliano.Leoni at Rolls-Royce.com
Tue Aug 11 05:17:22 CDT 2015


Jose,

I suspect I made myself unclear earlier: when I said the GPU version was slower than the CPU version, I meant a single GPU versus a single CPU running multithreaded [i.e. 12 threads].

The single GPU version is, at the moment, performing slightly better than the serial [1 CPU with one thread] version.
For example, I ran my code reading a 40000x400 matrix that I created by sampling a function [a sum of sines with different periods].
The average execution time on a single CPU is 13.6s, while on a single GPU it is 8.4s; these times are similar to what I get running the hands-on exercise on SVD out of the box [consistent with the fact that this portion of my code follows the outline of that example].
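
For reference, here is a minimal sketch of this kind of test, following the outline of the hands-on SVD exercise (SLEPc 3.6-era calls are assumed, a single sine term stands in for the actual sum of sines, and the aijcusp/cusp type names in the comment are what I believe the CUSP back-end uses):

/* Sketch: assemble a tall matrix by sampling a function, then compute its SVD.
   Run with -mat_type aijcusp -vec_type cusp (assumed CUSP type names) to move
   the data to the GPU; without those options everything stays on the CPU. */
#include <slepcsvd.h>

int main(int argc, char **argv)
{
  Mat            A;
  SVD            svd;
  PetscInt       i, j, Istart, Iend, m = 40000, n = 400, nconv;
  PetscReal      sigma;
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, m, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, n, NULL);CHKERRQ(ierr);        /* rows are dense */
  ierr = MatMPIAIJSetPreallocation(A, n, NULL, n, NULL);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    for (j = 0; j < n; j++) {
      /* placeholder sample: one sine term instead of the real sum of sines */
      PetscScalar v = PetscSinReal(2.0*PETSC_PI*(j+1)*((PetscReal)i/m));
      ierr = MatSetValue(A, i, j, v, INSERT_VALUES);CHKERRQ(ierr);
    }
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* SVD solve, same outline as the hands-on exercise */
  ierr = SVDCreate(PETSC_COMM_WORLD, &svd);CHKERRQ(ierr);
  ierr = SVDSetOperator(svd, A);CHKERRQ(ierr);
  ierr = SVDSetFromOptions(svd);CHKERRQ(ierr);
  ierr = SVDSolve(svd);CHKERRQ(ierr);
  ierr = SVDGetConverged(svd, &nconv);CHKERRQ(ierr);
  for (i = 0; i < nconv; i++) {
    ierr = SVDGetSingularTriplet(svd, i, &sigma, NULL, NULL);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "sigma_%D = %g\n", i, (double)sigma);CHKERRQ(ierr);
  }
  ierr = SVDDestroy(&svd);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = SlepcFinalize();
  return ierr;
}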

I am running on what I think is an optimised build; here are my configure options:
PETSC_ARCH=linux-gpu-optimised
--with-clanguage=c++
--COPTFLAGS=-O3
--CXXOPTFLAGS=-O3
--CUDAOPTFLAGS=-O3
--FOPTFLAGS=-O3
--with-debugging=no
--with-log=1
--with-blas-lapack-dir=/opt/intel/mkl/
--with-mpi-dir=/path/to/openmpi-1.8.6-gcc
--with-openmp=1
--with-hdf5-dir=/path/to/hdf5-1.8.15-patch1/
--with-cuda=1
--with-cuda-dir=/path/to/cuda-7.0
--CUDAC=/path/to/nvcc
--with-cusp=1
--with-cusp-dir=/path/to/cusplibrary
--with-cgns-dir=/path/to/CGNS/
--with-cmake-dir=/path/to/cmake-3.2.3-Linux-x86_64/

Addressing the other point you raised: I am not scared of low-level programming, but I have quite a tight deadline to present results.

Best,

Massimiliano


> -----Original Message-----
> From: Jose E. Roman [mailto:jroman at dsic.upv.es]
> Sent: 10 August 2015 17:59
> To: Leoni, Massimiliano
> Cc: slepc-maint at upv.es; petsc-dev at mcs.anl.gov
> Subject: Re: [petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not working
> with GPU and default configurations
> 
> Massimiliano,
> 
> You should not be getting slower times on the GPU. I tried with hardware
> similar to what you mention, running SVD on a dense square matrix stored as
> aij, and also with sparse rectangular matrices. In all cases, executions on the
> GPU were roughly 2x faster than on the CPU. Are you running with an
> optimized build? There might be something wrong with your code. I would
> need to know the exact options that you are using. Maybe you can share
> your code with us, or even the matrix.
> 
> For the case of a dense matrix, one could create a customized shell matrix
> that stores data on the GPU and uses cuBLAS for the matrix-vector product.
> We have recently done this on a different problem and results were quite
> good. However, it is much more low-level programming compared to just
> setting AIJCUSP type for the matrix.
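
A minimal sketch of the shell-matrix idea described above, assuming a sequential dense matrix already resident on the GPU and a real, double-precision build (the ShellCtx/ShellMult names are illustrative, device allocation and cublasCreate() are omitted, and the host<->device vector copies are left explicit; keeping the vectors on the GPU would avoid that traffic):

/* MATSHELL whose matrix-vector product is a cuBLAS dgemv on device data. */
#include <petscmat.h>
#include <cublas_v2.h>
#include <cuda_runtime.h>

typedef struct {
  PetscInt       m, n;       /* matrix dimensions */
  double        *d_A;        /* m x n matrix, column-major, on the device */
  double        *d_x, *d_y;  /* device work vectors of length n and m */
  cublasHandle_t handle;
} ShellCtx;

static PetscErrorCode ShellMult(Mat A, Vec x, Vec y)
{
  ShellCtx          *ctx;
  const PetscScalar *xa;
  PetscScalar       *ya;
  const double       one = 1.0, zero = 0.0;
  PetscErrorCode     ierr;

  PetscFunctionBegin;
  ierr = MatShellGetContext(A, (void**)&ctx);CHKERRQ(ierr);
  ierr = VecGetArrayRead(x, &xa);CHKERRQ(ierr);
  ierr = VecGetArray(y, &ya);CHKERRQ(ierr);
  /* explicit copies for clarity; GPU-resident vectors would remove this traffic */
  cudaMemcpy(ctx->d_x, xa, ctx->n*sizeof(double), cudaMemcpyHostToDevice);
  cublasDgemv(ctx->handle, CUBLAS_OP_N, ctx->m, ctx->n,
              &one, ctx->d_A, ctx->m, ctx->d_x, 1, &zero, ctx->d_y, 1);
  cudaMemcpy(ya, ctx->d_y, ctx->m*sizeof(double), cudaMemcpyDeviceToHost);
  ierr = VecRestoreArrayRead(x, &xa);CHKERRQ(ierr);
  ierr = VecRestoreArray(y, &ya);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* The shell operator is then passed to SVDSetOperator() like any other Mat.
   SLEPc's SVD solvers also need the transpose product, so an analogous
   function using CUBLAS_OP_T would be registered as MATOP_MULT_TRANSPOSE. */
static PetscErrorCode CreateGPUShell(ShellCtx *ctx, Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateShell(PETSC_COMM_SELF, ctx->m, ctx->n, ctx->m, ctx->n, ctx, A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))ShellMult);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}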
> 
> Jose
> 
> 
> 
> > On 10/8/2015, at 15:55, Leoni, Massimiliano <Massimiliano.Leoni at rolls-royce.com> wrote:
> >
> > > -----Original Message-----
> > > From: Karl Rupp [mailto:rupp at iue.tuwien.ac.at]
> > > Sent: 10 August 2015 14:13
> > > To: Leoni, Massimiliano
> > > Cc: slepc-maint at upv.es; petsc-dev at mcs.anl.gov
> > > Subject: Re: [petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not
> > > working with GPU and default configurations
> >
> > > Maybe you forgot to call SlepcFinalize()?
> > Unfortunately that's not it; if I omit SlepcFinalize(), an error message shows up
> > at runtime to remind me.
> >
> > > Ok, this is actually a relatively GPU-friendly setup, because CPUs
> > > have reduced the gap in terms of FLOPs quite a bit (see for example
> > > http://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/ )
> > Read it, thanks for sharing!
> > > I'd suggest convincing your supervisor to buy/use a cluster
> > > with current hardware and enjoy a higher speedup than what
> > > you could get in an ideal setting with a GPU from 2010 anyway ;-)
> > This could partly be overcome, as I was told I *might*, eventually, have
> > access to a big cluster with many NVIDIA Tesla K20s.
> >
> > > (Having said that, my careful estimate is that you can get some
> > > performance gains for SVD if you deep-dive into the existing SVD
> > > implementation, carefully redesign it to minimize CPU<->GPU
> > > communication, and use optimized BLAS 3 library routines. Currently
> > > there is not enough GPU infrastructure in PETSc to achieve this via
> > > command-line parameters only.)
> > Mmm, can you give a rough estimate of the effort involved in this?
> >
> >
> > > From: Matthew Knepley [mailto:knepley at gmail.com]
> > > Sent: 10 August 2015 14:28
> > > To: Leoni, Massimiliano
> > > Cc: Karl Rupp; slepc-maint at upv.es; petsc-dev at mcs.anl.gov
> > > Subject: Re: [petsc-dev] [GPU - slepc] Hands-on exercise 4 (SVD) not
> > > working with GPU and default configurations
> >
> > > Try calling PetscLogBegin() after PetscInitialize(). We have now put in an
> > > error if this is not initialized correctly.
> > This didn't do the trick, unfortunately :( Do I have to pull from the
> > repo and rebuild?
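
For reference, the call order suggested above would look like this at the top of a SLEPc program; a minimal sketch, assuming the PetscLogBegin() name from this PETSc release:

#include <slepcsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;

  ierr = SlepcInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscLogBegin();CHKERRQ(ierr);  /* start logging right after initialization */
  /* ... matrix setup and SVDSolve() as usual ... */
  ierr = SlepcFinalize();
  return ierr;
}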
> >
> > [In general, can I pull and rebuild without running configure again?]
> >
> > > I agree with Karl that not much speedup can be expected with GPUs.
> > > This is the fault of dishonest marketing. None of the computations in
> > > PETSc are limited by the computation rate; rather, they are limited by
> > > memory bandwidth. The bandwidth is at best 2-3x better, and less for
> > > modern CPUs. The dense SVD can be better than this, but you are eventually
> > > limited by offload times and memory latency. The story of 100x, or even 10x,
> > > speedups is just a fraud.
> > I remember reading this in one of the PETSc reports [the "Preliminary
> > evaluation" one?].
> > I'll see what I can do.
> >
> > Best regards,
> > Massimiliano
> >


