[petsc-dev] how to use GPU, is my speed up sufficient?

Thu Jul 12 22:19:08 CDT 2012

On Jul 12, 2012, at 10:42 AM, Olga Tramontano wrote:

> Hi all
> I am very new to PETSc. I have just learned its class structure and understand that the Vec interface has been reimplemented by the class VecCUSP, which supports GPUs. My question is: if I have a simple code that allocates a vector and sets it from the database options, and I specify -vec_type seqcusp, is this vector then allocated AND computed on the GPU?
> I ask because I compared the execution time of the same algorithm with the database options -vec_type seq and -vec_type seqcusp, and the two times are very close.
> 
> This is the code. I'm just trying to measure how long it takes to execute the function VecScale:
> #include <petscvec.h>
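> #include <string.h>
> 
> /* help string passed to PetscInitialize() below */
> static char help[] = "Times a single VecScale() call.\n\n";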
> 
> #undef __FUNCT__
> #define __FUNCT__ "main"
> int main(int argc,char **argv)
> {
>   Vec            x;
>   PetscInt       n = 500000;  /* default length, used if -size is not given */
>   PetscErrorCode ierr;
>   PetscBool exists;
>   PetscLogDouble  t1,t2;
>   PetscInt a=10;
> 
>   ierr = PetscInitialize(&argc,&argv,(char*)0,help);CHKERRQ(ierr); 
> 
>   ierr = PetscOptionsGetInt(PETSC_NULL,"-size",&n,&exists); CHKERRQ(ierr);
> 

   The following code creates the vector

>   ierr = VecCreate(PETSC_COMM_SELF,&x);CHKERRQ(ierr);
>   ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
>   ierr = VecSetFromOptions(x);CHKERRQ(ierr);
> 

   The next piece of code fills the vector with numbers ON THE CPU!
>   ierr = VecSetRandom(x, PETSC_NULL); CHKERRQ(ierr);
> 
>   ierr = PetscGetTime(&t1);CHKERRQ(ierr);

    When you run this with a GPU vector, VecScale() first copies the vector down to the GPU and only then does the scaling. This is likely to be slow because copying to the GPU takes a lot of time.

>   ierr = VecScale(x,a); CHKERRQ(ierr);
>   ierr = PetscGetTime(&t2);CHKERRQ(ierr);
>   ierr = PetscPrintf(PETSC_COMM_WORLD, "%2.5f\n",(t2-t1));CHKERRQ(ierr);

   The above code is not useful for measuring GPU performance since you are timing something completely artificial: the copy down to the GPU plus the scale on the GPU.

   Try adding an extra VecScale() call before the timing. That first call triggers the copy down to the GPU, so the second (timed) call to VecScale() will not copy anything down and will be much faster.

   Please see our paper:

@article{minden2010preliminary,
          title={Preliminary implementation of PETSc using GPUs},
          author={Minden, V. and Smith, B.F. and Knepley, M.G.},
          journal={Proceedings of the 2010 International Workshop of GPU Solutions to Multiscale Problems in Science and Engineering},
          year={2010}
        }

   It discusses the importance of having many operations take place on the GPU without copying the data back and forth between the CPU and the GPU all the time (which eliminates the advantage of the GPU).
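
   The effect of the warm-up call can be illustrated with a toy model in plain C. This is only a sketch and not PETSc code: ToyVec, toy_scale() and transfers_in_timed_region() are made-up names, and the "device" is ordinary host memory. It assumes, as described above, that the device copy is refreshed lazily the first time an operation needs it.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model (NOT PETSc code): a vector with a host copy and a pretend
   "device" copy that is refreshed lazily, the first time a device
   operation needs it. All names here are hypothetical. */
typedef struct {
    size_t  n;
    double *host;          /* data as filled on the CPU */
    double *device;        /* stand-in for GPU memory */
    int     device_valid;  /* 1 once the device copy is up to date */
    int     copies;        /* host-to-device transfers performed so far */
} ToyVec;

static int toy_create(ToyVec *v, size_t n)
{
    v->n = n;
    v->host = malloc(n * sizeof *v->host);
    v->device = malloc(n * sizeof *v->device);
    v->device_valid = 0;
    v->copies = 0;
    if (!v->host || !v->device) {
        free(v->host);
        free(v->device);
        return -1;
    }
    for (size_t i = 0; i < n; i++) v->host[i] = 1.0;  /* filled on the "CPU" */
    return 0;
}

static void toy_destroy(ToyVec *v)
{
    free(v->host);
    free(v->device);
}

/* Scale on the "device"; pays for a transfer only if the device copy is stale. */
static void toy_scale(ToyVec *v, double a)
{
    if (!v->device_valid) {
        memcpy(v->device, v->host, v->n * sizeof *v->device);
        v->device_valid = 1;
        v->copies++;
    }
    for (size_t i = 0; i < v->n; i++) v->device[i] *= a;
}

/* Returns how many transfers fall inside the region one would time,
   or -1 if allocation fails. */
int transfers_in_timed_region(int warm_up)
{
    ToyVec v;
    int before, in_region;

    if (toy_create(&v, 1000) != 0) return -1;
    if (warm_up) toy_scale(&v, 10.0);  /* untimed call absorbs the copy */
    before = v.copies;
    toy_scale(&v, 10.0);               /* the call one would actually time */
    in_region = v.copies - before;
    toy_destroy(&v);
    return in_region;
}
```

   In this model transfers_in_timed_region(0) counts one transfer inside the timed region and transfers_in_timed_region(1) counts none, which is why the second VecScale() is the one worth timing.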

   Barry

> 
>   ierr = VecDestroy(&x);CHKERRQ(ierr);
> 
>   ierr = PetscFinalize();
>   return 0;
> }
> 
> I use a vector of about 500000 numbers. With the database option -vec_type seqcusp, the method VecScale should call the method VecScale_SeqCUSP, right? Am I wrong? Should I call it explicitly?
> Anyway, with the option -vec_type seq, the method VecScale takes 0.05681 seconds;
> with the option -vec_type seqcusp, the method VecScale takes 0.04242 seconds.
> 
> Is this speed up sufficient and realistic?
> thanks
> 
> Olga