[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

Sun Oct 2 13:56:36 CDT 2011

Matthew Knepley writes:
 > On Sat, Oct 1, 2011 at 11:26 PM, Dave Nystrom <Dave.Nystrom at tachyonlogic.com> wrote:
 > > Barry Smith writes:
 > >  > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote:
 > >  > > Hi Barry,
 > >  > >
 > >  > > I've sent a couple more emails on this topic.  What I am trying to do at the
 > >  > > moment is to figure out how to have a problem run on only one gpu if it will
 > >  > > fit in the memory of that gpu.  Back in April when I had built petsc-dev with
 > >  > > Cuda 3.2, petsc would only use one gpu if you had multiple gpus on your
 > >  > > machine.  In order to use multiple gpus for a problem, one had to use
 > >  > > multiple threads with a separate thread assigned to control each gpu.  But
 > >  > > Cuda 4.0 has, I believe, made that transparent and under the hood.  So now
 > >  > > when I run a small example problem such as
 > >  > > src/ksp/ksp/examples/tutorials/ex2f.F with an 800x800 problem, it gets
 > >  > > partitioned to run on both of the gpus in my machine.  The result is a very
 > >  > > large performance hit because of communication back and forth from one gpu to
 > >  > > the other via the cpu.
 > >  >
 > >  > How do you know there is lots of communication from the GPU to the CPU? In
 > >  > the -log_summary? Nope, because PETSc does not manage anything like that
 > >  > (that is, one CPU process using both GPUs).
 > >
 > > What I believe is that it is being managed by Cuda 4.0, not by petsc.
 > >
 > >  > > So this problem with a 3200x3200 grid runs 5x slower
 > >  > > now than it did with Cuda 3.2.  I believe if one is programming down at the
 > >  > > cuda level, it is possible to have a smaller problem run on only one gpu so
 > >  > > that there is communication only between the cpu and gpu and only at the
 > >  > > start and end of the calculation.
 > >  > >
 > >  > > To me, it seems like what is needed is a petsc option to specify the number
 > >  > > of gpus to run on that can somehow get passed down to the cuda level through
 > >  > > cusp and thrust.  I fear that the short term solution is going to have to be
 > >  > > for me to pull one of the gpus out of my desktop system but it would be nice
 > >  > > if there was a way to tell petsc and friends to just use one gpu when I want
 > >  > > it to.
 > >  > >
 > >  > > If necessary, I can send a couple of log files to demonstrate what I am
 > >  > > trying to describe regarding the performance hit.
 > >  >
 > >  > I am not convinced that the poor performance you are getting now has
 > >  > anything to do with using both GPUs. Please run a PETSc program with the
 > >  > command -cuda_show_devices
 > >
 > > I ran the following command:
 > >
 > > ex2f -m 8 -n 8 -ksp_type cg -pc_type jacobi -log_summary -cuda_show_devices
 > > -mat_type aijcusp -vec_type cusp -options_left
 > >
 > > The result was a report that there was one option left, that being
 > > -cuda_show_devices.  I am using a copy of petsc-dev that I cloned and built
 > > this morning.
 > 
 > What do you have at src/sys/object/pinit.c:825? You should see the code
 > that processes this option. You should be able to break there in the
 > debugger and see what happens. This sounds again like you are not
 > processing options correctly.

Hi Matt,

I'll take a look at that in a bit and see if I can figure out what is going
on.  I do see the code that you mention that processes the arguments that
Barry mentioned.  In terms of processing options correctly, at least in this
case I am actually running one of the petsc examples rather than my own
code.  And it seems to correctly process the other command line arguments.
Anyway, I'll write more after I have had a chance to investigate more.

Thanks,

Dave

 > Matt
 > 
 > >  > What are the choices?  You can then pick one of them and run with
 > >  > -cuda_set_device integer
 > >
 > > The -cuda_set_device option does not appear to be recognized either, even
 > > if I choose an integer like 0.
 > >
 > >  > Does this change things?
 > >
 > > I suspect it would change things if I could get it to work.
 > >
 > > Thanks,
 > >
 > > Dave
 > >
 > >  > Barry
 > >  >
 > >  > >
 > >  > > Thanks,
 > >  > >
 > >  > > Dave
 > >  > >
 > >  > > Barry Smith writes:
 > >  > >> Dave,
 > >  > >>
 > >  > >> We have no mechanism in the PETSc code for a PETSc single CPU process to
 > >  > >> use two GPUs at the same time. However you could have two MPI processes
 > >  > >> each using their own GPU.
 > >  > >>
 > >  > >> The one tricky part is you need to make sure each MPI process uses a
 > >  > >> different GPU. We currently do not have a mechanism to do this assignment
 > >  > >> automatically. I think it can be done with cudaSetDevice(). But I don't
 > >  > >> know the details, so I am sending this to petsc-dev at mcs.anl.gov where
 > >  > >> more people may know.
 > >  > >>
 > >  > >> PETSc-folks,
 > >  > >>
 > >  > >> We need a way to have this set up automatically.
 > >  > >>
 > >  > >> Barry
 > >  > >>
 > >  > >> On Oct 1, 2011, at 5:43 PM, Dave Nystrom wrote:
 > >  > >>
 > >  > >>> I'm running petsc on a machine with Cuda 4.0 and 2 gpus.  This is a desktop
 > >  > >>> machine with a single processor.  I know that Cuda 4.0 has support for
 > >  > >>> running on multiple gpus but don't know if petsc uses that.  But suppose I
 > >  > >>> have a problem that will fit in the memory for a single gpu.  Will petsc run
 > >  > >>> the problem on a single gpu or does it split it between the 2 gpus and incur
 > >  > >>> the communication overhead of copying data between the two gpus?
 > >  > >>>
 > >  > >>> Thanks,
 > >  > >>>
 > >  > >>> Dave
 > 
 > -- 
 > What most experimenters take for granted before they begin their experiments
 > is infinitely more interesting than any results to which their experiments
 > lead.
 > -- Norbert Wiener
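
Barry's suggestion in the quoted thread is to run one MPI process per GPU and
make sure each process selects a different device, possibly via
cudaSetDevice(). A minimal sketch of that assignment might look like the
following. The helper pick_device() and the notion of a per-node "local rank"
are assumptions made for illustration; they are not part of PETSc or CUDA, and
how a local rank is obtained depends on the MPI implementation. Only
cudaGetDeviceCount() and cudaSetDevice() are actual CUDA runtime calls, and
they appear in a comment so the block compiles without the CUDA toolkit.

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc or CUDA function): map a process's rank
 * on its node to a GPU index by round-robin.
 * Returns -1 if the rank is negative or no devices are available. */
int pick_device(int local_rank, int device_count)
{
    if (local_rank < 0 || device_count <= 0) {
        return -1;
    }
    int device = local_rank % device_count;
    /* Internal sanity check: the result is always a valid device index. */
    assert(device >= 0 && device < device_count);
    return device;
}

/* In an MPI + CUDA program the result would be passed to the CUDA runtime
 * before any other CUDA work in that process, roughly:
 *
 *     int count = 0;
 *     cudaGetDeviceCount(&count);
 *     cudaSetDevice(pick_device(local_rank, count));
 *
 * local_rank is assumed to be the rank of this process among the
 * processes running on the same machine.
 */
```

With two GPUs, local ranks 0 and 1 would get devices 0 and 1, and a single
process would always get device 0. Round-robin is only one possible policy;
it wraps around when there are more processes than GPUs, which means several
processes then share a device.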