[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

Satish Balay petsc-maint at mcs.anl.gov
Sat Oct 1 22:48:35 CDT 2011


our testbox has 2 gpus

balay at bb30:~>lspci |grep -i nvidia
0b:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
0c:00.0 3D controller: nVidia Corporation GT200 [Tesla C1060] (rev a1)
balay at bb30:~>


Is there some test I can run on this? [it has cuda 4.0]

satish
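
A minimal standalone check of what the CUDA runtime sees on such a box could look like the sketch below (my illustration, not one of the PETSc tests; assumes the CUDA 4.0 runtime and headers are installed and it is compiled with nvcc):

```c
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int i, n = 0;
  cudaGetDeviceCount(&n);                 /* how many GPUs the runtime sees */
  printf("%d CUDA device(s)\n", n);
  for (i = 0; i < n; i++) {
    struct cudaDeviceProp p;
    cudaGetDeviceProperties(&p, i);
    printf("device %d: %s\n", i, p.name); /* e.g. "Tesla C1060" */
  }
  return 0;
}
```

On the bb30 box above this should report both Tesla C1060s.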

On Sat, 1 Oct 2011, Matthew Knepley wrote:

> This diagnosis is total crap (I think), as I tried to explain. We would
> never get the same result (or the right result), and partitioning makes no
> sense. Something else is going on. Can't we run on a 2 GPU system at ANL?
> 
>    Matt
> 
> On Sat, Oct 1, 2011 at 9:30 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
> >
> > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote:
> >
> > > Hi Barry,
> > >
> > > I've sent a couple more emails on this topic.  What I am trying to do
> > > at the moment is to figure out how to have a problem run on only one
> > > gpu if it will fit in the memory of that gpu.  Back in April when I had
> > > built petsc-dev with Cuda 3.2, petsc would only use one gpu if you had
> > > multiple gpus on your machine.  In order to use multiple gpus for a
> > > problem, one had to use multiple threads with a separate thread
> > > assigned to control each gpu.  But Cuda 4.0 has, I believe, made that
> > > transparent and under the hood.  So now when I run a small example
> > > problem such as src/ksp/ksp/examples/tutorials/ex2f.F with an 800x800
> > > problem, it gets partitioned to run on both of the gpus in my machine.
> > > The result is a very large performance hit because of communication
> > > back and forth from one gpu to the other via the cpu.
> >
> >     How do you know there is lots of communication from the GPU to the
> > CPU? In the -log_summary? Nope, because PETSc does not manage anything
> > like that (that is one CPU process using both GPUs).
> >
> >
> > > So this problem with a 3200x3200 grid runs 5x slower now than it did
> > > with Cuda 3.2.  I believe if one is programming down at the cuda
> > > level, it is possible to have a smaller problem run on only one gpu so
> > > that there is communication only between the cpu and gpu and only at
> > > the start and end of the calculation.
> > >
> > > To me, it seems like what is needed is a petsc option to specify the
> > > number of gpus to run on that can somehow get passed down to the cuda
> > > level through cusp and thrust.  I fear that the short term solution is
> > > going to have to be for me to pull one of the gpus out of my desktop
> > > system, but it would be nice if there was a way to tell petsc and
> > > friends to just use one gpu when I want it to.
> > >
> > > If necessary, I can send a couple of log files to demonstrate what I am
> > > trying to describe regarding the performance hit.
> >
> >    I am not convinced that the poor performance you are getting now has
> > anything to do with using both GPUs. Please run a PETSc program with the
> > option -cuda_show_devices
> >
> >    What are the choices?  You can then pick one of them and run with
> > -cuda_set_device <integer>
> >
> >    Does this change things?
> >
> >    Barry
> >
> > >
> > > Thanks,
> > >
> > > Dave
> > >
> > > Barry Smith writes:
> > >> Dave,
> > >>
> > >> We have no mechanism in the PETSc code for a single PETSc CPU process
> > >> to use two GPUs at the same time. However, you could have two MPI
> > >> processes, each using its own GPU.
> > >>
> > >> The one tricky part is that you need to make sure each MPI process
> > >> uses a different GPU. We currently do not have a mechanism to do this
> > >> assignment automatically. I think it can be done with cudaSetDevice(),
> > >> but I don't know the details; sending this to petsc-dev at mcs.anl.gov,
> > >> where more people may know.
> > >>
> > >> PETSc-folks,
> > >>
> > >> We need a way to have this setup automatically.
> > >>
> > >> Barry
> > >>
> > >> On Oct 1, 2011, at 5:43 PM, Dave Nystrom wrote:
> > >>
> > >>> I'm running petsc on a machine with Cuda 4.0 and 2 gpus.  This is a
> > >>> desktop machine with a single processor.  I know that Cuda 4.0 has
> > >>> support for running on multiple gpus but don't know if petsc uses
> > >>> that.  But suppose I have a problem that will fit in the memory of a
> > >>> single gpu.  Will petsc run the problem on a single gpu, or does it
> > >>> split it between the 2 gpus and incur the communication overhead of
> > >>> copying data between the two gpus?
> > >>>
> > >>> Thanks,
> > >>>
> > >>> Dave
> > >>>
> > >>
> >
> >




More information about the petsc-dev mailing list