[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

Matthew Knepley petsc-maint at mcs.anl.gov
Sun Oct 2 10:22:40 CDT 2011


On Sat, Oct 1, 2011 at 11:26 PM, Dave Nystrom <Dave.Nystrom at tachyonlogic.com> wrote:

> Barry Smith writes:
>  >
>  > On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote:
>  >
>  > > Hi Barry,
>  > >
>  > > I've sent a couple more emails on this topic.  What I am trying to do
>  > > at the moment is to figure out how to have a problem run on only one
>  > > gpu if it will fit in the memory of that gpu.  Back in April when I
>  > > had built petsc-dev with Cuda 3.2, petsc would only use one gpu if you
>  > > had multiple gpus on your machine.  In order to use multiple gpus for
>  > > a problem, one had to use multiple threads with a separate thread
>  > > assigned to control each gpu.  But Cuda 4.0 has, I believe, made that
>  > > transparent and under the hood.  So now when I run a small example
>  > > problem such as src/ksp/ksp/examples/tutorials/ex2f.F with an 800x800
>  > > problem, it gets partitioned to run on both of the gpus in my machine.
>  > > The result is a very large performance hit because of communication
>  > > back and forth from one gpu to the other via the cpu.
>  >
>  > How do you know there is lots of communication from the GPU to the CPU?
>  > In the -log_summary? Nope, because PETSc does not manage anything like
>  > that (that is, one CPU process using both GPUs).
>
> What I believe is that it is being managed by Cuda 4.0, not by petsc.
>
>  > > So this problem with a 3200x3200 grid runs 5x slower now than it did
>  > > with Cuda 3.2.  I believe that if one is programming down at the cuda
>  > > level, it is possible to have a smaller problem run on only one gpu,
>  > > so that there is communication only between the cpu and gpu, and only
>  > > at the start and end of the calculation.
>  > >
>  > > To me, it seems like what is needed is a petsc option to specify the
>  > > number of gpus to run on that can somehow get passed down to the cuda
>  > > level through cusp and thrust.  I fear the short-term solution is
>  > > going to have to be for me to pull one of the gpus out of my desktop
>  > > system, but it would be nice if there were a way to tell petsc and
>  > > friends to just use one gpu when I want it to.
>  > >
>  > > If necessary, I can send a couple of log files to demonstrate what I
>  > > am trying to describe regarding the performance hit.
>  >
>  > I am not convinced that the poor performance you are getting now has
>  > anything to do with using both GPUs. Please run a PETSc program with the
>  > option -cuda_show_devices.
>
> I ran the following command:
>
> ex2f -m 8 -n 8 -ksp_type cg -pc_type jacobi -log_summary -cuda_show_devices
> -mat_type aijcusp -vec_type cusp -options_left
>
> The result was a report that there was one option left, that being
> -cuda_show_devices.  I am using a copy of petsc-dev that I cloned and built
> this morning.


What do you have at src/sys/objects/pinit.c:825? You should see the code that
processes this option, and you should be able to break there in the debugger
and see what happens. This sounds, again, like you are not processing options
correctly.
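
For reference, the option handling there looks roughly like the sketch below.
This is a reconstruction from memory of petsc-dev at the time, not a verbatim
copy, so the exact names and line number may differ:

    /* Inside PetscInitialize(): list the CUDA devices when asked, and
       honor an explicit device choice.  Sketch only; error checking on
       the cuda calls is elided. */
    PetscBool flg;
    PetscInt  device;
    int       devCount,d;

    ierr = PetscOptionsHasName(PETSC_NULL,"-cuda_show_devices",&flg);CHKERRQ(ierr);
    if (flg) {
      struct cudaDeviceProp prop;
      cudaGetDeviceCount(&devCount);
      for (d = 0; d < devCount; ++d) {
        cudaGetDeviceProperties(&prop,d);
        ierr = PetscPrintf(PETSC_COMM_WORLD,"CUDA device %d: %s\n",d,prop.name);CHKERRQ(ierr);
      }
    }
    ierr = PetscOptionsGetInt(PETSC_NULL,"-cuda_set_device",&device,&flg);CHKERRQ(ierr);
    if (flg) cudaSetDevice((int)device);

If -options_left still reports -cuda_show_devices as unused, this block is
probably never being reached, which would point at the build rather than the
options database.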

   Matt


>  > What are the choices?  You can then pick one of them and run with
>  > -cuda_set_device <integer>.
>
> The -cuda_set_device option does not appear to be recognized either, even
> if I choose an integer like 0.
>
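
Once the option is recognized, the invocation would presumably look something
like the line below; this is a sketch reusing the flags from the run above,
not a verified command:

    ex2f -m 8 -n 8 -ksp_type cg -pc_type jacobi -mat_type aijcusp \
         -vec_type cusp -cuda_set_device 0 -log_summary
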
>  > Does this change things?
>
> I suspect it would change things if I could get it to work.
>
> Thanks,
>
> Dave
>
>  > Barry
>  >
>  > >
>  > > Thanks,
>  > >
>  > > Dave
>  > >
>  > > Barry Smith writes:
>  > >> Dave,
>  > >>
>  > >> We have no mechanism in the PETSc code for a single PETSc CPU process
>  > >> to use two GPUs at the same time.  However, you could have two MPI
>  > >> processes, each using its own GPU.
>  > >>
>  > >> The one tricky part is that you need to make sure each MPI process
>  > >> uses a different GPU.  We currently do not have a mechanism to do
>  > >> this assignment automatically.  I think it can be done with
>  > >> cudaSetDevice(), but I don't know the details, so I am sending this
>  > >> to petsc-dev at mcs.anl.gov, where more people may know.
>  > >>
>  > >> PETSc-folks,
>  > >>
>  > >> We need a way to have this setup automatically.
>  > >>
>  > >> Barry
>  > >>
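
Until that is automated, a minimal sketch of the per-rank device binding
Barry describes might look like the following. The round-robin choice by
rank assumes all ranks share one node; a real solution would compute a
per-node local rank first:

    /* Illustrative only: bind each MPI rank to its own GPU before any
       device work happens.  Rank i drives GPU (i mod ndev). */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc,char **argv)
    {
      int rank,ndev;

      MPI_Init(&argc,&argv);
      MPI_Comm_rank(MPI_COMM_WORLD,&rank);
      cudaGetDeviceCount(&ndev);
      cudaSetDevice(rank % ndev);
      /* ... PetscInitialize() and the usual -vec_type cusp /
         -mat_type aijcusp objects would follow here ... */
      MPI_Finalize();
      return 0;
    }
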
>  > >> On Oct 1, 2011, at 5:43 PM, Dave Nystrom wrote:
>  > >>
>  > >>> I'm running petsc on a machine with Cuda 4.0 and 2 gpus.  This is a
>  > >>> desktop machine with a single processor.  I know that Cuda 4.0 has
>  > >>> support for running on multiple gpus but don't know if petsc uses
>  > >>> that.  But suppose I have a problem that will fit in the memory of a
>  > >>> single gpu.  Will petsc run the problem on a single gpu, or does it
>  > >>> split it between the 2 gpus and incur the communication overhead of
>  > >>> copying data between the two gpus?
>  > >>>
>  > >>> Thanks,
>  > >>>
>  > >>> Dave
>  > >>>
>  > >>
>  >
>
>


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener