[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs

Barry Smith bsmith at mcs.anl.gov
Sat Oct 1 21:30:32 CDT 2011


On Oct 1, 2011, at 9:22 PM, Dave Nystrom wrote:

> Hi Barry,
> 
> I've sent a couple more emails on this topic.  What I am trying to do at the
> moment is to figure out how to have a problem run on only one gpu if it will
> fit in the memory of that gpu.  Back in April when I had built petsc-dev with
> Cuda 3.2, petsc would only use one gpu if you had multiple gpus on your
> machine.  In order to use multiple gpus for a problem, one had to use
> multiple threads with a separate thread assigned to control each gpu.  But
> Cuda 4.0 has, I believe, made that transparent, handling it under the hood.  So now
> when I run a small example problem such as
> src/ksp/ksp/examples/tutorials/ex2f.F with an 800x800 problem, it gets
> partitioned to run on both of the gpus in my machine.  The result is a very
> large performance hit because of communication back and forth from one gpu to
> the other via the cpu.  

    How do you know there is a lot of communication from the GPU to the CPU? From the -log_summary? Nope, because PETSc does not manage anything like that (that would be one CPU process using both GPUs).

> So this problem with a 3200x3200 grid runs 5x slower
> now than it did with Cuda 3.2.  I believe if one is programming down at the
> cuda level, it is possible to have a smaller problem run on only one gpu so
> that there is communication only between the cpu and gpu and only at the
> start and end of the calculation.
> 
> To me, it seems like what is needed is a petsc option to specify the number
> of gpus to run on that can somehow get passed down to the cuda level through
> cusp and thrust.  I fear that the short term solution is going to have to be
> for me to pull one of the gpus out of my desktop system but it would be nice
> if there was a way to tell petsc and friends to just use one gpu when I want
> it to.
> 
> If necessary, I can send a couple of log files to demonstrate what I am
> trying to describe regarding the performance hit.

   I am not convinced that the poor performance you are getting now has anything to do with using both GPUs. Please run
a PETSc program with the option -cuda_show_devices.

    What are the choices?  You can then pick one of them and run with -cuda_set_device <integer>.

    Does this change things?

    Barry

> 
> Thanks,
> 
> Dave
> 
> Barry Smith writes:
>> Dave,
>> 
>> We have no mechanism in the PETSc code for a single PETSc CPU process to
>> use two GPUs at the same time. However, you could have two MPI processes,
>> each using its own GPU.
>> 
>> The one tricky part is that you need to make sure each MPI process uses a
>> different GPU. We currently do not have a mechanism to do this assignment
>> automatically. I think it can be done with cudaSetDevice(). But I don't
>> know the details, sending this to petsc-dev at mcs.anl.gov where more people
>> may know.
>> 
>> PETSc-folks,
>> 
>> We need a way to have this setup automatically.
>> 
>> Barry
>> 
>> On Oct 1, 2011, at 5:43 PM, Dave Nystrom wrote:
>> 
>>> I'm running petsc on a machine with Cuda 4.0 and 2 gpus.  This is a desktop
>>> machine with a single processor.  I know that Cuda 4.0 has support for
>>> running on multiple gpus but don't know if petsc uses that.  But suppose I
>>> have a problem that will fit in the memory for a single gpu.  Will petsc run
>>> the problem on a single gpu or does it split it between the 2 gpus and incur
>>> the communication overhead of copying data between the two gpus?
>>> 
>>> Thanks,
>>> 
>>> Dave
>>> 
>> 

More information about the petsc-dev mailing list