[petsc-dev] [petsc-maint #88993] Petsc with Cuda 4.0 and Multiple GPUs
Matthew Knepley
petsc-maint at mcs.anl.gov
Mon Oct 3 09:33:19 CDT 2011
On Sun, Oct 2, 2011 at 4:43 PM, Dave Nystrom
<Dave.Nystrom at tachyonlogic.com> wrote:
> Dave Nystrom writes:
> > In case it might be useful, I have attached two log files of runs with
> > the ex2f petsc example from src/ksp/ksp/examples/tutorials. One was run
> > back in April with petsc-dev linked to Cuda 3.2. It shows excellent
> > runtime performance. The other was run today with petsc-dev checked out
> > of the mercurial repo yesterday morning and linked to Cuda 4.0. In
> > addition to the differences in run time performance, I also do not see
> > an entry for MatCUSPCopyTo in the profiling section. I'm not sure what
> > the significance of that is. I do observe that the run time for PCApply
> > is about the same for the two cases. I think I would expect that to be
> > the case even if the problem were partitioned across two gpus. However,
> > it does make me wonder if the absence of MatCUSPCopyTo in the profiling
> > section of the Cuda 4.0 log file is an indication that the matrix was
> > not actually copied to the gpu. I'm not sure yet how to check for that.
> > Hope this might be useful.
>
> I have been able to get the option "-cuda_show_devices" to work if I use
> the C version of the ex2 example rather than the Fortran version. So it
> would seem that there are some issues associated with command line option
> processing for the petsc case. To be more explicit, I am running the
> following C petsc example:
>
> src/ksp/ksp/examples/tutorials/ex2.c
>
> However, when I ran this example with the "-cuda_set_device 0" option, I
> did not see any change in the run time performance. The option was
> recognized and parsed by the C example.
>
> I'm not sure how to proceed. It would seem that one of two scenarios may
> be at play here.
>
> 1. The problem is being partitioned across the two gpus under the hood by
> Cuda 4.0 regardless of whether the problem would fit on one gpu. And this
> has the result that the matvec requires communication each iteration
> between the two gpus.
>
Dave, this is definitely not happening. There is no evidence for this.
Instead, the matrix is not using the GPU at all. There must be a
MatCUSPCopyToGPU event in the -log_summary output in order to be using
the GPU.
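
One way to check a saved log for that event is sketched below. This is only a
minimal sketch: the helper name has_event is hypothetical, and it assumes the
event name appears as the first whitespace-separated token on its line in the
-log_summary event table.

```python
def has_event(log_text, event="MatCUSPCopyToGPU"):
    """Return True if some line of log_text begins with the event name.

    Assumption: in the -log_summary output, each profiled event is
    listed on its own line with the event name as the first token.
    """
    for line in log_text.splitlines():
        fields = line.split()
        # Compare the first token exactly so that a longer name sharing
        # the same prefix does not count as a match.
        if fields and fields[0] == event:
            return True
    return False
```

To use it, read each saved log into a string (for example with
open(path).read()) and call has_event on the April log and on the Cuda 4.0
log. If the second call returns False, the matrix copy to the GPU was never
logged for that run.
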
> 2. For some reason, the matrix may not be copied to the gpu at all meaning
> that the matvec requires communication with the gpu on each iteration.
>
> Any thoughts on what might be happening? I certainly got excellent
> performance back in April.
>
Look at your April log. It has that event. Something else is happening in
this code. I can confirm that my run of ex2f executes MatCUSPCopyToGPU.

You can look at the MatMult call in the debugger and see what it is
dispatching to.

Matt
> Thanks,
>
> Dave
>
--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener