[petsc-users] MatMult

Matthew Knepley knepley at gmail.com
Mon Dec 13 01:37:00 CST 2010


Yes, it should run on the GPU. Check an example, like ex19.

   Matt
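
If it helps, here is a minimal sketch of a driver along those lines (illustrative
only, not ex19 itself, and assuming a current PETSc C API; the exact GPU type
names given on the command line have changed between releases). The Mat and Vec
types are taken from the options database, so -mat_type seqaijcuda and
-vec_type cuda select the GPU implementations without any source changes:

#include <petscmat.h>

int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,y;
  PetscInt       i,n = 100;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,0,0);CHKERRQ(ierr);

  /* Matrix type is taken from -mat_type (e.g. seqaijcuda); assembly is unchanged */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i=0; i<n; i++) {ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);}
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Vector type is taken from -vec_type (e.g. cuda) */
  ierr = VecCreate(PETSC_COMM_WORLD,&x);CHKERRQ(ierr);
  ierr = VecSetSizes(x,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);
  ierr = VecDuplicate(x,&y);CHKERRQ(ierr);
  ierr = VecSet(x,1.0);CHKERRQ(ierr);

  /* The first use on the GPU triggers the copy down; that cost is logged as part of MatMult */
  ierr = MatMult(A,x,y);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

Run it with, for example (the executable name is just a placeholder):
./driver -mat_type seqaijcuda -vec_type cuda -log_summary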

On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com> wrote:

> Hi,
>
> Is the MatMult function performed on the GPU? I prepared a program that
> just executes this function with the parameters -vec_type cuda and
> -mat_type seqaijcuda, but I have not seen any VecCUDACopyTo entry in the
> summary log.
>
>
> On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
> > To answer this you need to understand that PETSc copies vectors and
> > matrices to GPU memory "on demand" (that is, exactly when they are first
> > needed on the GPU, and not before). Once it has copied an object to the
> > GPU, it keeps track of that and will NOT copy it down again if it is
> > already there.
> >
> >    Hence in your run below, yes it includes the copy time down.
> >
> >    But note that ONE multiply on the GPU is absurd; it does not make
> > sense to copy a matrix down to the GPU and then do ONE multiply with it.
> > Thus I NEVER do "standalone" benchmarking where a single kernel is called
> > by itself once; the time results are useless. Always run a FULL
> > application with -log_summary, for example in this case a full KSPSolve()
> > that requires a bunch of iterations. Then you can look at the performance
> > of each kernel. The reason to do it this way is that the numbers can be
> > very different, and what matters is runs in APPLICATIONS, so that is what
> > should be measured.
> >
> >    If, say, you run KSP with 20 iterations, then the time to copy the
> > matrix down to the GPU is amortized over those 20 iterations and thus may
> > be acceptable. You should see the flop rate for the MatMult() go up in
> > this case.
> >
> >    You may have noticed we have a log entry for VecCopyToGPU(); we will
> > be adding one for matrices as well, so you will be able to see how long
> > the copy takes. Note, however, that the copy time is still counted in the
> > MatMult() time if the first copy of the matrix to the GPU is triggered by
> > the MatMult. You can subtract the copy time from the mult time to get the
> > per-multiply time; this would correspond to the multiply time in the
> > limit of a single copy down and many, many multiplies on the GPU.
> >
> >    Barry
> >
> >
> >
> >
> > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
> >
> > > Hello again,
> > >
> > > I compiled one of the examples. I used a sparse matrix called
> > > 02-raefsky3, with -vec_type cuda and -mat_type seqaijcuda.
> > >
> > > In the summary of the operations performed by the program there is:
> > >
> > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2100 0  0  0   2100  0  0  0   147
> > >
> > > Does the MatMult time include the memory transfer for loading the
> > > matrix into GPU memory, or just the computation time?
> > >
> > > Thanks in advance.
> > > Kuba.
> > >
> >
>
>
>
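
As an illustration of the kind of run Barry describes above, here is a sketch
of a small standalone program that performs a full KSPSolve() and is profiled
with -log_summary (renamed -log_view in later PETSc releases). It assumes a
current PETSc C API; the tridiagonal test matrix and the executable name are
illustrative, not from this thread:

#include <petscksp.h>

int main(int argc,char **argv)
{
  Mat            A;
  Vec            x,b;
  KSP            ksp;
  PetscInt       i,n = 1000;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,0,0);CHKERRQ(ierr);

  /* Assemble a 1-D Laplacian; the matrix type comes from -mat_type */
  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i=0; i<n; i++) {
    if (i>0)   {ierr = MatSetValue(A,i,i-1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    if (i<n-1) {ierr = MatSetValue(A,i,i+1,-1.0,INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A,i,i,2.0,INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Vector type comes from -vec_type */
  ierr = VecCreate(PETSC_COMM_WORLD,&b);CHKERRQ(ierr);
  ierr = VecSetSizes(b,PETSC_DECIDE,n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(b);CHKERRQ(ierr);
  ierr = VecDuplicate(b,&x);CHKERRQ(ierr);
  ierr = VecSet(b,1.0);CHKERRQ(ierr);

  /* A full solve does many MatMults, so the one-time copy of A to the GPU
     is amortized and the per-multiply flop rate in the log becomes meaningful */
  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

A possible invocation (again, the executable name is a placeholder):
./solve -mat_type seqaijcuda -vec_type cuda -ksp_type cg -pc_type jacobi -log_summary
The MatMult line in the resulting log then reflects many multiplies against a
single copy down.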


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener