[petsc-users] MatMult

Jakub Pola jakub.pola at gmail.com
Mon Dec 13 02:20:57 CST 2010


Could you please check the file attached to this email? It contains the
source code and the log summaries from the MatMult runs.

When I run ex131 with the options -vec_type cuda and -mat_type
seqaijcuda,

mpiexec -n 1 ./ex131 -f ../matbinary.ex  -vec 0 -mat_type seqaijcuda
-vec_type cuda -log_summary

it fails because of CUDA Error 4; see MatMultKO.log.


When I run the same program without the -vec_type cuda option, only
with -mat_type seqaijcuda, it runs fine:
mpiexec -n 1 ./ex131 -f ../matbinary.ex  -vec 0 -mat_type seqaijcuda
-log_summary

See MatMltOK.log.

When I run without -mat_type seqaijcuda, only with -vec_type cuda, it
fails again with:

terminate called after throwing an instance of
'thrust::system::system_error'
  what():  invalid argument
terminate called after throwing an instance of
'thrust::system::system_error'
  what():  invalid argument
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 3755 on node desktop exited
on signal 6 (Aborted).
--------------------------------------------------------------------------


Could you please give me some comments on this?
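
For reference, the kind of full run Barry recommends below would look
roughly like the sketch that follows: load the matrix, then do a
complete KSPSolve() under -log_summary so that the one-time copy of the
matrix to the GPU is amortized over many MatMult() calls. This is only
a sketch; the calls shown (PetscOptionsGetString(), MatLoad(),
MatCreateVecs(), KSPSetOperators()) use current PETSc signatures, which
may differ from the petsc-dev build used here, and error checking is
omitted.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         x, b;
  KSP         ksp;
  PetscViewer fd;
  char        file[PETSC_MAX_PATH_LEN];
  PetscBool   flg;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Matrix file passed with -f, as in the ex131 runs above */
  PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg);

  /* Load the binary matrix; -mat_type seqaijcuda on the command line
     is picked up by MatSetFromOptions() before MatLoad() */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &fd);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetFromOptions(A);
  MatLoad(A, fd);
  PetscViewerDestroy(&fd);

  /* Work vectors with layouts compatible with A */
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* Full solve: the first MatMult() inside KSPSolve() triggers the
     copy of A to the GPU; all later iterations reuse it */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x);
  VecDestroy(&b);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

Run with something like

mpiexec -n 1 ./solve -f ../matbinary.ex -mat_type seqaijcuda
-ksp_max_it 20 -log_summary

(./solve is just a placeholder name for the compiled sketch) and
compare the MatMult flop rate with the one from the single-multiply
run.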

On Mon, 2010-12-13 at 07:37 +0000, Matthew Knepley wrote:
> Yes, it should run on the GPU. Check an example, like ex19.
> 
> 
>    Matt
> 
> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com>
> wrote:
>         Hi,
>         
>         Is the MatMult function performed on the GPU? When I
>         prepared a program which just executes this function with
>         the options -vec_type cuda and -mat_type seqaijcuda, I
>         haven't seen any VecCUDACopyTo entry in the summary log.
>         
>         
>         On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
>         
>         
>         > To answer this you need to understand that PETSc copies
>         vectors and matrices to GPU memory "on demand" (that is,
>         exactly when they are first needed on the GPU, and not
>         before), and once it has copied an object to the GPU it
>         keeps track of that and will NOT copy it down again if it
>         is already there.
>         >
>         >    Hence in your run below, yes, it includes the time to
>         copy the matrix down.
>         >
>         >    But note that ONE multiply on the GPU is absurd: it
>         does not make sense to copy a matrix down to the GPU and
>         then do ONE multiply with it. Thus I NEVER do "standalone"
>         benchmarking where a single kernel is called by itself
>         once; the time results are useless. Always run a FULL
>         application with -log_summary; for example, in this case a
>         full KSPSolve() that requires a bunch of iterations. Then
>         you can look at the performance of each kernel. The reason
>         to do it this way is that the numbers can be very
>         different, and what matters is how things run in
>         APPLICATIONS, so that is what should be measured.
>         >
>         >    If, say, you run KSP with 20 iterations, then the time
>         to copy the matrix down to the GPU is amortized over those
>         20 iterations and thus may be acceptable. You should see
>         the flop rate for the MatMult() go up in this case.
>         >
>         >    You may have noticed we have a log entry for
>         VecCopyToGPU(); we will be adding one for matrices as well,
>         so you will be able to see how long the copy takes. Note,
>         though, that the copy time is still counted in the
>         MatMult() time if the first copy of the matrix to the GPU
>         is triggered by the MatMult. You can subtract the copy time
>         from the mult time to get the per-multiply time; this would
>         correspond to the multiply time in the limit of a single
>         copy down and many, many multiplies on the GPU.
>         >
>         >    Barry
>         >
>         >
>         >
>         >
>         > On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
>         >
>         > > Hello again,
>         > >
>         > > I compiled one of the examples. I used a sparse matrix
>         called 02-raefsky3.
>         > > I used -vec_type cuda and -mat_type seqaijcuda.
>         > >
>         > > When I look at the summary of the operations performed
>         by the program, there is
>         > >
>         > > MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2100  0  0  0   2100  0  0  0   147
>         > >
>         > > Does the time for MatMult include the memory transfer
>         for loading the matrix into GPU memory, or just the exact
>         computation time?
>         > >
>         > > Thanks in advance.
>         > > Kuba.
>         > >
>         >
>         
>         
>         
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
> 
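
A second way to see Barry's point that the first MatMult() carries the
copy to the GPU is to put a warm-up multiply and the repeated
multiplies into separate logging stages, so -log_summary reports them
on separate lines. Again only a sketch with current PETSc signatures,
no error checking, and made-up stage names:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, y;
  PetscViewer   fd;
  char          file[PETSC_MAX_PATH_LEN];
  PetscBool     flg;
  PetscLogStage warmup, timed;
  PetscInt      i;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Assume the matrix file is given with -f */
  PetscOptionsGetString(NULL, NULL, "-f", file, sizeof(file), &flg);

  PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &fd);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetFromOptions(A);
  MatLoad(A, fd);
  PetscViewerDestroy(&fd);

  MatCreateVecs(A, &x, &y);
  VecSet(x, 1.0);

  PetscLogStageRegister("WarmupMult", &warmup);
  PetscLogStageRegister("TimedMult", &timed);

  /* First multiply: includes the copy of A (and x) to the GPU */
  PetscLogStagePush(warmup);
  MatMult(A, x, y);
  PetscLogStagePop();

  /* Later multiplies: data is already resident on the GPU */
  PetscLogStagePush(timed);
  for (i = 0; i < 100; i++) MatMult(A, x, y);
  PetscLogStagePop();

  VecDestroy(&x);
  VecDestroy(&y);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

With the data already resident, the MatMult line in the "TimedMult"
stage of -log_summary approximates the per-multiply time described
above (total mult time minus the one-time copy).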

-------------- next part --------------
A non-text attachment was scrubbed...
Name: tests.zip
Type: application/zip
Size: 4031 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20101213/3d1963ed/attachment.zip>

