[petsc-users] MatMult
Barry Smith
bsmith at mcs.anl.gov
Mon Dec 13 21:01:37 CST 2010
Runs ok for me.
Barry
On Dec 13, 2010, at 2:20 AM, Jakub Pola wrote:
> Could you please check the file attached to this email? It contains the
> source code and the log summary from the execution of MatMult.
>
> When I run ex131 with the parameters -vec_type cuda and -mat_type
> seqaijcuda:
>
> mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda
> -vec_type cuda -log_summary
>
> it fails with CUDA Error 4; see MatMultKO.log.
>
>
> When I run the same program without the -vec_type cuda parameter, only
> with -mat_type seqaijcuda, it runs OK:
>
> mpiexec -n 1 ./ex131 -f ../matbinary.ex -vec 0 -mat_type seqaijcuda
> -log_summary
>
> See MatMltOK.log.
>
> When I run without -mat_type seqaijcuda, only with -vec_type cuda, it
> fails again with:
>
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): invalid argument
> terminate called after throwing an instance of
> 'thrust::system::system_error'
> what(): invalid argument
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 3755 on node desktop exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
>
>
> Could you please give me some comments on this?
>
> On Mon, 2010-12-13 at 07:37 +0000, Matthew Knepley wrote:
>> Yes, it should run on the GPU. Check an example, like ex19.
>>
>>
>> Matt
>>
>> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com>
>> wrote:
>> Hi,
>>
>> Is the MatMult function performed on the GPU? When I prepared a
>> program that just executes this function with the parameters
>> -vec_type cuda and -mat_type seqaijcuda, I didn't see any
>> VecCUDACopyTo entry in the summary log.
>>
>>
>> On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
>>
>>
>>> To answer this you need to understand that PETSc copies vectors and
>>> matrices to GPU memory "on demand" (that is, exactly when they are
>>> first needed on the GPU, and not before), and once it has copied an
>>> object to the GPU it keeps track of it and will NOT copy it down
>>> again if it is already there.
>>>
>>> Hence in your run below, yes, it includes the copy time down.
>>>
>>> But note that ONE multiply on the GPU is absurd; it does not make
>>> sense to copy a matrix down to the GPU and then do ONE multiply with
>>> it. Thus I NEVER do "standalone" benchmarking where a single kernel
>>> is called by itself once; the timing results are useless. Always run
>>> a FULL application with -log_summary, for example, in this case, a
>>> full KSPSolve() that requires a bunch of iterations. Then you can
>>> look at the performance of each kernel. The reason to do it this way
>>> is that the numbers can be very different, and what matters is what
>>> runs in APPLICATIONS, so that is what should be measured.
>>>
>>> If, say, you run KSP with 20 iterations, then the time to copy the
>>> matrix down to the GPU is amortized over those 20 iterations and
>>> thus may be OK. You should see the flop rate for MatMult() go up in
>>> this case.
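Barry's amortization point can be made concrete with a little arithmetic. The timings below are made-up illustrative values, not measurements from this thread:

```python
# Hypothetical timings (assumed for illustration, not from this thread).
copy_time = 0.015      # s: one-time copy of the matrix to the GPU
mult_time = 0.0005     # s: one MatMult once the matrix is resident
iterations = 20        # KSP iterations, each doing one MatMult

# The copy cost is paid once, then spread over every iteration.
total = copy_time + iterations * mult_time
effective_per_mult = total / iterations

print(f"effective time per MatMult: {effective_per_mult:.5f} s")
print(f"copy overhead per MatMult:  {copy_time / iterations:.5f} s")
```

With 20 iterations the copy adds only copy_time/20 to each multiply, so the reported MatMult flop rate climbs toward the pure on-GPU rate as the iteration count grows.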
>>>
>>> You may have noticed we have a log entry for VecCUDACopyTo(); we
>>> will be adding one for matrices as well, so you will be able to see
>>> how long the copy takes. Note, however, that the copy time is still
>>> counted in the MatMult() time if the first copy of the matrix to the
>>> GPU is triggered by the MatMult(). You can subtract the copy time
>>> from the mult time to get the per-multiply time; this corresponds to
>>> the multiply time in the limit of a single copy down and many, many
>>> multiplies on the GPU.
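The subtraction Barry describes can be sketched as follows; the numbers are assumed placeholders, not values from this thread:

```python
# Assumed placeholder numbers, not measurements from this thread.
matmult_time = 2.0e-2   # s: total MatMult time, including the first GPU copy
copy_time    = 1.8e-2   # s: one-time matrix copy to the GPU
n_mults      = 1        # MatMult calls in the run

# Subtracting the copy isolates the estimated pure on-GPU multiply time,
# i.e. what a multiply would cost after the matrix is already resident.
per_mult = (matmult_time - copy_time) / n_mults
print(f"estimated pure per-multiply time: {per_mult:.1e} s")
```

This estimate corresponds to the limiting case of one copy down followed by many multiplies on the GPU.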
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
>>>
>>>> Hello again,
>>>>
>>>> I compiled one of the examples. I used a sparse matrix called
>>>> 02-raefsky3. I used -vec_type cuda and -mat_type seqaijcuda.
>>>>
>>>> In the summary of the operations performed by the program there is:
>>>>
>>>> MatMult 1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 100 0 0 0 2 100 0 0 0 147
>>>>
>>>> Does the time for MatMult include the memory transfer for loading
>>>> the matrix into GPU memory, or just the computation time?
>>>>
>>>> Thanks in advance.
>>>> Kuba.
>>>>
>>>
>>
>>
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>>
>
> <tests.zip>