[petsc-users] MatMult

Barry Smith bsmith at mcs.anl.gov
Mon Dec 13 21:01:37 CST 2010


  Runs ok for me.

  Barry

On Dec 13, 2010, at 2:20 AM, Jakub Pola wrote:

> Could you please check the file attached to this email? It contains the
> source code and the log summary from the execution of MatMult.
> 
> When I run ex131 with the parameters -vec_type cuda and -mat_type
> seqaijcuda,
> 
> mpiexec -n 1 ./ex131 -f ../matbinary.ex  -vec 0 -mat_type seqaijcuda
> -vec_type cuda -log_summary
> 
> it fails with CUDA error 4; see MatMultKO.log.
> 
> 
> When I run the same program with only -mat_type seqaijcuda, without the
> -vec_type cuda parameter, it runs OK:
> 
> mpiexec -n 1 ./ex131 -f ../matbinary.ex  -vec 0 -mat_type seqaijcuda
> -log_summary
> 
> See MatMltOK.log.
> 
> When I run with only -vec_type cuda, without -mat_type seqaijcuda, it
> fails again with:
> 
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>  what():  invalid argument
> terminate called after throwing an instance of
> 'thrust::system::system_error'
>  what():  invalid argument
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 3755 on node desktop exited
> on signal 6 (Aborted).
> --------------------------------------------------------------------------
> 
> 
> Could you please give me some comments on that?
> 
> On Mon, 2010-12-13 at 07:37 +0000, Matthew Knepley wrote:
>> Yes, it should run on the GPU. Check an example, like ex19.
>> 
>> 
>>   Matt
>> 
>> On Mon, Dec 13, 2010 at 7:29 AM, Jakub Pola <jakub.pola at gmail.com>
>> wrote:
>>        Hi,
>> 
>>        Is the MatMult function performed on the GPU? When I prepared a
>>        program which just executes this function with the parameters
>>        -vec_type cuda and -mat_type seqaijcuda, I didn't see any
>>        VecCUDACopyTo entry in the summary log.
>> 
>> 
>>        On Sat, 2010-12-11 at 11:50 -0600, Barry Smith wrote:
>> 
>> 
>>> To answer this you need to understand that PETSc copies vectors and
>>> matrices to GPU memory "on demand" (that is, exactly when they are
>>> first needed on the GPU, and not before), and once it has copied an
>>> object to the GPU it keeps track of it and will NOT copy it down
>>> again if it is already there; a small sketch follows.
>>> 
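>>> As a minimal sketch of how this looks from user code (a hedged
>>> illustration, not ex131 itself: error checking is omitted, the file
>>> name is a placeholder, and MatCreateVecs() is the newer name for what
>>> older releases call MatGetVecs()):
>>> 
>>> #include <petscmat.h>
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>   Mat         A;
>>>   Vec         x, y;
>>>   PetscViewer viewer;
>>> 
>>>   PetscInitialize(&argc, &argv, NULL, NULL);
>>>   PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matbinary.ex",
>>>                         FILE_MODE_READ, &viewer);
>>>   MatCreate(PETSC_COMM_WORLD, &A);
>>>   MatSetFromOptions(A);          /* honors -mat_type seqaijcuda */
>>>   MatLoad(A, viewer);
>>>   PetscViewerDestroy(&viewer);
>>> 
>>>   MatCreateVecs(A, &x, &y);      /* vectors inherit -vec_type cuda */
>>>   VecSet(x, 1.0);
>>> 
>>>   MatMult(A, x, y);  /* first call: triggers the copy of A and x to the GPU */
>>>   MatMult(A, x, y);  /* second call: reuses the copies already on the GPU */
>>> 
>>>   VecDestroy(&x); VecDestroy(&y); MatDestroy(&A);
>>>   PetscFinalize();
>>>   return 0;
>>> }
>>> 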
>>>   Hence in your run below, yes, the MatMult time includes the copy
>>> down to the GPU.
>>> 
>>>   But note that ONE multiply on the GPU is absurd; it does not make
>>> sense to copy a matrix down to the GPU and then do ONE multiply with
>>> it. Thus I NEVER do "standalone" benchmarking where a single kernel
>>> is called by itself once; the timing results are useless. Always run
>>> a FULL application with -log_summary, for example in this case a full
>>> KSPSolve() that requires a bunch of iterations. Then you can look at
>>> the performance of each kernel. The reason to do it this way is that
>>> the numbers can be very different, and what matters is what runs in
>>> APPLICATIONS, so that is what should be measured.
>>> 
>>>   If, say, you run KSP with 20 iterations, then the time to copy the
>>> matrix down to the GPU is amortized over those 20 iterations and thus
>>> may be OK. You should see the flop rate for the MatMult() go up in
>>> this case; a sketch of such a run follows.
>>> 
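>>> A minimal sketch of driving a full solve (again hedged: error
>>> checking omitted, the right-hand side is a placeholder, and
>>> KSPSetOperators() is shown with its current two-matrix signature,
>>> while older releases take an additional MatStructure argument):
>>> 
>>> KSP ksp;
>>> Vec b, sol;
>>> 
>>> MatCreateVecs(A, &sol, &b);
>>> VecSet(b, 1.0);              /* placeholder right-hand side */
>>> 
>>> KSPCreate(PETSC_COMM_WORLD, &ksp);
>>> KSPSetOperators(ksp, A, A);  /* operator and preconditioning matrix */
>>> KSPSetFromOptions(ksp);      /* honors -ksp_type, -pc_type, ... */
>>> KSPSolve(ksp, b, sol);       /* many MatMult()s amortize the copy */
>>> KSPDestroy(&ksp);
>>> 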
>>>   You may have noticed we have a log entry for VecCopyToGPU(); we
>>> will be adding one for matrices as well, so you will be able to see
>>> how long the copy takes. But note that the copy time is still counted
>>> in the MatMult() time if the first copy of the matrix to the GPU is
>>> triggered by the MatMult(). You can subtract the copy time from the
>>> mult time to get the per-multiply time; this corresponds to the
>>> multiply time in the limit of a single copy down and many, many
>>> multiplies on the GPU, as in the worked example below.
>>> 
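>>> Concretely, with hypothetical numbers for illustration only: if
>>> -log_summary reported 20 MatMult() calls taking 2.5e-02 s in total,
>>> and the one-time matrix copy to the GPU took 2.0e-02 s of that, the
>>> per-multiply time would be roughly
>>> (2.5e-02 - 2.0e-02) / 20 = 2.5e-04 s.
>>> 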
>>>   Barry
>>> 
>>> 
>>> 
>>> 
>>> On Dec 11, 2010, at 8:32 AM, Jakub Pola wrote:
>>> 
>>>> Hello again,
>>>> 
>>>> I compiled one of the examples. I used a sparse matrix called
>>>> 02-raefsky3.
>>>> I used -vec_type cuda and -mat_type seqaijcuda.
>>>> 
>>>> When I look at the summary of the operations performed by the
>>>> program, there is:
>>>> 
>>>> MatMult  1 1.0 2.0237e-02 1.0 2.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2 100  0  0  0   2 100  0  0  0   147
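>>>> (Reading that line against the -log_summary column headers: 1 call,
>>>> 2.0237e-02 s max time, 2.98e+06 flops, no messages or reductions,
>>>> and 147 MFlop/s in the final column.)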
>>>> 
>>>> Does the time for performing MatMult include the memory transfer
>>>> for loading the matrix into GPU memory, or just the computation
>>>> time itself?
>>>> 
>>>> Thanks in advance.
>>>> Kuba.
>>>> 
>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>> 
> 
> <tests.zip>


