[petsc-users] Questions about setting values for GPU based matrices

Fredrik Heffer Valdmanis fredva at ifi.uio.no
Tue Nov 29 10:37:59 CST 2011


2011/11/29 Matthew Knepley <knepley at gmail.com>

> On Tue, Nov 29, 2011 at 2:38 AM, Fredrik Heffer Valdmanis <
> fredva at ifi.uio.no> wrote:
>
>> 2011/10/28 Matthew Knepley <knepley at gmail.com>
>>
>>> On Fri, Oct 28, 2011 at 10:24 AM, Fredrik Heffer Valdmanis <
>>> fredva at ifi.uio.no> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am working on integrating the new GPU based vectors and matrices into
>>>> FEniCS. Now, I'm looking at the possibility of getting some speedup during
>>>> finite element assembly, specifically when inserting the local element
>>>> matrices into the global matrix. In that regard, I have a few
>>>> questions I hope you can help me out with:
>>>>
>>>> - When calling MatSetValues with a MATSEQAIJCUSP matrix as parameter,
>>>> what exactly happens? As far as I can see, MatSetValues is not
>>>> implemented for GPU based matrices, nor is mat->ops->setvalues set
>>>> to point to any function for this Mat type.
>>>>
>>>
>>> Yes, MatSetValues always operates on the CPU side. It would not make
>>> sense to do individual operations on the GPU.
>>>
>>> I have written a batched assembly routine for element matrices that are
>>> all the same size:
>>>
>>>
>>> http://www.mcs.anl.gov/petsc/petsc-as/snapshots/petsc-current/docs/manualpages/Mat/MatSetValuesBatch.html
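>>>
>>> A minimal sketch of how a call to it might look for nb element matrices
>>> that are all bs-by-bs (variable names are illustrative, not taken from
>>> FEniCS):
>>>
>>>   PetscErrorCode ierr;
>>>   PetscInt       nb = numCells;    /* number of element matrices      */
>>>   PetscInt       bs = dofPerCell;  /* rows (= columns) per element    */
>>>   PetscInt      *rows;             /* nb*bs global dof indices        */
>>>   PetscScalar   *vals;             /* nb*bs*bs element matrix entries */
>>>
>>>   /* fill rows[] from the local-to-global map and vals[] from the
>>>      computed element matrices, then insert everything in one call */
>>>   ierr = MatSetValuesBatch(A, nb, bs, rows, vals);CHKERRQ(ierr);
>>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);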
>>>
>>>
>>>> - Is it the case that matrices are assembled in their entirety on the CPU,
>>>> and then copied over to the GPU (after calling MatAssemblyBegin)? Or are
>>>> values copied over to the GPU each time you call MatSetValues?
>>>>
>>>
>>> That function assembles the matrix on the GPU and then copies it to the
>>> CPU. The only time you would not want this copy is when you are running
>>> in serial and never touch the matrix on the CPU afterwards, so I left
>>> the copy in.
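>>>
>>> For comparison, the conventional path inserts one element matrix per
>>> call on the CPU side, roughly like this (a sketch; exactly when values
>>> migrate to the GPU afterwards is an implementation detail of the CUSP
>>> classes):
>>>
>>>   for (c = 0; c < numCells; ++c) {
>>>     /* idx: the bs global dof indices of cell c; elemMat: its bs*bs values */
>>>     ierr = MatSetValues(A, bs, idx, bs, idx, elemMat, ADD_VALUES);CHKERRQ(ierr);
>>>   }
>>>   ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);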
>>>
>>>
>>>> - Can we expect to see any speedup from using MatSetValuesBatch over
>>>> MatSetValues, or is the batch version simply a utility function? This
>>>> question goes for both CPU- and GPU-based matrices.
>>>>
>>>
>>> CPU: no
>>>
>>> GPU: yes, I see a speedup of roughly the memory bandwidth ratio
>>>
>>>
>> Hi,
>>
>> I have now integrated MatSetValuesBatch into our existing PETSc wrapper
>> layer. I have tested matrix assembly with Poisson's equation on different
>> meshes with elements of varying order. I have timed the single call to
>> MatSetValuesBatch and compared that to the total time consumed by the
>> repeated calls to MatSetValues in the old implementation. I have the
>> following results:
>>
>> Poisson on 1000x1000 unit square, 1st order Lagrange elements:
>> MatSetValuesBatch: 0.88576 s
>> repeated calls to MatSetValues: 0.76654 s
>>
>> Poisson on 500x500 unit square, 2nd order Lagrange elements:
>> MatSetValuesBatch: 0.9324 s
>> repeated calls to MatSetValues: 0.81644 s
>>
>> Poisson on 300x300 unit square, 3rd order Lagrange elements:
>> MatSetValuesBatch: 0.93988 s
>> repeated calls to MatSetValues: 1.03884 s
>>
>> As you can see, the two methods take almost the same amount of time.
>> What behavior and performance should we expect? Is there any way to
>> optimize the performance of batched assembly?
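>>
>> (Roughly how the single batched call can be isolated for timing; a
>> sketch with illustrative names, not the actual wrapper code. A log
>> stage like this also makes the call show up separately in -log_summary:)
>>
>>   PetscLogStage stage;
>>   ierr = PetscLogStageRegister("Batched insert", &stage);CHKERRQ(ierr);
>>   ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
>>   ierr = MatSetValuesBatch(A, nb, bs, rows, vals);CHKERRQ(ierr);
>>   ierr = PetscLogStagePop();CHKERRQ(ierr);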
>>
>
> Almost certainly it is not dispatching to the CUDA version. The regular
> version just calls MatSetValues() in a loop. Are you
> using a SEQAIJCUSP matrix?
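>
> For reference, one way to force the CUSP type explicitly (a sketch; the
> runtime option -mat_type seqaijcusp should have the same effect):
>
>   ierr = MatCreate(PETSC_COMM_SELF, &A);CHKERRQ(ierr);
>   ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, M, N);CHKERRQ(ierr);
>   ierr = MatSetType(A, MATSEQAIJCUSP);CHKERRQ(ierr);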
>
Yes. The same matrices yield a 4-6x speedup when solving the system on
the GPU.

>
>
>>  I also have a problem with Thrust throwing std::bad_alloc on some calls
>> to MatSetValuesBatch. The exception originates in thrust::device_ptr<void>
>> thrust::detail::device::cuda::malloc<0u>(unsigned long). It seems to be
>> thrown when the number of double values I send to MatSetValuesBatch
>> approaches 30 million. I am testing this on a laptop with 4 GB RAM and a
>> GeForce 540M (1 GB of memory), so the 30 million doubles are far from
>> exhausting memory on either the host or the device. Any clues on what
>> causes this problem and how to avoid it?
>>
>
> It uses more memory than just the values. I would have to look at the
> specific case, but
> I assume that the memory is exhausted.
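>
> As a rough check (assuming the batch path builds COO triples on the
> device and sorts them with Thrust, which needs temporary buffers): 30
> million PetscScalar values are about 240 MB, the matching row and column
> indices add roughly 120 MB each, and the sort/reduce steps allocate
> scratch space on top of that, so the device-side footprint can plausibly
> approach or exceed the 1 GB on a GeForce 540M.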
>
OK, I can look further into it myself as well. Thanks,

Fredrik