[petsc-dev] Proper matrix size to choose when evaluating MatMult?
Junchao Zhang
jczhang at mcs.anl.gov
Sat Feb 22 07:49:06 CST 2020
On Fri, Feb 21, 2020 at 6:41 PM Matthew Knepley <knepley at gmail.com> wrote:
> I think Karl goes into these issues here:
> https://arxiv.org/pdf/1410.4054.pdf
>
Wonderful, thanks.
>
> Thanks,
>
> Matt
>
> On Fri, Feb 21, 2020 at 5:58 PM Junchao Zhang via petsc-dev <
> petsc-dev at mcs.anl.gov> wrote:
>
>>
>>
>> On Fri, Feb 21, 2020 at 4:38 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>>
>>>
>>> On Fri, Feb 21, 2020 at 4:51 PM Junchao Zhang via petsc-dev <
>>> petsc-dev at mcs.anl.gov> wrote:
>>>
>>>> Hello,
>>>>
>>>> I want to evaluate MatMult performance on GPUs. I took a 2M x 2M matrix
>>>> and ran with 6 MPI ranks and 6 GPUs. It took about 0.9 seconds. A kernel
>>>> launch or a stream synchronization took about 10 us.
>>>>
>>>
>>> Your call, but you should run the code once and then time it again in a new
>>> timer. I've seen some big "warm up costs" on GPUs today.
>>>
>>
>> Yes, I usually run hundreds of iterations and skip the first few (a minimal
>> sketch of that setup is appended at the end of this message). Thanks.
>>
>>>
>>>
>>>> Compared with the MatMult time, they are tiny. Does that mean we can
>>>> ignore them? What is a proper matrix size for evaluating MatMult?
>>>>
>>>
>>> It depends on the purpose/audience of the study. There is no right size,
>>> other than one whose MatMult time is much larger than the launch cost,
>>> perhaps.
>>>
>>>
>>>> I heard it is a few thousand rows per MPI rank. Why?
>>>> Thanks.
>>>> --Junchao Zhang
>>>>
>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
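
For reference, below is a minimal sketch of the warm-up/timing setup discussed
above, using PETSc log stages so that -log_view reports the warm-up calls and
the timed calls separately. The tridiagonal test matrix, stage names, and
iteration counts are illustrative placeholders, not the actual 2M x 2M matrix
from the measurement; a real benchmark would load that matrix instead (e.g.
with MatLoad()) and pick the GPU backend at runtime.

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y;
  PetscInt       n = 1000000, i, rstart, rend, nwarm = 5, niter = 100;
  PetscLogStage  warm, timed;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);

  /* Simple distributed tridiagonal AIJ matrix; -mat_type/-vec_type choose the backend */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 3, NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 3, NULL, 1, NULL);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    if (i > 0)   {ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n-1) {ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);

  /* Warm-up stage: absorbs first-call costs (host-to-device copies, cuSPARSE setup) */
  ierr = PetscLogStageRegister("MatMult warmup", &warm);CHKERRQ(ierr);
  ierr = PetscLogStagePush(warm);CHKERRQ(ierr);
  for (i = 0; i < nwarm; i++) {ierr = MatMult(A, x, y);CHKERRQ(ierr);}
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  /* Timed stage: use only this stage's numbers from -log_view for the evaluation */
  ierr = PetscLogStageRegister("MatMult timed", &timed);CHKERRQ(ierr);
  ierr = PetscLogStagePush(timed);CHKERRQ(ierr);
  for (i = 0; i < niter; i++) {ierr = MatMult(A, x, y);CHKERRQ(ierr);}
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Run with something like (executable name and options illustrative)

  mpiexec -n 6 ./matmult_bench -n 2000000 -mat_type aijcusparse -vec_type cuda -log_view

and compare the MatMult line in the two stages. As a rough illustration of why
there is a minimum useful size (the numbers are assumptions, not measurements
from this thread): if a launch or synchronization costs ~10 us and the goal is
to keep that overhead below ~1% of MatMult, the kernel itself needs to run for
~1 ms, which at an effective bandwidth of a few hundred GB/s corresponds to a
few hundred MB of matrix/vector traffic, i.e. tens of millions of nonzeros per
GPU. Conversely, with only a few thousand rows per rank the kernel finishes in
microseconds and the measurement is dominated by launch latency and
communication rather than by the SpMV itself; Karl's paper linked above
quantifies where that crossover occurs.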