[petsc-dev] PETSc GPU example

Mon Dec 6 21:03:20 CST 2021

* snes/ex56 runs a convergence study and confusingly sets the options
manually, thus erasing your -ex56_dm_refine.

* To refine, use -max_conv_its N <3>. This sets the number of refinement
steps, that is, the length of the convergence study.

* You can adjust where it starts from with -cells i,j,k <1,1,1>.
With multiple MPI processes you do want to set this, so that the number of
cells in the starting mesh (i*j*k) equals the number of processes. That way
it starts with one cell per process and refines from there.

* GPU speedup is all about subdomain size. AMG has lots of kernel launches,
and you need to overcome that launch overhead before you see a net gain.
Very rough numbers: I see a speedup of about 5-10x with a few million
equations per GPU.
As Matt said, the assembly is on the CPU, and ex56 gets really slow on larger
problems. Be prepared to run the largest case for close to an hour.
This setup is not measured in KSP[SNES]Solve in the -log_view output, so
look at the KSP[SNES]Solve times to judge the solver.
When you do this convergence study a new stage is created for
each refinement, so one run will give you data for a range of problem sizes.
Each refinement step increases the problem size by 8x, so when the solve
times also increase by ~8x from one stage to the next, you are past the
latency-dominated regime. You need to get past it to see a gain.

* The end of the source file has example parameters that you should use
(the gamg one)

* src/snes/tests/ex13.c is designed to be a benchmark test and it
partitions the problem better in parallel and has modern Plex usage. If you
are doing large scale parallelism then you should use this.
(It is a little hard to understand. Not well documented.)
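
For the ex56 options above, here is a rough Python sketch (not from the
PETSc source) of two bits of bookkeeping: picking a -cells i,j,k whose
product equals the number of MPI processes, and checking when the per-stage
solve times start growing by ~8x. The helper names, the mpiexec launcher and
the 25% tolerance are assumptions made for illustration. The solve times
themselves have to be read by hand from the per-stage KSP[SNES]Solve lines
in the -log_view output.

```python
# Sketch only: these helpers are hypothetical and not part of PETSc.


def near_cubic_cells(nprocs):
    """Return (i, j, k) with i*j*k == nprocs, as close to a cube as possible."""
    best = (nprocs, 1, 1)
    for i in range(1, nprocs + 1):
        if nprocs % i:
            continue
        rest = nprocs // i
        for j in range(1, rest + 1):
            if rest % j:
                continue
            k = rest // j
            cand = tuple(sorted((i, j, k), reverse=True))
            # A smaller gap between largest and smallest factor is "more cubic".
            if cand[0] - cand[2] < best[0] - best[2]:
                best = cand
    return best


def ex56_command(nprocs, max_conv_its=3):
    """Build an ex56 command line that starts with one cell per process.

    Assumes an 'mpiexec' launcher; substitute whatever your cluster uses.
    """
    i, j, k = near_cubic_cells(nprocs)
    return (f"mpiexec -n {nprocs} ./ex56 -cells {i},{j},{k} "
            f"-max_conv_its {max_conv_its} -log_view")


def solve_time_ratios(times):
    """Ratio of each stage's solve time to the previous stage's solve time."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def first_throughput_stage(times, target=8.0, tolerance=0.25):
    """Index of the first stage whose solve time grew by roughly `target`x.

    Returns None if no stage reaches target * (1 - tolerance). The tolerance
    is an arbitrary choice for this sketch.
    """
    for stage, ratio in enumerate(solve_time_ratios(times), start=1):
        if ratio >= target * (1.0 - tolerance):
            return stage
    return None


if __name__ == "__main__":
    print(ex56_command(8, max_conv_its=4))
```

Run as a script, this prints
mpiexec -n 8 ./ex56 -cells 2,2,2 -max_conv_its 4 -log_view
Stages before the one returned by first_throughput_stage are the ones still
dominated by latency, so a GPU comparison is more meaningful on the later
stages.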

Hope that helps,
Mark

On Mon, Dec 6, 2021 at 9:05 PM Fande Kong <fdkong.jd at gmail.com> wrote:

>
>
> On Mon, Dec 6, 2021 at 5:59 PM Matthew Knepley <knepley at gmail.com> wrote:
>
>> On Mon, Dec 6, 2021 at 7:54 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>
>>> Thanks, Matt,
>>>
>>> Sorry, I still have more questions on this example. How do I refine the mesh
>>> to make the problem larger?
>>>
>>> I tried the following options, and none of them worked. I might be doing
>>> something wrong.
>>>
>>> -ex56_dm_refine 9
>>>
>>> and
>>>
>>> -dm_refine 4
>>>
>>
>> The mesh handling in this example does not conform to the others, but it
>> appears that
>>
>>   -ex56_dm_refine <k>
>>
>> should take effect at
>>
>>
>> https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex56.c#L381
>>
>>
> I was puzzled about this because DMSetFromOptions does not seem to trigger
> -ex56_dm_refine.
>
> I did a search, and could not find where we use "-ex56_dm_refine" in
> PETSc.
>
> I got the same result by running the following two combinations:
>
> 1) ./ex56  -log_view  -snes_view  -max_conv_its 3 -ex56_dm_refine 10
>
> 2) ./ex56  -log_view  -snes_view  -max_conv_its 3 -ex56_dm_refine 0
>
> Thanks,
>
> Fande
>
>
>> unless you are setting max_conv_its to 0 somehow.
>>
>>   Thanks,
>>
>>      Matt
>>
>>
>>> Thanks,
>>>
>>> Fande
>>>
>>> On Mon, Dec 6, 2021 at 5:04 PM Matthew Knepley <knepley at gmail.com>
>>> wrote:
>>>
>>>> On Mon, Dec 6, 2021 at 7:02 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>
>>>>> Thanks, Matt
>>>>>
>>>>> On Mon, Dec 6, 2021 at 4:47 PM Matthew Knepley <knepley at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On Mon, Dec 6, 2021 at 6:40 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear PETSc team,
>>>>>>>
>>>>>>> I am interested in a careful evaluation of PETSc GPU performance in
>>>>>>> our INL cluster.
>>>>>>>
>>>>>>> Is there any example in PETSc that shows GPU speedup when solving a
>>>>>>> nonlinear equation?
>>>>>>>
>>>>>>> I talked to Junchao; he suggested that I try SNES/tutorial/ex56. I
>>>>>>> tried that, but I could not find any speedup using the GPU. I could attach
>>>>>>> some "-log_view" results later if you would like to see them.
>>>>>>>
>>>>>>
>>>>>> We should note that you will only see speedup in the solver, so that
>>>>>> problem has to be pretty large. I believe Mark has good results with it.
>>>>>> The assembly is still all on the CPU. I am working on this over
>>>>>> break, and hope to have a CEED version of it by the new year.
>>>>>>
>>>>>
>>>>> Are both function and matrix assemblies on CPU? Or just the matrix
>>>>> assembly?
>>>>>
>>>>
>>>> There is no GPU assembly right now.
>>>>
>>>>   Matt
>>>>
>>>>
>>>>> OK, I will try to check the solver part
>>>>>
>>>>> Thanks, again
>>>>>
>>>>> Fande
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>   Thanks,
>>>>>>
>>>>>>      Matt
>>>>>>
>>>>>>
>>>>>>> Appreciate any instructions/comments about running a simple PETSc
>>>>>>> GPU example to get a speedup.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Fande
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> What most experimenters take for granted before they begin their
>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>> experiments lead.
>>>>>> -- Norbert Wiener
>>>>>>
>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>
>>>>>
>>>>
>>>
>>
>