[petsc-dev] PETSc GPU example
Mark Adams
mfadams at lbl.gov
Tue Dec 7 05:47:38 CST 2021
Also Fande,
If you are _not_ using NVIDIA GPUs with the MPS system, then run with the
default -cells 1,1,1 and use just one MPI process and one GPU.
This will be fine for evaluating the GPU.
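For example, a single-rank GPU run might look like this (a sketch only: it
assumes a CUDA build of PETSc, and the GPU Mat/Vec type options may need the
ex56_ prefix on this example):
  mpiexec -n 1 ./ex56 -cells 1,1,1 -max_conv_its 3 \
      -dm_mat_type aijcusparse -dm_vec_type cuda -log_view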
If you want to use more than one MPI process per GPU (because you want to
use the CPUs in the rest of your app), then the MPS system is important (I
see a 3x speedup from it), and I would use NVIDIA+MPS unless you can talk to
someone or a vendor knowledgeable about running more than one MPI rank per GPU.
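For reference, enabling MPS is typically just a matter of starting the control
daemon before the run and shutting it down afterwards (a sketch, assuming a
standard CUDA/MPS install; batch systems often wrap this step for you):
  nvidia-cuda-mps-control -d            # start the MPS control daemon
  mpiexec -n 4 ./ex56 -cells 2,2,1 -max_conv_its 3 -log_view
  echo quit | nvidia-cuda-mps-control   # stop the daemon when done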
Now, if you can use NVIDIA+MPS, it would be interesting to compare GPU solver
performance with a single versus multiple MPI ranks per GPU. It should be
faster to use one MPI rank per GPU (running the same problem, of course), but
it would be worth quantifying.
If you want to do this then I can explain how to do it.
Thanks,
Mark
On Mon, Dec 6, 2021 at 10:03 PM Mark Adams <mfadams at lbl.gov> wrote:
> * snes/ex56 runs a convergence study and, confusingly, sets the refinement
> options itself in the code, thus erasing your -ex56_dm_refine.
>
> * To refine, use -max_conv_its N <3>. This sets the number of refinement
> steps, that is, the length of the convergence study.
>
> * You can adjust where the study starts from with -cells i,j,k <1,1,1>.
> You do want to set this if you have multiple MPI processes so that the
> number of cells in this initial mesh equals the number of processes. That
> way it starts with one cell per process and refines from there (see the
> example command after this list).
>
> * GPU speedup is all about subdomain size. AMG has lots of kernel launches,
> and you need to overcome that launch overhead before you get a net gain.
> Very rough numbers: I see a speedup of about 5-10x with a few million
> equations per GPU.
> As Matt said, the assembly is on the CPU, and ex56 gets really slow on
> larger problems. Be prepared to run the largest case for close to an hour.
> This setup time is not included in the KSP[SNES]Solve events in the
> -log_view output, so look at those events rather than the total run time.
> When you do this convergence study, a new stage is created for each
> refinement, so one run gives you data over a range of problem sizes.
> Each refinement step increases the problem size by 8x, so once the solve
> times also increase by ~8x you know you are past the latency-dominated
> regime. You want to get into that regime to see a gain.
>
> * The end of the source file has example parameter sets that you should use
> (use the GAMG one).
>
> * src/snes/tests/ex13.c is designed as a benchmark test; it partitions the
> problem better in parallel and uses modern Plex idioms. If you are doing
> large-scale parallelism then you should use this instead.
> (It is a little hard to understand and not well documented.)
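>
> As a concrete illustration, a GPU convergence-study run on, say, 8 MPI ranks
> might look like this (a sketch only: it assumes a CUDA build, GPU Mat/Vec
> types, and the GAMG parameters from the end of ex56.c; the exact option
> prefixes for this example may differ):
>
>   mpiexec -n 8 ./ex56 -cells 2,2,2 -max_conv_its 4 \
>       -dm_mat_type aijcusparse -dm_vec_type cuda -pc_type gamg -log_view
>
> Here -cells 2,2,2 gives one cell per rank at the start, and each of the 4
> refinements grows the problem by 8x; compare the per-stage solve times in
> the -log_view output.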
>
> Hope that helps,
> Mark
>
> On Mon, Dec 6, 2021 at 9:05 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>
>>
>>
>> On Mon, Dec 6, 2021 at 5:59 PM Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Mon, Dec 6, 2021 at 7:54 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>
>>>> Thanks, Matt,
>>>>
>>>> Sorry, I still have more questions about this example. How do I refine
>>>> the mesh to make the problem larger?
>>>>
>>>> I tried the following options, and none of them worked. I might be doing
>>>> something wrong.
>>>>
>>>> -ex56_dm_refine 9
>>>>
>>>> and
>>>>
>>>> -dm_refine 4
>>>>
>>>
>>> The mesh handling in this example does not conform to the others, but it
>>> appears that
>>>
>>> -ex56_dm_refine <k>
>>>
>>> should take effect at
>>>
>>>
>>> https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex56.c#L381
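>>>
>>> That option name comes from the DM's options prefix rather than from any
>>> literal "-ex56_dm_refine" string in the library. A minimal sketch of the
>>> usual pattern (not the exact ex56 code; the calls here are illustrative):
>>>
>>>   DM             dm;
>>>   PetscErrorCode ierr;
>>>   ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr);
>>>   ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
>>>   /* the prefix turns the generic -dm_refine option into -ex56_dm_refine */
>>>   ierr = PetscObjectSetOptionsPrefix((PetscObject)dm, "ex56_");CHKERRQ(ierr);
>>>   /* DMSetFromOptions() is where -ex56_dm_refine would actually be read */
>>>   ierr = DMSetFromOptions(dm);CHKERRQ(ierr);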
>>>
>>>
>> I was puzzled about this because DMSetFromOptions does not seem to
>> trigger -ex56_dm_refine.
>>
>> I did a search, and could not find where we call "-ex56_dm_refine" in
>> PETSc.
>>
>> I got the same result by running the following two combinations:
>>
>> 1) ./ex56 -log_view -snes_view -max_conv_its 3 -ex56_dm_refine 10
>>
>> 2) ./ex56 -log_view -snes_view -max_conv_its 3 -ex56_dm_refine 0
>>
>> Thanks,
>>
>> Fande
>>
>>
>>> unless you are setting max_conv_its to 0 somehow.
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>> Thanks,
>>>>
>>>> Fande
>>>>
>>>> On Mon, Dec 6, 2021 at 5:04 PM Matthew Knepley <knepley at gmail.com>
>>>> wrote:
>>>>
>>>>> On Mon, Dec 6, 2021 at 7:02 PM Fande Kong <fdkong.jd at gmail.com> wrote:
>>>>>
>>>>>> Thanks, Matt
>>>>>>
>>>>>> On Mon, Dec 6, 2021 at 4:47 PM Matthew Knepley <knepley at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, Dec 6, 2021 at 6:40 PM Fande Kong <fdkong.jd at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dear PETSc team,
>>>>>>>>
>>>>>>>> I am interested in a careful evaluation of PETSc GPU performance on
>>>>>>>> our INL cluster.
>>>>>>>>
>>>>>>>> Is there any example in PETSc that can show GPU speedup when solving
>>>>>>>> a nonlinear equation?
>>>>>>>>
>>>>>>>> I talked to Junchao; he suggested that I try SNES/tutorial/ex56. I
>>>>>>>> tried that, but I could not find any speedup using the GPU. I could
>>>>>>>> attach some "log_view" results later if you would like to see them.
>>>>>>>>
>>>>>>>
>>>>>>> We should note that you will only see speedup in the solver, so the
>>>>>>> problem has to be pretty large. I believe Mark has good results with it.
>>>>>>> The assembly is still all on the CPU. I am working on this over
>>>>>>> break, and hope to have a CEED version of it by the new year.
>>>>>>>
>>>>>>
>>>>>> Are both the function and matrix assemblies on the CPU? Or just the
>>>>>> matrix assembly?
>>>>>>
>>>>>
>>>>> There is no GPU assembly right now.
>>>>>
>>>>> Matt
>>>>>
>>>>>
>>>>>> OK, I will check the solver part.
>>>>>>
>>>>>> Thanks, again
>>>>>>
>>>>>> Fande
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>
>>>>>>>> I would appreciate any instructions/comments about running a simple
>>>>>>>> PETSc GPU example to get a speedup.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Fande
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to which their
>>>>>>> experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>