[petsc-users] multi GPU partitions have very different memory usage

Mark Adams mfadams at lbl.gov
Thu Jan 19 08:40:37 CST 2023


On Wed, Jan 18, 2023 at 6:03 PM Mark Lohry <mlohry at gmail.com> wrote:

> Thanks Mark, I'll try the kokkos bit. Any other suggestions for minimizing
> memory besides the obvious one of using fewer levels?
>
> Unfortunately Jacobi does poorly compared to ILU on these systems.
>
> I'm seeing grid complexity 1.48 and operator complexity 1.75 with
> pc_gamg_square_graph 0, and 1.15/1.25 with it at 1.
>

That looks good. Use 1.
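
(For reference, the operator complexity GAMG reports is, in the usual AMG
convention, assumed here to be

    OC = \sum_{l=0}^{L} \mathrm{nnz}(A_l) / \mathrm{nnz}(A_0)

so OC = 1.25 means the coarse levels add roughly 25% to the storage and work
of the finest-level operator, while 1.75 nearly doubles it.)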


> Additionally, the convergence rate is pretty healthy with 5 gmres+asm
> smooths, but very bad with 5 Richardson+asm.
>
>
Yeah, the Richardson smoother needs to be damped, and GMRES does that automatically.
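
If Richardson smoothing is still wanted, a damping factor can be supplied
explicitly. A minimal options sketch (the 0.6 scale is only an illustrative
guess, not a tuned value):

    -mg_levels_ksp_type richardson
    -mg_levels_ksp_richardson_scale 0.6
    -mg_levels_pc_type asm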


>
> On Wed, Jan 18, 2023, 4:48 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> The cuSPARSE matrix triple product takes a lot of memory. We usually use
>> Kokkos, configured with the TPL turned off.
>>
>> If you have a complex problem, different parts of the domain can coarsen
>> at different rates.
>> Jacobi instead of asm will save a fair amount of memory.
>> If you run with -ksp_view you will see the operator/matrix complexity from
>> GAMG. These should be < 1.5.
>>
>> Mark
>>
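
As a concrete starting point, a hedged options sketch along those lines; the
Kokkos matrix/vector types assume a PETSc build configured with
--download-kokkos --download-kokkos-kernels, and the exact configure settings
for leaving the cuSPARSE TPL off should be checked against your install:

    -mat_type aijkokkos
    -vec_type kokkos
    -ksp_type fgmres
    -pc_type gamg
    -pc_gamg_square_graph 1
    -mg_levels_pc_type jacobi
    -ksp_view
    -memory_view

With -ksp_view, the grid and operator complexities show up in the GAMG
portion of the output.
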
>> On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:
>>
>>> With asm I see a range of 8GB-13GB, a slightly smaller ratio, but that
>>> probably explains it (does this still seem like a lot of memory to you for
>>> the problem size?)
>>>
>>> In general I don't have the same number of blocks per row, so I suppose
>>> it makes sense there's some memory imbalance.
>>>
>>>
>>>
>>> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams <mfadams at lbl.gov> wrote:
>>>
>>>> Can your problem have load imbalance?
>>>>
>>>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load
>>>> imbalance.
>>>> GAMG can add some load imbalance but start by getting a baseline.
>>>>
>>>> Mark
>>>>
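
For example, untested command sketches (the launcher and any GPU binding
flags are site-specific assumptions; the binary name is taken from the
nvidia-smi output below):

    mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type jacobi -memory_view -log_view
    mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type asm -memory_view -log_view

Comparing per-rank memory between these runs and the GAMG run separates the
baseline imbalance from whatever GAMG adds.
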
>>>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>>
>>>>> Q0) Does -memory_view trace GPU memory as well, or is there another
>>>>> method to query the peak device memory allocation?
>>>>>
>>>>> Q1) I'm loading an aijcusparse matrix with MatLoad and running with
>>>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. The matrix has
>>>>> 27,142,948 rows and cols, bs=4, and 759,709,392 total nonzeros. I'm using
>>>>> 8 ranks on 8x80GB GPUs, and during the setup phase, before crashing with
>>>>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES, nvidia-smi shows the output
>>>>> pasted below.
>>>>>
>>>>> GPU memory usage spans 36GB-50GB, but one rank is at 77GB. Is this
>>>>> expected? Do I need to manually repartition this somehow?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>>
>>>>>
>>>>> +-----------------------------------------------------------------------------+
>>>>> | Processes:                                                                  |
>>>>> |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
>>>>> |        ID   ID                                                   Usage      |
>>>>> |=============================================================================|
>>>>> |    0   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    0   N/A  N/A   1696543      C   ./petsc_solver_test             38407MiB |
>>>>> |    0   N/A  N/A   1696544      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696545      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696546      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696548      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696550      C   ./petsc_solver_test               471MiB |
>>>>> |    0   N/A  N/A   1696551      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696552      C   ./petsc_solver_test               467MiB |
>>>>> |    1   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    1   N/A  N/A   1696544      C   ./petsc_solver_test             35849MiB |
>>>>> |    2   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    2   N/A  N/A   1696545      C   ./petsc_solver_test             36719MiB |
>>>>> |    3   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    3   N/A  N/A   1696546      C   ./petsc_solver_test             37343MiB |
>>>>> |    4   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    4   N/A  N/A   1696548      C   ./petsc_solver_test             36935MiB |
>>>>> |    5   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    5   N/A  N/A   1696550      C   ./petsc_solver_test             49953MiB |
>>>>> |    6   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    6   N/A  N/A   1696551      C   ./petsc_solver_test             47693MiB |
>>>>> |    7   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    7   N/A  N/A   1696552      C   ./petsc_solver_test             77331MiB |
>>>>> +-----------------------------------------------------------------------------+
>>>>>
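
For a compact per-GPU view of the same numbers, nvidia-smi's query mode can
also be polled during setup (a small sketch; the 2-second interval is
arbitrary):

    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 2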
>>>>