[petsc-users] multi GPU partitions have very different memory usage

Mark Lohry mlohry at gmail.com
Wed Jan 18 17:03:01 CST 2023


Thanks Mark, I'll try the Kokkos bit. Any other suggestions for minimizing
memory besides the obvious one of using fewer levels?

Unfortunately Jacobi does poorly compared to ILU on these systems.

I'm seeing grid complexity 1.48 and operator complexity 1.75 with
-pc_gamg_square_graph 0, and 1.15/1.25 with it set to 1. Additionally, the
convergence rate is pretty healthy with 5 GMRES+ASM smooths but very bad
with 5 Richardson+ASM.
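
For concreteness, the two coarsening settings and the two smoother variants
compared above correspond roughly to the following option sets; this is an
illustrative spelling, not necessarily the exact command lines used:

    -pc_gamg_square_graph 0    (grid/operator complexity ~1.48/1.75)
    -pc_gamg_square_graph 1    (grid/operator complexity ~1.15/1.25)

    (5 GMRES+ASM smooths per level)
    -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 5 -mg_levels_pc_type asm

    (5 Richardson+ASM smooths per level)
    -mg_levels_ksp_type richardson -mg_levels_ksp_max_it 5 -mg_levels_pc_type asm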


On Wed, Jan 18, 2023, 4:48 PM Mark Adams <mfadams at lbl.gov> wrote:

> The cuSPARSE matrix triple product takes a lot of memory. We usually use
> Kokkos, configured with the TPLs turned off.
>
> If you have a complex problem, different parts of the domain can coarsen
> at different rates.
> Jacobi instead of ASM will save a fair amount of memory.
> If you run with -ksp_view you will see operator/matrix complexity from
> GAMG. These should be < 1.5.
>
> Mark
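
In practice, "the Kokkos bit" above corresponds roughly to configuring PETSc
with Kokkos and Kokkos-Kernels (e.g. --download-kokkos --download-kokkos-kernels,
with the Kokkos-Kernels TPLs such as cuSPARSE disabled at configure time; check
the configure help of your PETSc version for the exact switch) and then selecting
the Kokkos backends at run time with

    -mat_type aijkokkos -vec_type kokkos

optionally together with -mg_levels_pc_type jacobi if the cheaper Jacobi smoother
is acceptable. This is a sketch of one plausible option set rather than the exact
configuration used in this thread.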
>
> On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:
>
>> With asm I see a range of 8GB-13GB, a slightly smaller ratio, but that
>> probably explains it. (Does this still seem like a lot of memory to you for
>> the problem size?)
>>
>> In general I don't have the same number of blocks per row, so I suppose
>> it makes sense there's some memory imbalance.
>>
>>
>>
>> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams <mfadams at lbl.gov> wrote:
>>
>>> Can your problem have load imbalance?
>>>
>>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load
>>> imbalance.
>>> GAMG can add some load imbalance, but start by getting a baseline.
>>>
>>> Mark
>>>
>>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry <mlohry at gmail.com> wrote:
>>>
>>>> Q0) Does -memory_view trace GPU memory as well, or is there another
>>>> method to query the peak device memory allocation?
>>>>
>>>> Q1) I'm loading an aijcusparse matrix with MatLoad and running with
>>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. The matrix has
>>>> 27,142,948 rows and columns, bs=4, and 759,709,392 total nonzeros. Using
>>>> 8 ranks on 8x80GB GPUs, during the setup phase and before crashing with
>>>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES, nvidia-smi shows the output pasted
>>>> below (a minimal sketch of this load path follows the pasted output).
>>>>
>>>> GPU memory usage spans 36GB-50GB on most ranks, but one rank is at 77GB.
>>>> Is this expected? Do I need to manually repartition this somehow?
>>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>>
>>>> +------------------------------------------------------------------+
>>>> |  GPU        PID   Type   Process name                 GPU Memory |
>>>> |==================================================================|
>>>> |    0    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    0    1696543      C   ./petsc_solver_test            38407MiB |
>>>> |    0    1696544      C   ./petsc_solver_test              467MiB |
>>>> |    0    1696545      C   ./petsc_solver_test              467MiB |
>>>> |    0    1696546      C   ./petsc_solver_test              467MiB |
>>>> |    0    1696548      C   ./petsc_solver_test              467MiB |
>>>> |    0    1696550      C   ./petsc_solver_test              471MiB |
>>>> |    0    1696551      C   ./petsc_solver_test              467MiB |
>>>> |    0    1696552      C   ./petsc_solver_test              467MiB |
>>>> |    1    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    1    1696544      C   ./petsc_solver_test            35849MiB |
>>>> |    2    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    2    1696545      C   ./petsc_solver_test            36719MiB |
>>>> |    3    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    3    1696546      C   ./petsc_solver_test            37343MiB |
>>>> |    4    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    4    1696548      C   ./petsc_solver_test            36935MiB |
>>>> |    5    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    5    1696550      C   ./petsc_solver_test            49953MiB |
>>>> |    6    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    6    1696551      C   ./petsc_solver_test            47693MiB |
>>>> |    7    1630309      C   nvidia-cuda-mps-server            27MiB |
>>>> |    7    1696552      C   ./petsc_solver_test            77331MiB |
>>>> +------------------------------------------------------------------+
>>>
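
For reference, the load path described in the original question above (MatLoad
of an aijcusparse matrix followed by a KSP solve configured from the command
line) looks roughly like the minimal sketch below. The binary file name
"matrix.dat", the unit right-hand side, and the overall structure are
illustrative assumptions; the solver options (-ksp_type fgmres -pc_type gamg
-mg_levels_pc_type asm, -memory_view, etc.) would be passed at run time.

    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      Vec         b, x;
      KSP         ksp;
      PetscViewer viewer;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

      /* Load the matrix from a PETSc binary file; the type set here can be
         overridden at run time, e.g. with -mat_type aijkokkos. */
      PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "matrix.dat",
                                      FILE_MODE_READ, &viewer));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetType(A, MATAIJCUSPARSE));
      PetscCall(MatSetFromOptions(A));
      PetscCall(MatLoad(A, viewer));
      PetscCall(PetscViewerDestroy(&viewer));

      /* Vectors compatible with A; right-hand side set to 1 for illustration. */
      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSet(b, 1.0));

      /* KSP configured from the command line, e.g.
         -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm */
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetFromOptions(ksp));
      PetscCall(KSPSolve(ksp, b, x));

      PetscCall(KSPDestroy(&ksp));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }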