[petsc-users] multi GPU partitions have very different memory usage

Mark Adams mfadams at lbl.gov
Wed Jan 18 15:48:36 CST 2023


The cuSPARSE matrix triple product (the Galerkin coarse-grid construction in
GAMG) takes a lot of memory. We usually use Kokkos, configured with the
cuSPARSE TPL turned off.
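
For reference, a build and run along these lines is what I mean (a sketch from
memory; check the configure help for your PETSc version, and the option for
disabling the Kokkos Kernels cuSPARSE TPL in particular may be named
differently):

    ./configure --with-cuda=1 --download-kokkos --download-kokkos-kernels
    mpiexec -n 8 ./petsc_solver_test -vec_type kokkos -mat_type aijkokkos \
        -ksp_type fgmres -pc_type gamg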

If you have a complex problem, different parts of the domain can coarsen at
different rates.
Using Jacobi instead of asm will save a fair amount of memory.
If you run with -ksp_view you will see the operator/matrix complexity reported
by GAMG. These should be < 1.5.
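
For example, something like this (a sketch; the complexity numbers appear in
the GAMG part of the PC view, and the exact wording varies a bit between
versions):

    mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type gamg -ksp_view \
        | grep -i complexity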

Mark

On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry <mlohry at gmail.com> wrote:

> With asm I see a range of 8GB-13GB, a slightly smaller ratio, but that
> probably explains it. (Does this still seem like a lot of memory to you for
> the problem size?)
>
> In general I don't have the same number of blocks per row, so I suppose it
> makes sense that there's some memory imbalance.
>
>
>
> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams <mfadams at lbl.gov> wrote:
>
>> Can your problem have load imbalance?
>>
>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load
>> imbalance.
>> GAMG can add some load imbalance but start by getting a baseline.
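>>
>> For example, something like these two runs, comparing -memory_view (or just
>> nvidia-smi) between them, would give a baseline (a sketch using the options
>> from your message below):
>>
>>     mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type asm -memory_view
>>     mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type jacobi -memory_view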
>>
>> Mark
>>
>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry <mlohry at gmail.com> wrote:
>>
>>> Q0) does -memory_view trace GPU memory as well, or is there another
>>> method to query the peak device memory allocation?
>>>
>>> Q1) I'm loading an aijcusparse matrix with MatLoad and running with
>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. The matrix has
>>> 27,142,948 rows and columns, bs=4, and 759,709,392 total nonzeros. Using 8
>>> ranks on 8x 80GB GPUs, during the setup phase, before crashing with
>>> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES, nvidia-smi shows the output pasted
>>> below.
>>>
>>> GPU memory usage spans 36GB-50GB across most ranks, but one rank is at
>>> 77GB. Is this expected? Do I need to manually repartition this somehow?
>>>
>>> Thanks,
>>> Mark
>>>
>>>
>>>
>>> +-----------------------------------------------------------------------------+
>>> | Processes:                                                                  |
>>> |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
>>> |        ID   ID                                                   Usage      |
>>> |=============================================================================|
>>> |    0   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    0   N/A  N/A   1696543      C   ./petsc_solver_test             38407MiB |
>>> |    0   N/A  N/A   1696544      C   ./petsc_solver_test               467MiB |
>>> |    0   N/A  N/A   1696545      C   ./petsc_solver_test               467MiB |
>>> |    0   N/A  N/A   1696546      C   ./petsc_solver_test               467MiB |
>>> |    0   N/A  N/A   1696548      C   ./petsc_solver_test               467MiB |
>>> |    0   N/A  N/A   1696550      C   ./petsc_solver_test               471MiB |
>>> |    0   N/A  N/A   1696551      C   ./petsc_solver_test               467MiB |
>>> |    0   N/A  N/A   1696552      C   ./petsc_solver_test               467MiB |
>>> |    1   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    1   N/A  N/A   1696544      C   ./petsc_solver_test             35849MiB |
>>> |    2   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    2   N/A  N/A   1696545      C   ./petsc_solver_test             36719MiB |
>>> |    3   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    3   N/A  N/A   1696546      C   ./petsc_solver_test             37343MiB |
>>> |    4   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    4   N/A  N/A   1696548      C   ./petsc_solver_test             36935MiB |
>>> |    5   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    5   N/A  N/A   1696550      C   ./petsc_solver_test             49953MiB |
>>> |    6   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    6   N/A  N/A   1696551      C   ./petsc_solver_test             47693MiB |
>>> |    7   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>> |    7   N/A  N/A   1696552      C   ./petsc_solver_test             77331MiB |
>>> +-----------------------------------------------------------------------------+
>>>
>>