[petsc-users] multi GPU partitions have very different memory usage
Mark Adams
mfadams at lbl.gov
Wed Jan 18 14:34:52 CST 2023
Can your problem have load imbalance?
You might try '-pc_type asm' (and/or '-pc_type jacobi') to see your
baseline load imbalance.
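Something along these lines should give a baseline to compare against
(a sketch: the executable name and rank count come from your nvidia-smi
output, and the launcher may differ on your system):

    mpiexec -n 8 ./petsc_solver_test -ksp_type fgmres -pc_type asm \
        -log_view -memory_view

The max/min ratios in the -log_view summary give a quick read on the
load balance across ranks.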
GAMG can add some load imbalance, but start by getting a baseline.
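To watch per-rank device memory while that runs, one option is to poll
nvidia-smi from another shell (this reports current usage, not the true
peak):

    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1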
Mark
On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry <mlohry at gmail.com> wrote:
> Q0) Does -memory_view trace GPU memory as well, or is there another
> method to query the peak device memory allocation?
>
> Q1) I'm loading an aijcusparse matrix with MatLoad and running with
> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. The matrix has
> 27,142,948 rows and cols, bs=4, and 759,709,392 total nonzeros. Using
> 8 ranks on 8x 80GB GPUs, nvidia-smi shows the output pasted below
> during the setup phase, just before the crash with
> CUSPARSE_STATUS_INSUFFICIENT_RESOURCES.
>
> GPU memory usage spans 36GB-50GB, but one rank is at 77GB. Is this
> expected? Do I need to manually repartition this somehow?
>
> Thanks,
> Mark
>
>
>
> +-----------------------------------------------------------------------------+
> | Processes:                                                                   |
> |  GPU   GI   CI        PID   Type   Process name                   GPU Memory |
> |        ID   ID                                                    Usage      |
> |=============================================================================|
> |    0   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    0   N/A  N/A    1696543     C   ./petsc_solver_test             38407MiB |
> |    0   N/A  N/A    1696544     C   ./petsc_solver_test               467MiB |
> |    0   N/A  N/A    1696545     C   ./petsc_solver_test               467MiB |
> |    0   N/A  N/A    1696546     C   ./petsc_solver_test               467MiB |
> |    0   N/A  N/A    1696548     C   ./petsc_solver_test               467MiB |
> |    0   N/A  N/A    1696550     C   ./petsc_solver_test               471MiB |
> |    0   N/A  N/A    1696551     C   ./petsc_solver_test               467MiB |
> |    0   N/A  N/A    1696552     C   ./petsc_solver_test               467MiB |
> |    1   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    1   N/A  N/A    1696544     C   ./petsc_solver_test             35849MiB |
> |    2   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    2   N/A  N/A    1696545     C   ./petsc_solver_test             36719MiB |
> |    3   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    3   N/A  N/A    1696546     C   ./petsc_solver_test             37343MiB |
> |    4   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    4   N/A  N/A    1696548     C   ./petsc_solver_test             36935MiB |
> |    5   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    5   N/A  N/A    1696550     C   ./petsc_solver_test             49953MiB |
> |    6   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    6   N/A  N/A    1696551     C   ./petsc_solver_test             47693MiB |
> |    7   N/A  N/A    1630309     C   nvidia-cuda-mps-server             27MiB |
> |    7   N/A  N/A    1696552     C   ./petsc_solver_test             77331MiB |
> +-----------------------------------------------------------------------------+
>