[petsc-users] multi GPU partitions have very different memory usage
    Mark Lohry 
    mlohry at gmail.com
       
    Wed Jan 18 13:53:54 CST 2023
    
    
  
Q0) Does -memory_view track GPU memory as well, or is there another way
to query the peak device memory allocation?
Q1) I'm loading an aijcusparse matrix with MatLoad and running with
-ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. Mat info reports
27,142,948 rows and cols, bs=4, total nonzeros 759,709,392. I'm using 8 ranks
on 8x80GB GPUs. The run crashes during the setup phase with
CUSPARSE_STATUS_INSUFFICIENT_RESOURCES; just before the crash, nvidia-smi
shows the output pasted below.
GPU memory usage ranges from 36GB to 50GB on most ranks, but one rank is at
77GB. Is this expected? Do I need to manually repartition this somehow?
Thanks,
Mark
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    0   N/A  N/A   1696543      C   ./petsc_solver_test             38407MiB |
|    0   N/A  N/A   1696544      C   ./petsc_solver_test               467MiB |
|    0   N/A  N/A   1696545      C   ./petsc_solver_test               467MiB |
|    0   N/A  N/A   1696546      C   ./petsc_solver_test               467MiB |
|    0   N/A  N/A   1696548      C   ./petsc_solver_test               467MiB |
|    0   N/A  N/A   1696550      C   ./petsc_solver_test               471MiB |
|    0   N/A  N/A   1696551      C   ./petsc_solver_test               467MiB |
|    0   N/A  N/A   1696552      C   ./petsc_solver_test               467MiB |
|    1   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    1   N/A  N/A   1696544      C   ./petsc_solver_test             35849MiB |
|    2   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    2   N/A  N/A   1696545      C   ./petsc_solver_test             36719MiB |
|    3   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    3   N/A  N/A   1696546      C   ./petsc_solver_test             37343MiB |
|    4   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    4   N/A  N/A   1696548      C   ./petsc_solver_test             36935MiB |
|    5   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    5   N/A  N/A   1696550      C   ./petsc_solver_test             49953MiB |
|    6   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    6   N/A  N/A   1696551      C   ./petsc_solver_test             47693MiB |
|    7   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
|    7   N/A  N/A   1696552      C   ./petsc_solver_test             77331MiB |
+-----------------------------------------------------------------------------+
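
To put a number on the imbalance in the nvidia-smi listing above, here is a minimal Python sketch that sums the per-process figures by GPU and compares the largest main allocation to the mean. The tuples are copied by hand from the listing; the function names (total_per_gpu, main_per_gpu, imbalance_ratio) are hypothetical helpers, not part of PETSc or nvidia-smi. It assumes the largest process on each GPU is the rank bound to that device.

```python
# Per-process GPU memory copied from the nvidia-smi listing: (gpu, pid, MiB).
USAGE = [
    (0, 1630309, 27), (0, 1696543, 38407), (0, 1696544, 467),
    (0, 1696545, 467), (0, 1696546, 467), (0, 1696548, 467),
    (0, 1696550, 471), (0, 1696551, 467), (0, 1696552, 467),
    (1, 1630309, 27), (1, 1696544, 35849),
    (2, 1630309, 27), (2, 1696545, 36719),
    (3, 1630309, 27), (3, 1696546, 37343),
    (4, 1630309, 27), (4, 1696548, 36935),
    (5, 1630309, 27), (5, 1696550, 49953),
    (6, 1630309, 27), (6, 1696551, 47693),
    (7, 1630309, 27), (7, 1696552, 77331),
]


def total_per_gpu(rows):
    """Sum all process allocations on each GPU (MiB)."""
    totals = {}
    for gpu, _pid, mib in rows:
        totals[gpu] = totals.get(gpu, 0) + mib
    return totals


def main_per_gpu(rows):
    """Largest single-process allocation on each GPU (MiB).

    Assumption: this is the rank bound to that device.
    """
    largest = {}
    for gpu, _pid, mib in rows:
        largest[gpu] = max(largest.get(gpu, 0), mib)
    return largest


def imbalance_ratio(per_gpu):
    """Ratio of the maximum per-GPU usage to the mean per-GPU usage."""
    values = list(per_gpu.values())
    return max(values) / (sum(values) / len(values))


if __name__ == "__main__":
    main = main_per_gpu(USAGE)
    for gpu, mib in sorted(total_per_gpu(USAGE).items()):
        print(f"GPU {gpu}: total {mib} MiB, main rank {main[gpu]} MiB")
    print(f"max/mean of main ranks: {imbalance_ratio(main):.2f}")
```

The listing also shows every rank other than the first holding roughly 467 MiB on GPU 0 in addition to its main allocation on its own device, which total_per_gpu folds into the GPU 0 figure.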