[petsc-dev] MatPinToCPU

Smith, Barry F. bsmith at mcs.anl.gov
Mon Jul 29 22:27:00 CDT 2019


  Thanks. Could you please send the log for the 24-processor run with the GPU?

   Note the final column of the table gives you the percentage of flops (not rates, but actual operation counts) done on the GPU. For your biggest run, the MatMult is at 18 percent and the KSP solve at 23 percent. I think this is much too low; we'd like to see well over 90 percent of the flops on the GPU, preferably 95 or more. Is this because you are forced to put the very large matrices only on the CPU?
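   For reference, MatPinToCPU() is the routine under discussion; below is a minimal sketch, assuming its (Mat, PetscBool) signature, of keeping one matrix's operations on the CPU. The wrapper function and the choice of which matrix to pin are illustrative, not from this thread:

    #include <petscmat.h>

    /* Sketch: pin a (hypothetically oversized) matrix A to the CPU so
       none of its operations (e.g. MatMult) run on the GPU. */
    PetscErrorCode PinLargeMatrix(Mat A)
    {
      PetscErrorCode ierr;

      ierr = MatPinToCPU(A, PETSC_TRUE);CHKERRQ(ierr);
      return 0;
    }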

   For the MatMult, if we assume the GPU flop rate is 25 times that of the CPU and 18 percent of the flops are done on the GPU, then the GPU time should be about 82.7 percent of the CPU time, but the measured ratio is 0.90; so where is the extra time going? That seems like too much to attribute to communication alone.
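   Spelling that estimate out (a back-of-the-envelope model, assuming the CPU portion runs at the same rate either way): with a fraction f = 0.18 of the flops on a GPU that is s = 25 times faster,

       t_gpu / t_cpu = (1 - f) + f/s = 0.82 + 0.18/25 ≈ 0.827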

   There is so much information, and so much happening in the final stage, that it is hard to discern what is killing the performance in the GPU case for the KSP solve. Is there any way you can add a stage at the end with several KSP solves and nothing else? (A sketch of such a stage follows.)
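   A minimal sketch of such an isolated stage using PETSc's logging API (the stage name and the count of five solves are arbitrary choices, not from this thread):

    #include <petscksp.h>

    /* Sketch: wrap several KSPSolve() calls in their own log stage so
       -log_view reports them separately from setup and everything else. */
    PetscErrorCode TimedSolves(KSP ksp, Vec b, Vec x)
    {
      PetscLogStage  stage;
      PetscErrorCode ierr;
      PetscInt       i;

      ierr = PetscLogStageRegister("KSPSolveOnly", &stage);CHKERRQ(ierr);
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      for (i = 0; i < 5; i++) {
        ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      }
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      return 0;
    }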

   Barry


> On Jul 29, 2019, at 5:26 PM, Mark Adams <mfadams at lbl.gov> wrote:
> 
> 
> 
> On Mon, Jul 29, 2019 at 5:31 PM Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
> 
>   I don't understand the notation in the legend on the second page.
> 
> 12,288 CPUs and no GPUs?
> 
> Yes
>  
> 
> 24 GPUs? Or 6 GPUs?
> 
> 24 virtual, 6 real GPUs per node. The first case is one node, 24 cores/vGPUs.
>  
> 
> 192 GPUs?
> 
> 1536 GPUs?
> 
> 12,288 GPUs? Or 12288/4 = 3072 GPUs?
> 
> All "GPUs" are one core/process/vGPU. So 12288 virtual GPUs and 3072 physical GPUs.
> 
> Maybe I should add 'virtual GPUs' and put (4 processes/SUMMIT GPU)
>  
> 
> So on the largest run, using GPUs or not takes pretty much exactly the same amount of time?
> 
> Yes. The raw Mat-vec is about 3x faster with ~95K equations/process. I've attached the data.
>  
> 
> What about 6 GPUs vs 24 CPUs? The same amount of time?
> 
> Can you send some log summaries?
> 
> Attachments: out_cpu_012288, out_cuda_000024, out_cuda_001536, out_cuda_000192, out_cuda_012288


