[petsc-dev] Understanding -log_summary with GPUs

Abhyankar, Shrirang G shrirang.abhyankar at pnnl.gov
Wed Mar 11 10:17:46 CDT 2020


Thank you, Richard and Junchao. This is very helpful info. I’ll try to step through using a debugger as you’ve suggested.

Shri
From: petsc-dev <petsc-dev-bounces at mcs.anl.gov> on behalf of PETSc Development <petsc-dev at mcs.anl.gov>
Reply-To: Richard Tran Mills <rtmills at anl.gov>
Date: Tuesday, March 10, 2020 at 11:34 PM
To: PETSc Development <petsc-dev at mcs.anl.gov>
Subject: Re: [petsc-dev] Understanding -log_summary with GPUs


Hi Shri,

Probably the best way to understand what is going on is to step through things using a debugger, as Junchao suggests. VecAXPY gets used in a lot of places, and maybe it is being called on some vectors that aren't getting their type from the options database? Also, there are several places where a vector gets "bound" to execute operations on the CPU instead of the GPU (see VecBindToCPU()), either because we know that the vector isn't going to be needed on the GPU for subsequent operations, or because the vector is too small for the GPU to be worthwhile given kernel launch latency. When a vector is bound to the CPU, operations with it are counted in the CPU MFlops column.
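
As a rough illustration of that accounting, here is a toy Python model (not PETSc code). ToyVec, FlopLog, and axpy are hypothetical names, the bound_to_cpu flag only stands in for the effect of VecBindToCPU(), and counting 2 flops per entry for an AXPY is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class FlopLog:
    """Hypothetical per-event flop counters, one per device."""
    cpu: float = 0.0
    gpu: float = 0.0


@dataclass
class ToyVec:
    """Hypothetical stand-in for a vector; only its length and binding matter here."""
    n: int
    bound_to_cpu: bool = False  # models the effect of VecBindToCPU(), not the real API


def axpy(log: FlopLog, y: ToyVec, x: ToyVec) -> None:
    """Record the flops of y <- a*x + y on whichever device would run it."""
    flops = 2.0 * y.n  # assumed: one multiply and one add per entry
    if y.bound_to_cpu or x.bound_to_cpu:
        log.cpu += flops  # a bound vector is counted as CPU work
    else:
        log.gpu += flops


log = FlopLog()
gpu_vec = ToyVec(1000)
cpu_vec = ToyVec(1000, bound_to_cpu=True)

for _ in range(4):
    axpy(log, gpu_vec, gpu_vec)  # four calls on unbound vectors
axpy(log, cpu_vec, cpu_vec)      # one call on a bound vector

# Share of this event's flops that were counted as GPU work.
percent_gpu = 100.0 * log.gpu / (log.gpu + log.cpu)
```

In this toy setup a single event name ends up with nonzero flops on both devices, which is one way a value below 100 could appear in the percent-flops-on-GPU column even though -vec_type seqcuda was set.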

It looks like you are actually getting decent GPU usage for your vector operations. While VecAXPY is showing only 80% of operations on the GPU, it's also accounting for less than one percent of the total flops. I see 100% GPU flops for the VecMAXPY that accounts for 13% of your flops.
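
To put those percentages in perspective, here is a small sketch of the arithmetic. It takes the "less than one percent" figure as an upper bound of 1% and assumes the last column is the fraction of an event's flops executed on the GPU; the function name is hypothetical.

```python
def cpu_share_of_total(event_share: float, gpu_fraction: float) -> float:
    """Fraction of all flops in the run that this event executed on the CPU.

    event_share  -- this event's share of the total flops (0..1)
    gpu_fraction -- share of this event's flops done on the GPU (0..1)
    """
    return event_share * (1.0 - gpu_fraction)


# VecAXPY: 80% on the GPU, at most 1% of total flops (upper bound).
axpy_cpu = cpu_share_of_total(0.01, 0.80)

# VecMAXPY: 100% on the GPU, 13% of total flops.
maxpy_cpu = cpu_share_of_total(0.13, 1.00)
```

With these inputs the CPU-executed VecAXPY work comes to at most about 0.2% of the total flops, and VecMAXPY contributes none.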

Best regards,
Richard
On 3/10/20 3:44 PM, Junchao Zhang via petsc-dev wrote:
Hi, Shri,
  I don't understand either. But there are many invocations of VecAXPY etc. Is it possible some are done on CPU? Attach a debugger and set a breakpoint on VecAXPY_SeqCUDA to see if it gets a hit. If yes, then see why.

--Junchao Zhang


On Tue, Mar 10, 2020 at 2:44 PM Abhyankar, Shrirang G via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
Hello all,
  I need help understanding the -log_summary output for the GPU-related columns. I am currently just setting -vec_type seqcuda, which I believe performs the vector operations on the GPU. With -vec_type seqcuda, I presumed all vector operations would be done on the GPU, so only the GPU MFlops would be logged and the CPU MFlops would be zero. But -log_summary reports MFlops for both CPU and GPU. Why are MFlops shown for both?

What is the meaning of the last column – percent flops on the GPU? For instance, some operations such as VecDot show 100 %F, while others like VecAXPY have less. What is the meaning of this?

Any other general comments on these numbers?

Let me know if you need more information.

Thanks,
Shri
