[petsc-users] (percent time in this phase)

Matthew Knepley knepley at gmail.com
Sun Oct 3 06:54:28 CDT 2021


On Sun, Oct 3, 2021 at 5:43 AM Karthikeyan Chockalingam - STFC UKRI <
karthikeyan.chockalingam at stfc.ac.uk> wrote:

> Hi Matt,
>
>
>
> Thank you. The flamegraph tool is helpful.
>
> Please find the attached screen shoot and foo.txt which generated that
> graph (using https://www.speedscope.app).
>
> I find the following call sequence from the graph
>
> KSPSolve -> PCApply -> KSPSolve -> PCApply -> MatSolve
>
>
>
> I have a couple of questions
>
>    1. The KSPSolve time listed in the file using -log_summary (or
>    -log_view), is it the time taken by the first KSPSolve (in the above call
>    sequence)?
>
> Yes. When calls are nested, we just do not record the time for the nested
call in log_view.


>
>    1.
>    2. What is the unit of measurement in flamegrah?
>
> I believe it is microseconds, but I am not sure.

  Thanks,

     Matt


>
>    1.
>
> Thanks,
>
> Karthik.
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Friday, 1 October 2021 at 14:51
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] (percent time in this phase)
>
>
>
> On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> When comparing the MatSolve data for
>
>
>
> GPU
>
>
>
> MatSolve             341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 36 34  0  0  0  36 34  0  0  0  4536    4538      1 5.34e+02    0
> 0.00e+00 100
>
> MatCUSPARSSolAnl       2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  9  0  0  0  0   9  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0
>
>
>
> and CPU
>
>
>
> MatSolve             352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 35 34  0  0  0  35 34  0  0  0  4489
>
>
>
> the time spent is almost the same for this preconditioner.  Look like
> MatCUSPARSSolAnl is called only *twice* (since I am running on two cores)
>
>
>
> mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511  -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type
> bjacobi -ksp_monitor
>
>
>
> So would it be fair to assume MatCUSPARSSolAnl  is *not *accounted for in
> MatSolve and it is an exclusive event?
>
> KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%)
> ~ 100 %
>
>
>
> I am getting so old. We have a different kind of log output if you are
> really concerned about inclusion. You can run with
>
>
>
>   -log_view :foo.txt:ascii_flamegraph
>
>
>
> and then there are tools for plotting that output, described here
>
>
>
>   https://firedrakeproject.org/optimising.html
>
>
>
> This output _guarantees_ strict inclusion, so you will not have the
> problems you have above adding things up.
>
>
>
>   Thanks,
>
>
>
>     Matt
>
>
>
> Best,
>
> Karthik.
>
>
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 16:29
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> Thank you!
>
>
>
> Just to summarize
>
>
>
> KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%)
> ~ 100 %
>
>
>
> You didn’t happen to mention how MatCUSPARSSolAnl is accounted for? Am I
> right in accounting for it as above?
>
>
>
> I am not sure.I thought it might be the GPU part of MatSolve(). I will
> have to look in the code. I am not as familiar with the GPU part.
>
>
>
> MatCUSPARSSolAnl       2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  9  0  0  0  0   9  0  0  0  0     0       0      0 0.00e+00    0
> 0.00e+00  0
>
>
>
> Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and
> VecAYPX are mutually exclusive?
>
>
>
> Yes.
>
>
>
>   Thanks,
>
>
>
>      Matt
>
>
>
> Best,
>
>
>
> Karthik.
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 11:58
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> Thank you Mathew. Now, it is all making sense to me.
>
>
>
> From data file ksp_ex45_N511_gpu_2.txt
>
>
>
> KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%).
>
>
>
> However, you said “So an iteration would mostly consist of MatMult +
> PCApply, with some vector work”
>
>
>
> 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than
> one process and using Block-Jacobi . Half the time is spent in the solve
> (53%)
>
>
>
> KSPSetUp               2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  0  0  0  0  1   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
>
> KSPSolve               1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95  53100100100 96 10881   11730   1022 6.40e+03 1021 8.17e-03 100
>
>
>
> 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which
> is all setup of the individual blocks, and this is all used by the
> numerical ILU factorization.
>
>
>
> PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0
> 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01
> 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2
> 6.93e+03 0 0.00e+00 0
>
> MatLUFactorNum         1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14  0  0  0  0  14  0  0  0  0    63       0      2 6.93e+03    0 0.00e+00  0
>
> MatILUFactorSym        1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
>
>
>
> 3) The preconditioner application takes 37% of the time, which is all
> solving the factors and recorded in MatSolve(). Matrix multiplication takes
> 4%.
>
>
>
> PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34
> 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100
>
> MatSolve             341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34  0  0  0  36 34  0  0  0  4536    4538      1 5.34e+02    0 0.00e+00 100
>
> MatMult              341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00  4 34100100  0   4 34100100  0 54801   66441      2 5.86e+03    0 0.00e+00 100
>
>
>
> 4) The significant vector time is all in norms (11%) since they are really
> slow on the GPU.
>
>
>
> VecNorm              342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11  5  0  0 15  11  5  0  0 15  1466   196884      0 0.00e+00  342 2.74e-03 100
>
> VecTDot              680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03  1 10  0  0 29   1 10  0  0 29 106079   133922      0 0.00e+00  680 5.44e-03 100
>
> VecAXPY              681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1 11  0  0  0   1 11  0  0  0 56728   58367    682 5.34e+02    0 0.00e+00 100
>
> VecAYPX              339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0 34136   34153    339 2.71e-03    0 0.00e+00 100
>
>
>
> So the solve time is:
>
>
>
>   53% ~ 37% + 4% + 11%
>
>
>
> and the setup time is about 16%. I was wrong about the SetUp time being
> included, as it is outside the event:
>
>
>
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852
>
>
>
> It looks like the remainder of the time (23%) is spent preallocating the
> matrix.
>
>
>
>   Thanks,
>
>
>
>      Matt
>
>
>
> The MalMult event is 4 %. How does this event figure into the above
> equation; if preconditioning (MatMult + PCApply) is included in KSPSolve?
>
>
>
> Best,
>
> Karthik.
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 10:58
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> That was helpful. I would like to provide some additional details of my
> run on cpus and gpus. Please find the following attachments:
>
>
>
>    1. graph.pdf a plot showing overall time and various petsc events.
>    2. ksp_ex45_N511_cpu_6.txt data file of the log_summary
>    3. ksp_ex45_N511_gpu_2.txt data file of the log_summary
>
>
>
> I used the following petsc options for cpu
>
>
>
> mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi
> -ksp_monitor
>
>
>
> and for gpus
>
>
>
> mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511  -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type
> bjacobi -ksp_monitor
>
>
>
> to run the following problem
>
>
>
> https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html
>
>
>
> From the above code, I see is there no individual function called KSPSetUp(),
> so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS,
> kSPSetComputeOperators all are timed together as KSPSetUp. For this
> example, is KSPSetUp time and KSPSolve time mutually exclusive?
>
>
>
> No, KSPSetUp() will be contained in KSPSolve() if it is called
> automatically.
>
>
>
> In your response you said that
>
>
>
>    “PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it
> depends on how much of the preconditioner construction can take place
> early, so depends exactly on the preconditioner used.”
>
>
>
> I don’t see a explicit call to PCSetUp() or  PCApply() in ex45; so for
> this particular preconditioner (bjacobi) how can I tell how they are timed?
>
>
>
> They are all inside KSPSolve(). If you have a preconditioned linear solve,
> the oreconditioning happens during the iteration. So an iteration would
> mostly
>
> consist of MatMult + PCApply, with some vector work.
>
>
>
> I am hoping to time KSP solving and preconditioning mutually exclusively.
>
>
>
> I am not sure that concept makes sense here. See above.
>
>
>
>   Thanks,
>
>
>
>      Matt
>
>
>
>
>
> Kind regards,
>
> Karthik.
>
>
>
>
>
> *From: *Barry Smith <bsmith at petsc.dev>
> *Date: *Tuesday, 28 September 2021 at 19:19
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
>
>
>
>
> On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
>
>
> Thanks for Barry for your response.
>
>
>
> I was just benchmarking the problem with various preconditioner on cpu and
> gpu. I understand, it is not possible to get mutually exclusive timing.
>
> However, can you tell if KSPSolve time includes both PCSetup and PCApply?
> And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp
> and PCApply.
>
>
>
>    If you do not call KSPSetUp() separately from KSPSolve() then its time
> is included with KSPSolve().
>
>
>
>    PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends
> on how much of the preconditioner construction can take place early, so
> depends exactly on the preconditioner used.
>
>
>
>    So yes the answer is not totally satisfying. The one thing I would
> recommend is to not call KSPSetUp() directly and then KSPSolve() will
> always include the total time of the solve plus all setup time. PCApply
> will contain all the time to apply the preconditioner but may also include
> some setup time.
>
>
>
>   Barry
>
>
>
>
>
> Best,
>
> Karthik.
>
>
>
>
>
>
>
>
>
> *From: *Barry Smith <bsmith at petsc.dev>
> *Date: *Tuesday, 28 September 2021 at 16:56
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
>
>
>
>
> On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
>
>
> Hello,
>
>
>
> I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson
> problem. I noticed from the output from using the flag -log_summary that
> for various events their respective %T (percent time in this phase) do not
> add up to 100 but rather exceeds 100. So, I gather there is some overlap
> among these events. I am primarily looking at the events KSPSetUp,
> KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive
> %T or Time for these individual events? I have attached  the log_summary
> output file from my run for your reference.
>
>
>
>
>
>   For nested solvers it is tricky to get the times to be mutually
> exclusive because some parts of the building of the preconditioner is for
> some preconditioners delayed until the solve has started.
>
>
>
>   It looks like you are using the default preconditioner options which for
> this example are taking more or less no time since so many iterations are
> needed. It is best to use -pc_type mg to use geometric multigrid on this
> problem.
>
>
>
>   Barry
>
>
>
>
>
>
>
> Thanks!
>
> Karthik.
>
>
>
> This email and any attachments are intended solely for the use of the
> named recipients. If you are not the intended recipient you must not use,
> disclose, copy or distribute this email or any of its attachments and
> should notify the sender immediately and delete this email from your
> system. UK Research and Innovation (UKRI) has taken every reasonable
> precaution to minimise risk of this email or any attachments containing
> viruses or malware but the recipient should carry out its own virus and
> malware checks before opening the attachments. UKRI does not accept any
> liability for any losses or damages which the recipient may sustain due to
> presence of any viruses.
>
> <ksp_ex45_N511_cpu_6.txt>
>
>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20211003/0f84ce2a/attachment-0001.html>


More information about the petsc-users mailing list