[petsc-users] (percent time in this phase)
Matthew Knepley
knepley at gmail.com
Fri Oct 1 08:50:37 CDT 2021
On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI <
karthikeyan.chockalingam at stfc.ac.uk> wrote:
> When comparing the MatSolve data for
>
>
>
> GPU
>
>
>
> MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0
> 0.00e+00 100
>
> MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0
> 0.00e+00 0
>
>
>
> and CPU
>
>
>
> MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00
> 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489
>
>
>
> the time spent is almost the same for this preconditioner. Look like
> MatCUSPARSSolAnl is called only *twice* (since I am running on two cores)
>
>
>
> mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type
> bjacobi -ksp_monitor
>
>
>
> So would it be fair to assume MatCUSPARSSolAnl is *not *accounted for in
> MatSolve and it is an exclusive event?
>
> KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%)
> ~ 100 %
>
I am getting so old. We have a different kind of log output if you are
really concerned about inclusion. You can run with
-log_view :foo.txt:ascii_flamegraph
and then there are tools for plotting that output, described here
https://firedrakeproject.org/optimising.html
This output _guarantees_ strict inclusion, so you will not have the
problems you have above adding things up.
Thanks,
Matt
Best,
>
> Karthik.
>
>
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 16:29
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> Thank you!
>
>
>
> Just to summarize
>
>
>
> KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%)
> ~ 100 %
>
>
>
> You didn’t happen to mention how MatCUSPARSSolAnl is accounted for? Am I
> right in accounting for it as above?
>
>
>
> I am not sure.I thought it might be the GPU part of MatSolve(). I will
> have to look in the code. I am not as familiar with the GPU part.
>
>
>
> MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0
> 0.00e+00 0
>
>
>
> Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and
> VecAYPX are mutually exclusive?
>
>
>
> Yes.
>
>
>
> Thanks,
>
>
>
> Matt
>
>
>
> Best,
>
>
>
> Karthik.
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 11:58
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> Thank you Mathew. Now, it is all making sense to me.
>
>
>
> From data file ksp_ex45_N511_gpu_2.txt
>
>
>
> KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%).
>
>
>
> However, you said “So an iteration would mostly consist of MatMult +
> PCApply, with some vector work”
>
>
>
> 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than
> one process and using Block-Jacobi . Half the time is spent in the solve
> (53%)
>
>
>
> KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0
>
> KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100
>
>
>
> 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which
> is all setup of the individual blocks, and this is all used by the
> numerical ILU factorization.
>
>
>
> PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0
> 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01
> 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2
> 6.93e+03 0 0.00e+00 0
>
> MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0
>
> MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
>
>
> 3) The preconditioner application takes 37% of the time, which is all
> solving the factors and recorded in MatSolve(). Matrix multiplication takes
> 4%.
>
>
>
> PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34
> 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100
>
> MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100
>
> MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100
>
>
>
> 4) The significant vector time is all in norms (11%) since they are really
> slow on the GPU.
>
>
>
> VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100
>
> VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100
>
> VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100
>
> VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100
>
>
>
> So the solve time is:
>
>
>
> 53% ~ 37% + 4% + 11%
>
>
>
> and the setup time is about 16%. I was wrong about the SetUp time being
> included, as it is outside the event:
>
>
>
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852
>
>
>
> It looks like the remainder of the time (23%) is spent preallocating the
> matrix.
>
>
>
> Thanks,
>
>
>
> Matt
>
>
>
> The MalMult event is 4 %. How does this event figure into the above
> equation; if preconditioning (MatMult + PCApply) is included in KSPSolve?
>
>
>
> Best,
>
> Karthik.
>
>
>
> *From: *Matthew Knepley <knepley at gmail.com>
> *Date: *Wednesday, 29 September 2021 at 10:58
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *Barry Smith <bsmith at petsc.dev>, "petsc-users at mcs.anl.gov" <
> petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
> On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
> That was helpful. I would like to provide some additional details of my
> run on cpus and gpus. Please find the following attachments:
>
>
>
> 1. graph.pdf a plot showing overall time and various petsc events.
> 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary
> 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary
>
>
>
> I used the following petsc options for cpu
>
>
>
> mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi
> -ksp_monitor
>
>
>
> and for gpus
>
>
>
> mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z
> 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type
> bjacobi -ksp_monitor
>
>
>
> to run the following problem
>
>
>
> https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html
>
>
>
> From the above code, I see is there no individual function called KSPSetUp(),
> so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS,
> kSPSetComputeOperators all are timed together as KSPSetUp. For this
> example, is KSPSetUp time and KSPSolve time mutually exclusive?
>
>
>
> No, KSPSetUp() will be contained in KSPSolve() if it is called
> automatically.
>
>
>
> In your response you said that
>
>
>
> “PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it
> depends on how much of the preconditioner construction can take place
> early, so depends exactly on the preconditioner used.”
>
>
>
> I don’t see a explicit call to PCSetUp() or PCApply() in ex45; so for
> this particular preconditioner (bjacobi) how can I tell how they are timed?
>
>
>
> They are all inside KSPSolve(). If you have a preconditioned linear solve,
> the oreconditioning happens during the iteration. So an iteration would
> mostly
>
> consist of MatMult + PCApply, with some vector work.
>
>
>
> I am hoping to time KSP solving and preconditioning mutually exclusively.
>
>
>
> I am not sure that concept makes sense here. See above.
>
>
>
> Thanks,
>
>
>
> Matt
>
>
>
>
>
> Kind regards,
>
> Karthik.
>
>
>
>
>
> *From: *Barry Smith <bsmith at petsc.dev>
> *Date: *Tuesday, 28 September 2021 at 19:19
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
>
>
>
>
> On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
>
>
> Thanks for Barry for your response.
>
>
>
> I was just benchmarking the problem with various preconditioner on cpu and
> gpu. I understand, it is not possible to get mutually exclusive timing.
>
> However, can you tell if KSPSolve time includes both PCSetup and PCApply?
> And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp
> and PCApply.
>
>
>
> If you do not call KSPSetUp() separately from KSPSolve() then its time
> is included with KSPSolve().
>
>
>
> PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends
> on how much of the preconditioner construction can take place early, so
> depends exactly on the preconditioner used.
>
>
>
> So yes the answer is not totally satisfying. The one thing I would
> recommend is to not call KSPSetUp() directly and then KSPSolve() will
> always include the total time of the solve plus all setup time. PCApply
> will contain all the time to apply the preconditioner but may also include
> some setup time.
>
>
>
> Barry
>
>
>
>
>
> Best,
>
> Karthik.
>
>
>
>
>
>
>
>
>
> *From: *Barry Smith <bsmith at petsc.dev>
> *Date: *Tuesday, 28 September 2021 at 16:56
> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
> karthikeyan.chockalingam at stfc.ac.uk>
> *Cc: *"petsc-users at mcs.anl.gov" <petsc-users at mcs.anl.gov>
> *Subject: *Re: [petsc-users] %T (percent time in this phase)
>
>
>
>
>
>
>
> On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalingam at stfc.ac.uk> wrote:
>
>
>
> Hello,
>
>
>
> I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson
> problem. I noticed from the output from using the flag -log_summary that
> for various events their respective %T (percent time in this phase) do not
> add up to 100 but rather exceeds 100. So, I gather there is some overlap
> among these events. I am primarily looking at the events KSPSetUp,
> KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive
> %T or Time for these individual events? I have attached the log_summary
> output file from my run for your reference.
>
>
>
>
>
> For nested solvers it is tricky to get the times to be mutually
> exclusive because some parts of the building of the preconditioner is for
> some preconditioners delayed until the solve has started.
>
>
>
> It looks like you are using the default preconditioner options which for
> this example are taking more or less no time since so many iterations are
> needed. It is best to use -pc_type mg to use geometric multigrid on this
> problem.
>
>
>
> Barry
>
>
>
>
>
>
>
> Thanks!
>
> Karthik.
>
>
>
> This email and any attachments are intended solely for the use of the
> named recipients. If you are not the intended recipient you must not use,
> disclose, copy or distribute this email or any of its attachments and
> should notify the sender immediately and delete this email from your
> system. UK Research and Innovation (UKRI) has taken every reasonable
> precaution to minimise risk of this email or any attachments containing
> viruses or malware but the recipient should carry out its own virus and
> malware checks before opening the attachments. UKRI does not accept any
> liability for any losses or damages which the recipient may sustain due to
> presence of any viruses.
>
> <ksp_ex45_N511_cpu_6.txt>
>
>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>
>
>
> --
>
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20211001/3f3bb95c/attachment-0001.html>
More information about the petsc-users
mailing list