[petsc-users] KSPSolve() get slower when preconditioner or Cholesky factor is re-used with many multiple RHS.
Matthew Knepley
knepley at gmail.com
Sun Nov 16 05:57:53 CST 2014
On Sat, Nov 15, 2014 at 9:24 PM, Evan Um <evanum at gmail.com> wrote:
> Dear PETSC users,
>
> I would like to show you a performance issue when Cholesky factor is
> re-used as a direct solver or pre-conditioner many times with many
> right-hand side vectors. Does anyone suggest a solution about this issue?
> In advance, thanks for your help.
>
> Regards,
> Evan
>
> Example 1: I used MUMPS as a direct solver and measured the backward/forward
> substitution time for selected RHS vectors. A simplified test code is shown
> in Code 1 and the time measurements in Result 1. As shown below, the
> backward/forward solution time is nearly constant (about 2.2 seconds) for
> the early RHS vectors, but it grows as the run proceeds and eventually
> reaches about 12.8 seconds. It is hard to understand why the
> backward/forward solution time gets longer although the same numerical
> operation is carried out for each RHS vector (the loop over i). When there
> are many RHS vectors, is there a parameter I need to reset so that a long
> sequence of RHS vectors is handled stably?
>
> Example 2: In this case, I use the Cholesky factor as a preconditioner for
> the CG solver, and a single iteration is performed. The sample code and
> time measurements are shown in Code 2 and Result 2. Again, KSPSolve() gets
> slower as the preconditioner is re-used with many RHS vectors: early solves
> take about 4.6 seconds, while later ones take about 16 seconds. Has anyone
> observed such a performance issue? Do you know of a solution for this
> problem?
> Comparing the two experiments, I expected Example 2 to show a shorter
> solution time per RHS vector than Example 1, because Example 2 relies on
> scalable matrix-vector multiplication, whereas MUMPS in Example 1 carries
> out backward/forward substitutions that are inherently unscalable. However,
> my experiments consistently showed that the backward/forward substitution
> of MUMPS is faster than a single PCG iteration with the Cholesky
> preconditioner. Can anyone explain this? FYI, in both Example 1 and Example
> 2, I used a very large test problem (a sparse SPD matrix of size about
> 4,000,000 by 4,000,000).
>
> Code 1:
> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
> KSPSetType (ksp_fetd_dt, KSPPREONLY);
> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
> PCSetType(pc_fetd_dt, PCCHOLESKY);
> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
> for (int i=0; i<1000; i++) {
> // Create a new RHS vector B_dt
>
Are you calling VecCreate() each time here? If so, you could be filling up
the memory of your computer as you run, causing it to slow down due to
swapping.

For any performance question, ALWAYS send the output of -log_summary. For
your question, also divide this loop into log stages for each iteration, so
we can see which specific operations slow down.
Matt
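
A minimal sketch of these two suggestions, assuming B_dt and solution
already exist and are reused, and that my_timer_function() is the timer from
the original post (the stage names are only illustrative):

  PetscLogStage stage;
  char          stagename[64];

  for (PetscInt i = 0; i < 1000; i++) {
    /* Refill the existing B_dt with the new right-hand side instead of
       creating a fresh Vec; if a new Vec must be created, destroy it with
       VecDestroy() at the end of the iteration so memory does not grow. */
    PetscSNPrintf(stagename, sizeof(stagename), "Solve %D", i);
    PetscLogStageRegister(stagename, &stage);
    PetscLogStagePush(stage);
    KSPSolve(ksp_fetd_dt, B_dt, solution);
    PetscLogStagePop();
  }

Running with -log_summary then reports the operations inside each stage
separately, so it is possible to see which ones slow down in the later
iterations.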
> time1=my_timer_function();
> KSPSolve(ksp_fetd_dt,B_dt,solution);
> time2=my_timer_function();
> // Output solution time=time2-time1;
> }
>
> Result1:
> nTS:12 Time(s):2.400e-04 Sol. time(s):2.246e+00 PCG niter:1
> nTS:13 Time(s):2.600e-04 Sol. time(s):2.329e+00 PCG niter:1
> nTS:14 Time(s):2.800e-04 Sol. time(s):2.289e+00 PCG niter:1
> nTS:15 Time(s):3.000e-04 Sol. time(s):2.239e+00 PCG niter:1
> .
> .
> nTS:267 Time(s):5.340e-03 Sol. time(s):2.152e+00 PCG niter:1
> nTS:280 Time(s):5.600e-03 Sol. time(s):2.255e+00 PCG niter:1
> nTS:293 Time(s):5.860e-03 Sol. time(s):1.087e+01 PCG niter:1
> nTS:307 Time(s):6.140e-03 Sol. time(s):1.225e+01 PCG niter:1
> nTS:321 Time(s):6.420e-03 Sol. time(s):1.280e+01 PCG niter:1
> nTS:337 Time(s):6.740e-03 Sol. time(s):1.243e+01 PCG niter:1
> nTS:353 Time(s):7.060e-03 Sol. time(s):6.953e+00 PCG niter:1
> nTS:369 Time(s):7.380e-03 Sol. time(s):8.712e+00 PCG niter:1
> nTS:387 Time(s):7.740e-03 Sol. time(s):9.048e+00 PCG niter:1
> nTS:405 Time(s):8.100e-03 Sol. time(s):1.099e+01 PCG niter:1
> nTS:424 Time(s):8.480e-03 Sol. time(s):1.171e+01 PCG niter:1
> nTS:445 Time(s):8.900e-03 Sol. time(s):1.294e+01 PCG niter:1
> nTS:466 Time(s):9.320e-03 Sol. time(s):1.227e+01 PCG niter:1
> nTS:488 Time(s):9.760e-03 Sol. time(s):8.581e+00 PCG niter:1
> nTS:511 Time(s):1.022e-02 Sol. time(s):1.059e+01 PCG niter:1
> nTS:535 Time(s):1.070e-02 Sol. time(s):8.452e+00 PCG niter:1
>
> Code 2:
> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
> KSPSetType (ksp_fetd_dt, KSPPREONLY);
> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
> PCSetType(pc_fetd_dt, PCCHOLESKY);
> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
> KSPSetType(ksp_fetd_dt, KSPCG);
> KSPSetTolerances(ksp_fetd_dt, 1e-9, 1.0e-50, 1.0e10, ksp_iter);
> for (int i=0; i<1000; i++) {
> // Create a new RHS vector B_dt
> time1=my_timer_function();
> KSPSolve(ksp_fetd_dt,B_dt,solution);
> time2=my_timer_function();
> // Output solution time=time2-time1;
> }
>
> Result 2:
> nTS:10 Time(s):2.000e-04 Sol. time(s):4.644e+00 PCG niter:1 Norm:2.937e-10
> nTS:11 Time(s):2.200e-04 Sol. time(s):4.585e+00 PCG niter:1 Norm:3.151e-10
> nTS:12 Time(s):2.400e-04 Sol. time(s):4.737e+00 PCG niter:1 Norm:2.719e-10
> nTS:13 Time(s):2.600e-04 Sol. time(s):4.537e+00 PCG niter:1 Norm:3.982e-10
> nTS:14 Time(s):2.800e-04 Sol. time(s):4.578e+00 PCG niter:1 Norm:3.221e-10
> nTS:15 Time(s):3.000e-04 Sol. time(s):4.678e+00 PCG niter:1 Norm:3.008e-10
> nTS:16 Time(s):3.200e-04 Sol. time(s):4.619e+00 PCG niter:1 Norm:3.005e-10
> nTS:46 Time(s):9.200e-04 Sol. time(s):6.862e+00 PCG niter:1 Norm:2.759e-10
> nTS:48 Time(s):9.600e-04 Sol. time(s):4.475e+00 PCG niter:1 Norm:3.071e-10
> nTS:50 Time(s):1.000e-03 Sol. time(s):6.068e+00 PCG niter:1 Norm:3.848e-10
> nTS:52 Time(s):1.040e-03 Sol. time(s):5.024e+00 PCG niter:1 Norm:3.639e-10
> nTS:55 Time(s):1.100e-03 Sol. time(s):1.333e+01 PCG niter:1 Norm:3.049e-10
> nTS:58 Time(s):1.160e-03 Sol. time(s):1.440e+01 PCG niter:1 Norm:3.467e-10
> nTS:60 Time(s):1.200e-03 Sol. time(s):1.475e+01 PCG niter:1 Norm:2.951e-10
> nTS:63 Time(s):1.260e-03 Sol. time(s):9.899e+00 PCG niter:1 Norm:3.018e-10
> nTS:66 Time(s):1.320e-03 Sol. time(s):1.097e+01 PCG niter:1 Norm:3.320e-10
> nTS:69 Time(s):1.380e-03 Sol. time(s):1.485e+01 PCG niter:1 Norm:2.811e-10
> nTS:73 Time(s):1.460e-03 Sol. time(s):1.199e+01 PCG niter:1 Norm:2.999e-10
> nTS:76 Time(s):1.520e-03 Sol. time(s):1.109e+01 PCG niter:1 Norm:3.359e-10
> nTS:80 Time(s):1.600e-03 Sol. time(s):1.473e+01 PCG niter:1 Norm:3.336e-10
> nTS:84 Time(s):1.680e-03 Sol. time(s):1.444e+01 PCG niter:1 Norm:3.181e-10
> nTS:88 Time(s):1.760e-03 Sol. time(s):1.639e+01 PCG niter:1 Norm:2.956e-10
> nTS:92 Time(s):1.840e-03 Sol. time(s):1.713e+01 PCG niter:1 Norm:3.552e-10
> nTS:96 Time(s):1.920e-03 Sol. time(s):1.562e+01 PCG niter:1 Norm:2.843e-10
> nTS:101 Time(s):2.020e-03 Sol. time(s):1.631e+01 PCG niter:1 Norm:3.687e-10
> nTS:105 Time(s):2.100e-03 Sol. time(s):1.496e+01 PCG niter:1 Norm:3.567e-10
> nTS:111 Time(s):2.220e-03 Sol. time(s):1.607e+01 PCG niter:1 Norm:3.392e-10
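
As an aside (this is not something either code above does): when many
right-hand sides are available at the same time, PETSc's MatMatSolve()
interface can push a block of them through a factored matrix in one call,
which may amortize the per-solve overhead if the installed solver package
supports it for this factor type. A rough sketch, assuming hypothetical
sizes n (global problem size) and nrhs (number of right-hand sides), and
that the numeric factorization of F_dt has already been computed (e.g.
after the first KSPSolve() or an explicit KSPSetUp()):

  Mat B_block, X_block;  /* dense matrices whose columns hold RHS / solutions */

  MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, nrhs, NULL, &B_block);
  MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, nrhs, NULL, &X_block);
  /* ... fill the columns of B_block with the right-hand-side vectors ... */
  MatAssemblyBegin(B_block, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(B_block, MAT_FINAL_ASSEMBLY);
  MatMatSolve(F_dt, B_block, X_block);  /* one forward/backward sweep per column */
  MatDestroy(&B_block);
  MatDestroy(&X_block);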
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener