[petsc-users] KSPSolve() get slower when preconditioner or Cholesky factor is re-used with many multiple RHS.
Evan Um
evanum at gmail.com
Sat Nov 15 21:24:31 CST 2014
Dear PETSC users,
I would like to show you a performance issue when Cholesky factor is
re-used as a direct solver or pre-conditioner many times with many
right-hand side vectors. Does anyone suggest a solution about this issue?
In advance, thanks for your help.
Regards,
Evan
Example 1: I used MUMPS as a direct solver and measured backward/forward
substitution solution time for selected RHS vectors. A simplified test code
is shown in code 1 and time measurements in result 1. As shown below,
backward and forward solution time is nearly constant (e.g. about 2.2
seconds) in early time, but later the solution time overall increases. In
contrast, in late time, it takes about 12.8 seconds. It is hard to
understand why the backward/forward solution time gets longer although the
same numerical operation is carried out with each RHS vector (for i). For
many RHS vectors, is there any parameter that I need to reset to stably
take care of lots of RHS vectors?
Example 2: In this case, I use the Cholesky factor as a preconditioner
for CG solver. One iteration is performed. Its sample code and time
measurments are shown in Code 2 and Result 2. Again, KSPSolve() gets slower
as the preconditioner is re-used with many RHS vectors. For example, In
early stage, it took about 4.6 seconds. Later, it took 16 seconds. Does
anyone observe such a performance issue? Do you know any solution for this
problem?
In the two experiments, I expected Example 2 would show shorter solution
time with each RHS vector than Example 1 because Example 2 uses scalable
matrix-vector multiplication. Instead, when MUMPS is used in Example 1,
MUMPS carries out back/forward substitution that are inherently
un-scalable. However, my experiments consistently showed that the
back/forward substitution of MUMPS is faster than a single PCG iteration
with Cholesky preconditioner. Can anyone explain this? FYI, in both example
1 and 2, I used a very large test problem (about sparse SPD
4,000,000-by-4,000,000 matrix).
Code 1:
KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
KSPSetType (ksp_fetd_dt, KSPPREONLY);
KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
PCSetType(pc_fetd_dt, PCCHOLESKY);
PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
PCFactorSetUpMatSolverPackage(pc_fetd_dt);
PCFactorGetMatrix(pc_fetd_dt, &F_dt);
for (int i=0; i<1000; i++) {
// Create a new RHS vector B_dt
time1=my_timer_function();
KSPSolve(ksp_fetd_dt,B_dt,solution);
time2=my_timer_function ()
// Output solution time=time2-time1;
}
Result1:
nTS:12 Time(s):2.400e-04 Sol. time(s):2.246e+00 PCG niter:1
nTS:13 Time(s):2.600e-04 Sol. time(s):2.329e+00 PCG niter:1
nTS:14 Time(s):2.800e-04 Sol. time(s):2.289e+00 PCG niter:1
nTS:15 Time(s):3.000e-04 Sol. time(s):2.239e+00 PCG niter:1
.
.
nTS:267 Time(s):5.340e-03 Sol. time(s):2.152e+00 PCG niter:1
nTS:280 Time(s):5.600e-03 Sol. time(s):2.255e+00 PCG niter:1
nTS:293 Time(s):5.860e-03 Sol. time(s):1.087e+01 PCG niter:1
nTS:307 Time(s):6.140e-03 Sol. time(s):1.225e+01 PCG niter:1
nTS:321 Time(s):6.420e-03 Sol. time(s):1.280e+01 PCG niter:1
nTS:337 Time(s):6.740e-03 Sol. time(s):1.243e+01 PCG niter:1
nTS:353 Time(s):7.060e-03 Sol. time(s):6.953e+00 PCG niter:1
nTS:369 Time(s):7.380e-03 Sol. time(s):8.712e+00 PCG niter:1
nTS:387 Time(s):7.740e-03 Sol. time(s):9.048e+00 PCG niter:1
nTS:405 Time(s):8.100e-03 Sol. time(s):1.099e+01 PCG niter:1
nTS:424 Time(s):8.480e-03 Sol. time(s):1.171e+01 PCG niter:1
nTS:445 Time(s):8.900e-03 Sol. time(s):1.294e+01 PCG niter:1
nTS:466 Time(s):9.320e-03 Sol. time(s):1.227e+01 PCG niter:1
nTS:488 Time(s):9.760e-03 Sol. time(s):8.581e+00 PCG niter:1
nTS:511 Time(s):1.022e-02 Sol. time(s):1.059e+01 PCG niter:1
nTS:535 Time(s):1.070e-02 Sol. time(s):8.452e+00 PCG niter:1
Code 2:
KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
KSPSetType (ksp_fetd_dt, KSPPREONLY);
KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
PCSetType(pc_fetd_dt, PCCHOLESKY);
PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
PCFactorSetUpMatSolverPackage(pc_fetd_dt);
PCFactorGetMatrix(pc_fetd_dt, &F_dt);
KSPSetType(ksp_fetd_dt, KSPCG);
KSPSetTolerances(ksp_fetd_dt, 1e-9, 1.0e-50, 1.0e10, ksp_iter);
for (int i=0; i<1000; i++) {
// Create a new RHS vector B_dt
time1=my_timer_function();
KSPSolve(ksp_fetd_dt,B_dt,solution);
time2=my_timer_function ()
// Output solution time=time2-time1;
}
Result 2:
nTS:10 Time(s):2.000e-04 Sol. time(s):4.644e+00 PCG niter:1 Norm:2.937e-10
nTS:11 Time(s):2.200e-04 Sol. time(s):4.585e+00 PCG niter:1 Norm:3.151e-10
nTS:12 Time(s):2.400e-04 Sol. time(s):4.737e+00 PCG niter:1 Norm:2.719e-10
nTS:13 Time(s):2.600e-04 Sol. time(s):4.537e+00 PCG niter:1 Norm:3.982e-10
nTS:14 Time(s):2.800e-04 Sol. time(s):4.578e+00 PCG niter:1 Norm:3.221e-10
nTS:15 Time(s):3.000e-04 Sol. time(s):4.678e+00 PCG niter:1 Norm:3.008e-10
nTS:16 Time(s):3.200e-04 Sol. time(s):4.619e+00 PCG niter:1 Norm:3.005e-10
nTS:46 Time(s):9.200e-04 Sol. time(s):6.862e+00 PCG niter:1 Norm:2.759e-10
nTS:48 Time(s):9.600e-04 Sol. time(s):4.475e+00 PCG niter:1 Norm:3.071e-10
nTS:50 Time(s):1.000e-03 Sol. time(s):6.068e+00 PCG niter:1 Norm:3.848e-10
nTS:52 Time(s):1.040e-03 Sol. time(s):5.024e+00 PCG niter:1 Norm:3.639e-10
nTS:55 Time(s):1.100e-03 Sol. time(s):1.333e+01 PCG niter:1 Norm:3.049e-10
nTS:58 Time(s):1.160e-03 Sol. time(s):1.440e+01 PCG niter:1 Norm:3.467e-10
nTS:60 Time(s):1.200e-03 Sol. time(s):1.475e+01 PCG niter:1 Norm:2.951e-10
nTS:63 Time(s):1.260e-03 Sol. time(s):9.899e+00 PCG niter:1 Norm:3.018e-10
nTS:66 Time(s):1.320e-03 Sol. time(s):1.097e+01 PCG niter:1 Norm:3.320e-10
nTS:69 Time(s):1.380e-03 Sol. time(s):1.485e+01 PCG niter:1 Norm:2.811e-10
nTS:73 Time(s):1.460e-03 Sol. time(s):1.199e+01 PCG niter:1 Norm:2.999e-10
nTS:76 Time(s):1.520e-03 Sol. time(s):1.109e+01 PCG niter:1 Norm:3.359e-10
nTS:80 Time(s):1.600e-03 Sol. time(s):1.473e+01 PCG niter:1 Norm:3.336e-10
nTS:84 Time(s):1.680e-03 Sol. time(s):1.444e+01 PCG niter:1 Norm:3.181e-10
nTS:88 Time(s):1.760e-03 Sol. time(s):1.639e+01 PCG niter:1 Norm:2.956e-10
nTS:92 Time(s):1.840e-03 Sol. time(s):1.713e+01 PCG niter:1 Norm:3.552e-10
nTS:96 Time(s):1.920e-03 Sol. time(s):1.562e+01 PCG niter:1 Norm:2.843e-10
nTS:101 Time(s):2.020e-03 Sol. time(s):1.631e+01 PCG niter:1 Norm:3.687e-10
nTS:105 Time(s):2.100e-03 Sol. time(s):1.496e+01 PCG niter:1 Norm:3.567e-10
nTS:111 Time(s):2.220e-03 Sol. time(s):1.607e+01 PCG niter:1 Norm:3.392e-10
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20141115/6af6ae46/attachment.html>
More information about the petsc-users
mailing list