[petsc-users] KSPSolve() gets slower when the preconditioner or Cholesky factor is re-used with many RHS vectors.

Evan Um evanum at gmail.com
Sun Nov 16 10:11:42 CST 2014


Dear Matthew,

Thanks for your comments. I will prepare the -log_summary output. To
activate a separate log stage for each iteration (i.e., each RHS vector), I
tried the code below, but got the error shown underneath. Could you comment
on it? Does PetscLogView() allow users to specify a different output file
name for each call, so that each iteration generates its own log file? After
the error output I have also included a sketch of what I think the loop
should look like; please let me know whether that is the right approach.

Evan

/* work */
PetscViewer viewer;
for (int i=0; i<1000; i++) {
PetscLogBegin();
/* work to do*/
PetscLogView(viewer);
}
/* work */

Errors:
Number of PCG iterations:1
[0]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
[0]PETSC ERROR: Corrupt argument:
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: Invalid type of object: Parameter # 1
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014
[0]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0024.voltaire0 by
esum Sun Nov 16 08:08:53 2014
[0]PETSC ERROR: Configure options
--prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0
--download-fblaslapack=1 --download-mumps=1 --download-parmetis=1
--with-ptscotch=1
--with-ptscotch-dir=/clusterfs/voltaire/home/software/modules/scotch/5.1.12-gcc/
--download-scalapack --download-metis=1 --download-superlu=1
--download-superlu_dist=1 --download-hypre=1
--with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/
[0]PETSC ERROR: #1 PetscObjectTypeCompare() line 145 in
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/objects/destroy.c
[0]PETSC ERROR: #2 PetscLogView() line 1807 in
/clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/logging/plog.c
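
For reference, here is a minimal sketch of what I think the loop should look
like. I am assuming that opening a per-iteration ASCII viewer with
PetscViewerASCIIOpen() is the right way to get a separate log file for each
RHS vector, and that a per-iteration PetscLogStage is the right way to
separate the stages; the file and stage names below are just placeholders.

/* Sketch (untested): one log stage and one log file per RHS vector */
PetscLogBegin();                             /* start logging once, before the loop */
for (int i = 0; i < 1000; i++) {
  char          stagename[64], logname[PETSC_MAX_PATH_LEN];
  PetscLogStage stage;
  PetscViewer   viewer;

  PetscSNPrintf(stagename, sizeof(stagename), "RHS_%d", i);
  PetscLogStageRegister(stagename, &stage);  /* a separate stage for this RHS vector */
  PetscLogStagePush(stage);
  /* work to do: assemble B_dt and call KSPSolve() here */
  PetscLogStagePop();

  PetscSNPrintf(logname, sizeof(logname), "log_rhs_%d.txt", i);
  PetscViewerASCIIOpen(PETSC_COMM_WORLD, logname, &viewer); /* create the viewer before PetscLogView() */
  PetscLogView(viewer);
  PetscViewerDestroy(&viewer);
}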


On Sun, Nov 16, 2014 at 3:57 AM, Matthew Knepley <knepley at gmail.com> wrote:

> On Sat, Nov 15, 2014 at 9:24 PM, Evan Um <evanum at gmail.com> wrote:
>
>> Dear PETSC users,
>>
>> I would like to show you a performance issue that occurs when a Cholesky
>> factor is re-used as a direct solver or preconditioner many times with
>> many right-hand side vectors. Can anyone suggest a solution to this issue?
>> Thanks in advance for your help.
>>
>> Regards,
>> Evan
>>
>> Example 1: I used MUMPS as a direct solver and measured the backward/forward
>> substitution time for selected RHS vectors. A simplified test code is shown
>> in Code 1 and the time measurements in Result 1. As shown below, the
>> backward/forward solution time is nearly constant (about 2.2 seconds) for
>> early time steps, but it grows over time and eventually reaches about 12.8
>> seconds. It is hard to understand why the backward/forward solution time
>> gets longer, since the same numerical operation is carried out for each RHS
>> vector (each i in the loop). When there are many RHS vectors, is there any
>> parameter that I need to reset to handle them stably?
>>
>> Example 2: In this case, I use the Cholesky factor as a preconditioner
>> for the CG solver, and one iteration is performed. Its sample code and time
>> measurements are shown in Code 2 and Result 2. Again, KSPSolve() gets slower
>> as the preconditioner is re-used with many RHS vectors: in the early stage
>> it took about 4.6 seconds, but later it took about 16 seconds. Has anyone
>> observed such a performance issue? Do you know of a solution to this
>> problem?
>> In the two experiments, I expected Example 2 to show a shorter solution
>> time per RHS vector than Example 1, because Example 2 relies on scalable
>> matrix-vector multiplication, whereas Example 1 uses the MUMPS
>> backward/forward substitution, which is inherently less scalable. However,
>> my experiments consistently showed that the MUMPS backward/forward
>> substitution is faster than a single PCG iteration with the Cholesky
>> preconditioner. Can anyone explain this? FYI, in both Example 1 and
>> Example 2, I used a very large test problem (a sparse SPD matrix of size
>> about 4,000,000-by-4,000,000).
>>
>> Code 1:
>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>> KSPSetType (ksp_fetd_dt, KSPPREONLY);
>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>> for (int i=0; i<1000; i++) {
>> // Create a new RHS vector B_dt
>>
>
> Are you calling VecCreate() each time here? If so, you could be filling up
> the memory
> of your computer as you run, causing it to slow down due to swapping.
>
> For any performance question, ALWAYS send the output of -log_summary. For
> your
> question, also divide this loop into log stages for each iteration, so we
> can see what
> specific operations slow down.
>
>    Matt
>
>
>> time1=my_timer_function();
>> KSPSolve(ksp_fetd_dt,B_dt,solution);
>> time2=my_timer_function();
>> // Output solution time=time2-time1;
>> }
>>
>> Result1:
>> nTS:12 Time(s):2.400e-04 Sol. time(s):2.246e+00  PCG niter:1
>> nTS:13 Time(s):2.600e-04 Sol. time(s):2.329e+00  PCG niter:1
>> nTS:14 Time(s):2.800e-04 Sol. time(s):2.289e+00  PCG niter:1
>> nTS:15 Time(s):3.000e-04 Sol. time(s):2.239e+00  PCG niter:1
>> .
>> .
>> nTS:267 Time(s):5.340e-03 Sol. time(s):2.152e+00  PCG niter:1
>> nTS:280 Time(s):5.600e-03 Sol. time(s):2.255e+00  PCG niter:1
>> nTS:293 Time(s):5.860e-03 Sol. time(s):1.087e+01  PCG niter:1
>> nTS:307 Time(s):6.140e-03 Sol. time(s):1.225e+01  PCG niter:1
>> nTS:321 Time(s):6.420e-03 Sol. time(s):1.280e+01  PCG niter:1
>> nTS:337 Time(s):6.740e-03 Sol. time(s):1.243e+01  PCG niter:1
>> nTS:353 Time(s):7.060e-03 Sol. time(s):6.953e+00  PCG niter:1
>> nTS:369 Time(s):7.380e-03 Sol. time(s):8.712e+00  PCG niter:1
>> nTS:387 Time(s):7.740e-03 Sol. time(s):9.048e+00  PCG niter:1
>> nTS:405 Time(s):8.100e-03 Sol. time(s):1.099e+01  PCG niter:1
>> nTS:424 Time(s):8.480e-03 Sol. time(s):1.171e+01  PCG niter:1
>> nTS:445 Time(s):8.900e-03 Sol. time(s):1.294e+01  PCG niter:1
>> nTS:466 Time(s):9.320e-03 Sol. time(s):1.227e+01  PCG niter:1
>> nTS:488 Time(s):9.760e-03 Sol. time(s):8.581e+00  PCG niter:1
>> nTS:511 Time(s):1.022e-02 Sol. time(s):1.059e+01  PCG niter:1
>> nTS:535 Time(s):1.070e-02 Sol. time(s):8.452e+00  PCG niter:1
>>
>> Code 2:
>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>> KSPSetType (ksp_fetd_dt, KSPPREONLY);
>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>> KSPSetType(ksp_fetd_dt, KSPCG);
>> KSPSetTolerances(ksp_fetd_dt, 1e-9, 1.0e-50, 1.0e10, ksp_iter);
>> for (int i=0; i<1000; i++) {
>> // Create a new RHS vector B_dt
>> time1=my_timer_function();
>> KSPSolve(ksp_fetd_dt,B_dt,solution);
>> time2=my_timer_function();
>> // Output solution time=time2-time1;
>> }
>>
>> Result 2:
>> nTS:10 Time(s):2.000e-04 Sol. time(s):4.644e+00  PCG niter:1
>> Norm:2.937e-10
>> nTS:11 Time(s):2.200e-04 Sol. time(s):4.585e+00  PCG niter:1
>> Norm:3.151e-10
>> nTS:12 Time(s):2.400e-04 Sol. time(s):4.737e+00  PCG niter:1
>> Norm:2.719e-10
>> nTS:13 Time(s):2.600e-04 Sol. time(s):4.537e+00  PCG niter:1
>> Norm:3.982e-10
>> nTS:14 Time(s):2.800e-04 Sol. time(s):4.578e+00  PCG niter:1
>> Norm:3.221e-10
>> nTS:15 Time(s):3.000e-04 Sol. time(s):4.678e+00  PCG niter:1
>> Norm:3.008e-10
>> nTS:16 Time(s):3.200e-04 Sol. time(s):4.619e+00  PCG niter:1
>> Norm:3.005e-10
>> nTS:46 Time(s):9.200e-04 Sol. time(s):6.862e+00  PCG niter:1
>> Norm:2.759e-10
>> nTS:48 Time(s):9.600e-04 Sol. time(s):4.475e+00  PCG niter:1
>> Norm:3.071e-10
>> nTS:50 Time(s):1.000e-03 Sol. time(s):6.068e+00  PCG niter:1
>> Norm:3.848e-10
>> nTS:52 Time(s):1.040e-03 Sol. time(s):5.024e+00  PCG niter:1
>> Norm:3.639e-10
>> nTS:55 Time(s):1.100e-03 Sol. time(s):1.333e+01  PCG niter:1
>> Norm:3.049e-10
>> nTS:58 Time(s):1.160e-03 Sol. time(s):1.440e+01  PCG niter:1
>> Norm:3.467e-10
>> nTS:60 Time(s):1.200e-03 Sol. time(s):1.475e+01  PCG niter:1
>> Norm:2.951e-10
>> nTS:63 Time(s):1.260e-03 Sol. time(s):9.899e+00  PCG niter:1
>> Norm:3.018e-10
>> nTS:66 Time(s):1.320e-03 Sol. time(s):1.097e+01  PCG niter:1
>> Norm:3.320e-10
>> nTS:69 Time(s):1.380e-03 Sol. time(s):1.485e+01  PCG niter:1
>> Norm:2.811e-10
>> nTS:73 Time(s):1.460e-03 Sol. time(s):1.199e+01  PCG niter:1
>> Norm:2.999e-10
>> nTS:76 Time(s):1.520e-03 Sol. time(s):1.109e+01  PCG niter:1
>> Norm:3.359e-10
>> nTS:80 Time(s):1.600e-03 Sol. time(s):1.473e+01  PCG niter:1
>> Norm:3.336e-10
>> nTS:84 Time(s):1.680e-03 Sol. time(s):1.444e+01  PCG niter:1
>> Norm:3.181e-10
>> nTS:88 Time(s):1.760e-03 Sol. time(s):1.639e+01  PCG niter:1
>> Norm:2.956e-10
>> nTS:92 Time(s):1.840e-03 Sol. time(s):1.713e+01  PCG niter:1
>> Norm:3.552e-10
>> nTS:96 Time(s):1.920e-03 Sol. time(s):1.562e+01  PCG niter:1
>> Norm:2.843e-10
>> nTS:101 Time(s):2.020e-03 Sol. time(s):1.631e+01  PCG niter:1
>> Norm:3.687e-10
>> nTS:105 Time(s):2.100e-03 Sol. time(s):1.496e+01  PCG niter:1
>> Norm:3.567e-10
>> nTS:111 Time(s):2.220e-03 Sol. time(s):1.607e+01  PCG niter:1
>> Norm:3.392e-10
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>