[petsc-users] KSPSolve() gets slower when a preconditioner or Cholesky factor is re-used with many RHS vectors.

Matthew Knepley knepley at gmail.com
Sun Nov 16 10:44:04 CST 2014


On Sun, Nov 16, 2014 at 10:11 AM, Evan Um <evanum at gmail.com> wrote:

> Dear Matthew,
>
> Thanks for your comments. I will prepare a log summary. To activate a
> separate log stage for each iteration (i.e., each RHS vector), I tried the
> code below, but got an error. Could you give me a comment? Does
> PetscLogView() allow users to specify different output file names so that
> each iteration generates its own log file?
>

You use


http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStageRegister.html

for each stage, and then


http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStagePush.html

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Profiling/PetscLogStagePop.html#PetscLogStagePop
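
For example, a minimal sketch (the stage names, file names, and "nrhs" are
only illustrative, error checking is omitted, and you may want to do this
only for selected iterations rather than all 1000):

  PetscLogStage stage;
  PetscViewer   viewer;
  char          sname[64], fname[PETSC_MAX_PATH_LEN];
  PetscInt      i;

  for (i = 0; i < nrhs; ++i) {            /* nrhs = number of RHS vectors */
    PetscSNPrintf(sname, sizeof(sname), "RHS %D", i);
    PetscLogStageRegister(sname, &stage); /* one stage per RHS vector */
    PetscLogStagePush(stage);
    /* build B_dt and call KSPSolve() here */
    PetscLogStagePop();

    /* optional: write this iteration's log to its own file; the viewer
       must be created before it is passed to PetscLogView() */
    PetscSNPrintf(fname, sizeof(fname), "log_rhs_%D.txt", i);
    PetscViewerASCIIOpen(PETSC_COMM_WORLD, fname, &viewer);
    PetscLogView(viewer);
    PetscViewerDestroy(&viewer);
  }

With -log_summary, each stage is then reported separately. The per-file
viewer is one way to get a separate log per iteration; the "Corrupt argument"
error below appears to come from passing a viewer that was never created.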

   Matt


> Evan
>
> /* work */
> PetscViewer viewer;
> for (int i=0; i<1000; i++) {
> PetscLogBegin();
> /* work to do*/
> PetscLogView(viewer);
> }
> /* work */
>
> Errors:
> Number of PCG iterations:1
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Corrupt argument:
> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: Invalid type of object: Parameter # 1
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014
> [0]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0024.voltaire0
> by esum Sun Nov 16 08:08:53 2014
> [0]PETSC ERROR: Configure options
> --prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0
> --download-fblaslapack=1 --download-mumps=1 --download-parmetis=1
> --with-ptscotch=1
> --with-ptscotch-dir=/clusterfs/voltaire/home/software/modules/scotch/5.1.12-gcc/
> --download-scalapack --download-metis=1 --download-superlu=1
> --download-superlu_dist=1 --download-hypre=1
> --with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/
> [0]PETSC ERROR: #1 PetscObjectTypeCompare() line 145 in
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/objects/destroy.c
> [0]PETSC ERROR: #2 PetscLogView() line 1807 in
> /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/logging/plog.c
>
>
> On Sun, Nov 16, 2014 at 3:57 AM, Matthew Knepley <knepley at gmail.com>
> wrote:
>
>> On Sat, Nov 15, 2014 at 9:24 PM, Evan Um <evanum at gmail.com> wrote:
>>
>>> Dear PETSc users,
>>>
>>> I would like to show you a performance issue that occurs when a Cholesky
>>> factor is re-used as a direct solver or preconditioner many times with
>>> many right-hand-side vectors. Can anyone suggest a solution to this
>>> issue? Thanks in advance for your help.
>>>
>>> Regards,
>>> Evan
>>>
>>> Example 1: I used MUMPS as a direct solver and measured the
>>> backward/forward substitution time for selected RHS vectors. A simplified
>>> test code is shown in Code 1 and the time measurements in Result 1. As
>>> shown below, the backward/forward solution time is nearly constant (about
>>> 2.2 seconds) for the early time steps, but it grows later, reaching about
>>> 12.8 seconds. It is hard to understand why the backward/forward solution
>>> time gets longer, since the same numerical operation is carried out for
>>> each RHS vector (each i in the loop). For many RHS vectors, is there any
>>> parameter I need to reset so that a large number of RHS vectors is
>>> handled reliably?
>>>
>>> Example 2: In this case, I use the Cholesky factor as a preconditioner
>>> for a CG solver, and one iteration is performed. The sample code and time
>>> measurements are shown in Code 2 and Result 2. Again, KSPSolve() gets
>>> slower as the preconditioner is re-used with many RHS vectors: early on
>>> it takes about 4.6 seconds, but later it takes about 16 seconds. Has
>>> anyone observed such a performance issue? Do you know any solution for
>>> this problem?
>>>
>>> In these two experiments, I expected Example 2 to show a shorter solution
>>> time per RHS vector than Example 1, because Example 2 uses scalable
>>> matrix-vector multiplication, whereas Example 1 relies on the
>>> backward/forward substitution of MUMPS, which is inherently less
>>> scalable. However, my experiments consistently showed that the MUMPS
>>> backward/forward substitution is faster than a single PCG iteration with
>>> the Cholesky preconditioner. Can anyone explain this? FYI, in both
>>> Example 1 and Example 2, I used a very large test problem (a sparse SPD
>>> matrix of about 4,000,000-by-4,000,000).
>>>
>>> Code 1:
>>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>>> KSPSetType(ksp_fetd_dt, KSPPREONLY);
>>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>>> for (int i=0; i<1000; i++) {
>>> // Create a new RHS vector B_dt
>>>
>>
>> Are you calling VecCreate() each time here? If so, you could be filling up
>> the memory of your computer as you run, causing it to slow down due to
>> swapping.
>>
>> For any performance question, ALWAYS send the output of -log_summary. For
>> your question, also divide this loop into log stages for each iteration,
>> so we can see what specific operations slow down.
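>>
>> For example, a minimal sketch (names follow your snippet;
>> VecDuplicate()/VecDestroy() stand in for whatever "// Create a new RHS
>> vector B_dt" actually does):
>>
>>   for (int i = 0; i < 1000; i++) {
>>     VecDuplicate(solution, &B_dt);   /* fresh RHS vector for this step */
>>     /* fill B_dt, then time and run KSPSolve() as before */
>>     KSPSolve(ksp_fetd_dt, B_dt, solution);
>>     VecDestroy(&B_dt);               /* release it so memory does not grow */
>>   }
>>
>> If B_dt is instead created once outside the loop and only its values
>> change, then memory growth from the RHS vectors is not the issue.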
>>
>>    Matt
>>
>>
>>> time1=my_timer_function();
>>> KSPSolve(ksp_fetd_dt,B_dt,solution);
>>> time2=my_timer_function();
>>> // Output solution time=time2-time1;
>>> }
>>>
>>> Result1:
>>> nTS:12 Time(s):2.400e-04 Sol. time(s):2.246e+00  PCG niter:1
>>> nTS:13 Time(s):2.600e-04 Sol. time(s):2.329e+00  PCG niter:1
>>> nTS:14 Time(s):2.800e-04 Sol. time(s):2.289e+00  PCG niter:1
>>> nTS:15 Time(s):3.000e-04 Sol. time(s):2.239e+00  PCG niter:1
>>> .
>>> .
>>> nTS:267 Time(s):5.340e-03 Sol. time(s):2.152e+00  PCG niter:1
>>> nTS:280 Time(s):5.600e-03 Sol. time(s):2.255e+00  PCG niter:1
>>> nTS:293 Time(s):5.860e-03 Sol. time(s):1.087e+01  PCG niter:1
>>> nTS:307 Time(s):6.140e-03 Sol. time(s):1.225e+01  PCG niter:1
>>> nTS:321 Time(s):6.420e-03 Sol. time(s):1.280e+01  PCG niter:1
>>> nTS:337 Time(s):6.740e-03 Sol. time(s):1.243e+01  PCG niter:1
>>> nTS:353 Time(s):7.060e-03 Sol. time(s):6.953e+00  PCG niter:1
>>> nTS:369 Time(s):7.380e-03 Sol. time(s):8.712e+00  PCG niter:1
>>> nTS:387 Time(s):7.740e-03 Sol. time(s):9.048e+00  PCG niter:1
>>> nTS:405 Time(s):8.100e-03 Sol. time(s):1.099e+01  PCG niter:1
>>> nTS:424 Time(s):8.480e-03 Sol. time(s):1.171e+01  PCG niter:1
>>> nTS:445 Time(s):8.900e-03 Sol. time(s):1.294e+01  PCG niter:1
>>> nTS:466 Time(s):9.320e-03 Sol. time(s):1.227e+01  PCG niter:1
>>> nTS:488 Time(s):9.760e-03 Sol. time(s):8.581e+00  PCG niter:1
>>> nTS:511 Time(s):1.022e-02 Sol. time(s):1.059e+01  PCG niter:1
>>> nTS:535 Time(s):1.070e-02 Sol. time(s):8.452e+00  PCG niter:1
>>>
>>> Code 2:
>>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>>> KSPSetType(ksp_fetd_dt, KSPPREONLY);
>>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>>> KSPSetType(ksp_fetd_dt, KSPCG);
>>> KSPSetTolerances(ksp_fetd_dt, 1e-9, 1.0e-50, 1.0e10, ksp_iter);
>>> for (int i=0; i<1000; i++) {
>>> // Create a new RHS vector B_dt
>>> time1=my_timer_function();
>>> KSPSolve(ksp_fetd_dt,B_dt,solution);
>>> time2=my_timer_function();
>>> // Output solution time=time2-time1;
>>> }
>>>
>>> Result 2:
>>> nTS:10 Time(s):2.000e-04 Sol. time(s):4.644e+00  PCG niter:1
>>> Norm:2.937e-10
>>> nTS:11 Time(s):2.200e-04 Sol. time(s):4.585e+00  PCG niter:1
>>> Norm:3.151e-10
>>> nTS:12 Time(s):2.400e-04 Sol. time(s):4.737e+00  PCG niter:1
>>> Norm:2.719e-10
>>> nTS:13 Time(s):2.600e-04 Sol. time(s):4.537e+00  PCG niter:1
>>> Norm:3.982e-10
>>> nTS:14 Time(s):2.800e-04 Sol. time(s):4.578e+00  PCG niter:1
>>> Norm:3.221e-10
>>> nTS:15 Time(s):3.000e-04 Sol. time(s):4.678e+00  PCG niter:1
>>> Norm:3.008e-10
>>> nTS:16 Time(s):3.200e-04 Sol. time(s):4.619e+00  PCG niter:1
>>> Norm:3.005e-10
>>> nTS:46 Time(s):9.200e-04 Sol. time(s):6.862e+00  PCG niter:1
>>> Norm:2.759e-10
>>> nTS:48 Time(s):9.600e-04 Sol. time(s):4.475e+00  PCG niter:1
>>> Norm:3.071e-10
>>> nTS:50 Time(s):1.000e-03 Sol. time(s):6.068e+00  PCG niter:1
>>> Norm:3.848e-10
>>> nTS:52 Time(s):1.040e-03 Sol. time(s):5.024e+00  PCG niter:1
>>> Norm:3.639e-10
>>> nTS:55 Time(s):1.100e-03 Sol. time(s):1.333e+01  PCG niter:1
>>> Norm:3.049e-10
>>> nTS:58 Time(s):1.160e-03 Sol. time(s):1.440e+01  PCG niter:1
>>> Norm:3.467e-10
>>> nTS:60 Time(s):1.200e-03 Sol. time(s):1.475e+01  PCG niter:1
>>> Norm:2.951e-10
>>> nTS:63 Time(s):1.260e-03 Sol. time(s):9.899e+00  PCG niter:1
>>> Norm:3.018e-10
>>> nTS:66 Time(s):1.320e-03 Sol. time(s):1.097e+01  PCG niter:1
>>> Norm:3.320e-10
>>> nTS:69 Time(s):1.380e-03 Sol. time(s):1.485e+01  PCG niter:1
>>> Norm:2.811e-10
>>> nTS:73 Time(s):1.460e-03 Sol. time(s):1.199e+01  PCG niter:1
>>> Norm:2.999e-10
>>> nTS:76 Time(s):1.520e-03 Sol. time(s):1.109e+01  PCG niter:1
>>> Norm:3.359e-10
>>> nTS:80 Time(s):1.600e-03 Sol. time(s):1.473e+01  PCG niter:1
>>> Norm:3.336e-10
>>> nTS:84 Time(s):1.680e-03 Sol. time(s):1.444e+01  PCG niter:1
>>> Norm:3.181e-10
>>> nTS:88 Time(s):1.760e-03 Sol. time(s):1.639e+01  PCG niter:1
>>> Norm:2.956e-10
>>> nTS:92 Time(s):1.840e-03 Sol. time(s):1.713e+01  PCG niter:1
>>> Norm:3.552e-10
>>> nTS:96 Time(s):1.920e-03 Sol. time(s):1.562e+01  PCG niter:1
>>> Norm:2.843e-10
>>> nTS:101 Time(s):2.020e-03 Sol. time(s):1.631e+01  PCG niter:1
>>> Norm:3.687e-10
>>> nTS:105 Time(s):2.100e-03 Sol. time(s):1.496e+01  PCG niter:1
>>> Norm:3.567e-10
>>> nTS:111 Time(s):2.220e-03 Sol. time(s):1.607e+01  PCG niter:1
>>> Norm:3.392e-10
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener