<div dir="ltr"><div>Dear Matthew,</div><div><br></div><div>Thanks for your comments. I will prepare log summary. To activate an separate log stage for each iteration (i.e. each RHS vector), I tried the code below, but got an error. Could you give me a comment? Does PetscLogView() allow users to use different output file names such that each iteration generates its own log file?</div><div><br></div><div>Evan</div><div><br></div><div>/* work */</div><div>PetscViewer viewer;</div><div>for (int i=0; i<1000; i++) {</div><div>PetscLogBegin();</div><div>/* work to do*/</div><div>PetscLogView(viewer);</div><div>}</div><div>/* work */</div><div><br></div><div>Errors:</div><div>Number of PCG iterations:1<br>[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------<br>[0]PETSC ERROR: Corrupt argument: <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>[0]PETSC ERROR: Invalid type of object: Parameter # 1<br>[0]PETSC ERROR: See <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html">http://www.mcs.anl.gov/petsc/documentation/faq.html</a> for trouble shooting.<br>[0]PETSC ERROR: Petsc Release Version 3.5.0, Jun, 30, 2014<br>[0]PETSC ERROR: fetdem3dp on a arch-linux2-c-debug named n0024.voltaire0 by esum Sun Nov 16 08:08:53 2014<br>[0]PETSC ERROR: Configure options --prefix=/clusterfs/voltaire/home/software/modules/petsc/3.5.0 --download-fblaslapack=1 --download-mumps=1 --download-parmetis=1 --with-ptscotch=1 --with-ptscotch-dir=/clusterfs/voltaire/home/software/modules/scotch/5.1.12-gcc/ --download-scalapack --download-metis=1 --download-superlu=1 --download-superlu_dist=1 --download-hypre=1 --with-mpi-dir=/global/software/sl-6.x86_64/modules/gcc/4.4.7/openmpi/1.6.5-gcc/<br>[0]PETSC ERROR: #1 PetscObjectTypeCompare() line 145 in /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/objects/destroy.c<br>[0]PETSC ERROR: #2 PetscLogView() line 1807 in /clusterfs/voltaire/home/software/source/petsc-3.5.0/src/sys/logging/plog.c</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Nov 16, 2014 at 3:57 AM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Sat, Nov 15, 2014 at 9:24 PM, Evan Um <span dir="ltr"><<a href="mailto:evanum@gmail.com" target="_blank">evanum@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><div dir="ltr"><div>Dear PETSC users,</div><div><br></div><div>I would like to show you a performance issue when Cholesky factor is re-used as a direct solver or pre-conditioner many times with many right-hand side vectors. Does anyone suggest a solution about this issue? In advance, thanks for your help.</div><div><br></div><div>Regards,</div><div>Evan</div><div><br></div><div>Example 1: I used MUMPS as a direct solver and measured backward/forward substitution solution time for selected RHS vectors. A simplified test code is shown in code 1 and time measurements in result 1. As shown below, backward and forward solution time is nearly constant (e.g. 
On Sun, Nov 16, 2014 at 3:57 AM, Matthew Knepley <knepley@gmail.com> wrote:

> On Sat, Nov 15, 2014 at 9:24 PM, Evan Um <evanum@gmail.com> wrote:
>
>> Dear PETSc users,
>>
>> I would like to show you a performance issue that appears when a Cholesky factor is re-used as a direct solver or preconditioner many times, with many right-hand-side (RHS) vectors. Can anyone suggest a solution to this issue? Thanks in advance for your help.
>>
>> Regards,
>> Evan
>>
>> Example 1: I used MUMPS as a direct solver and measured the backward/forward substitution time for selected RHS vectors. A simplified test code is shown in Code 1 and the time measurements in Result 1. As shown below, the backward/forward substitution time is nearly constant (about 2.2 seconds) for early time steps, but the solution time grows later on; at late time steps it takes about 12.8 seconds. It is hard to understand why the backward/forward substitution gets slower even though exactly the same numerical operation is carried out for each RHS vector (the loop over i). Is there any parameter I need to reset so that a long sequence of RHS vectors is handled stably?
>>
>> Example 2: In this case, I use the Cholesky factor as a preconditioner for the CG solver, and one iteration is performed. The sample code and time measurements are shown in Code 2 and Result 2. Again, KSPSolve() gets slower as the preconditioner is re-used with many RHS vectors: early on it takes about 4.6 seconds, but later it takes about 16 seconds. Has anyone observed such a performance issue? Does anyone know a solution?
>>
>> In the two experiments, I expected Example 2 to show a shorter solution time per RHS vector than Example 1, because Example 2 uses scalable matrix-vector multiplication, whereas MUMPS in Example 1 carries out backward/forward substitutions that are inherently non-scalable. However, my experiments consistently showed that the backward/forward substitution of MUMPS is faster than a single PCG iteration with the Cholesky preconditioner. Can anyone explain this? FYI, in both Example 1 and Example 2 I used a very large test problem (a sparse SPD matrix of about 4,000,000-by-4,000,000).
>>
>> Code 1:
>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>> KSPSetType(ksp_fetd_dt, KSPPREONLY);
>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>> for (int i=0; i<1000; i++) {
>>   // Create a new RHS vector B_dt
>
> Are you calling VecCreate() each time here? If so, you could be filling up the memory
> of your computer as you run, causing it to slow down due to swapping.
>
> For any performance question, ALWAYS send the output of -log_summary. For your
> question, also divide this loop into log stages for each iteration, so we can see what
> specific operations slow down.
>
>    Matt
>
>>   time1 = my_timer_function();
>>   KSPSolve(ksp_fetd_dt, B_dt, solution);
>>   time2 = my_timer_function();
>>   // Output solution time = time2 - time1;
>> }
>>
>> Result 1:
>> nTS:12 Time(s):2.400e-04 Sol. time(s):2.246e+00 PCG niter:1
>> nTS:13 Time(s):2.600e-04 Sol. time(s):2.329e+00 PCG niter:1
>> nTS:14 Time(s):2.800e-04 Sol. time(s):2.289e+00 PCG niter:1
>> nTS:15 Time(s):3.000e-04 Sol. time(s):2.239e+00 PCG niter:1
>> .
>> .
>> nTS:267 Time(s):5.340e-03 Sol. time(s):2.152e+00 PCG niter:1
>> nTS:280 Time(s):5.600e-03 Sol. time(s):2.255e+00 PCG niter:1
>> nTS:293 Time(s):5.860e-03 Sol. time(s):1.087e+01 PCG niter:1
>> nTS:307 Time(s):6.140e-03 Sol. time(s):1.225e+01 PCG niter:1
>> nTS:321 Time(s):6.420e-03 Sol. time(s):1.280e+01 PCG niter:1
>> nTS:337 Time(s):6.740e-03 Sol. time(s):1.243e+01 PCG niter:1
>> nTS:353 Time(s):7.060e-03 Sol. time(s):6.953e+00 PCG niter:1
>> nTS:369 Time(s):7.380e-03 Sol. time(s):8.712e+00 PCG niter:1
>> nTS:387 Time(s):7.740e-03 Sol. time(s):9.048e+00 PCG niter:1
>> nTS:405 Time(s):8.100e-03 Sol. time(s):1.099e+01 PCG niter:1
>> nTS:424 Time(s):8.480e-03 Sol. time(s):1.171e+01 PCG niter:1
>> nTS:445 Time(s):8.900e-03 Sol. time(s):1.294e+01 PCG niter:1
>> nTS:466 Time(s):9.320e-03 Sol. time(s):1.227e+01 PCG niter:1
>> nTS:488 Time(s):9.760e-03 Sol. time(s):8.581e+00 PCG niter:1
>> nTS:511 Time(s):1.022e-02 Sol. time(s):1.059e+01 PCG niter:1
>> nTS:535 Time(s):1.070e-02 Sol. time(s):8.452e+00 PCG niter:1
>>
>> Code 2:
>> KSPCreate(PETSC_COMM_WORLD, &ksp_fetd_dt);
>> KSPSetOperators(ksp_fetd_dt, A_dt, A_dt);
>> KSPSetType(ksp_fetd_dt, KSPPREONLY);
>> KSPGetPC(ksp_fetd_dt, &pc_fetd_dt);
>> MatSetOption(A_dt, MAT_SPD, PETSC_TRUE);
>> PCSetType(pc_fetd_dt, PCCHOLESKY);
>> PCFactorSetMatSolverPackage(pc_fetd_dt, MATSOLVERMUMPS);
>> PCFactorSetUpMatSolverPackage(pc_fetd_dt);
>> PCFactorGetMatrix(pc_fetd_dt, &F_dt);
>> KSPSetType(ksp_fetd_dt, KSPCG);
>> KSPSetTolerances(ksp_fetd_dt, 1e-9, 1.0e-50, 1.0e10, ksp_iter);
>> for (int i=0; i<1000; i++) {
>>   // Create a new RHS vector B_dt
>>   time1 = my_timer_function();
>>   KSPSolve(ksp_fetd_dt, B_dt, solution);
>>   time2 = my_timer_function();
>>   // Output solution time = time2 - time1;
>> }
>>
>> Result 2:
>> nTS:10 Time(s):2.000e-04 Sol. time(s):4.644e+00 PCG niter:1 Norm:2.937e-10
>> nTS:11 Time(s):2.200e-04 Sol. time(s):4.585e+00 PCG niter:1 Norm:3.151e-10
>> nTS:12 Time(s):2.400e-04 Sol. time(s):4.737e+00 PCG niter:1 Norm:2.719e-10
>> nTS:13 Time(s):2.600e-04 Sol. time(s):4.537e+00 PCG niter:1 Norm:3.982e-10
>> nTS:14 Time(s):2.800e-04 Sol. time(s):4.578e+00 PCG niter:1 Norm:3.221e-10
>> nTS:15 Time(s):3.000e-04 Sol. time(s):4.678e+00 PCG niter:1 Norm:3.008e-10
>> nTS:16 Time(s):3.200e-04 Sol. time(s):4.619e+00 PCG niter:1 Norm:3.005e-10
>> nTS:46 Time(s):9.200e-04 Sol. time(s):6.862e+00 PCG niter:1 Norm:2.759e-10
>> nTS:48 Time(s):9.600e-04 Sol. time(s):4.475e+00 PCG niter:1 Norm:3.071e-10
>> nTS:50 Time(s):1.000e-03 Sol. time(s):6.068e+00 PCG niter:1 Norm:3.848e-10
>> nTS:52 Time(s):1.040e-03 Sol. time(s):5.024e+00 PCG niter:1 Norm:3.639e-10
>> nTS:55 Time(s):1.100e-03 Sol. time(s):1.333e+01 PCG niter:1 Norm:3.049e-10
>> nTS:58 Time(s):1.160e-03 Sol. time(s):1.440e+01 PCG niter:1 Norm:3.467e-10
>> nTS:60 Time(s):1.200e-03 Sol. time(s):1.475e+01 PCG niter:1 Norm:2.951e-10
>> nTS:63 Time(s):1.260e-03 Sol. time(s):9.899e+00 PCG niter:1 Norm:3.018e-10
>> nTS:66 Time(s):1.320e-03 Sol. time(s):1.097e+01 PCG niter:1 Norm:3.320e-10
>> nTS:69 Time(s):1.380e-03 Sol. time(s):1.485e+01 PCG niter:1 Norm:2.811e-10
>> nTS:73 Time(s):1.460e-03 Sol. time(s):1.199e+01 PCG niter:1 Norm:2.999e-10
>> nTS:76 Time(s):1.520e-03 Sol. time(s):1.109e+01 PCG niter:1 Norm:3.359e-10
>> nTS:80 Time(s):1.600e-03 Sol. time(s):1.473e+01 PCG niter:1 Norm:3.336e-10
>> nTS:84 Time(s):1.680e-03 Sol. time(s):1.444e+01 PCG niter:1 Norm:3.181e-10
>> nTS:88 Time(s):1.760e-03 Sol. time(s):1.639e+01 PCG niter:1 Norm:2.956e-10
>> nTS:92 Time(s):1.840e-03 Sol. time(s):1.713e+01 PCG niter:1 Norm:3.552e-10
>> nTS:96 Time(s):1.920e-03 Sol. time(s):1.562e+01 PCG niter:1 Norm:2.843e-10
>> nTS:101 Time(s):2.020e-03 Sol. time(s):1.631e+01 PCG niter:1 Norm:3.687e-10
>> nTS:105 Time(s):2.100e-03 Sol. time(s):1.496e+01 PCG niter:1 Norm:3.567e-10
>> nTS:111 Time(s):2.220e-03 Sol. time(s):1.607e+01 PCG niter:1 Norm:3.392e-10
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
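P.S. Regarding the VecCreate() question: as the comment in Code 1/Code 2 indicates, a new RHS vector B_dt is created in every iteration. If that is the likely cause of the slowdown, would creating B_dt once outside the loop and only refilling its entries each time step, roughly as in the sketch below, be the right fix? (This is just my guess at the idiomatic approach, using VecDuplicate()/VecSet(); the actual RHS values would be filled in where indicated.)

Vec B_dt;
VecDuplicate(solution, &B_dt);   /* create the RHS vector once, with the same layout as the solution */
for (int i=0; i<1000; i++) {
  /* overwrite the entries of B_dt for this time step instead of creating a new vector */
  VecSet(B_dt, 0.0);
  /* ... fill in the actual RHS values for time step i here ... */
  time1 = my_timer_function();
  KSPSolve(ksp_fetd_dt, B_dt, solution);
  time2 = my_timer_function();
}
VecDestroy(&B_dt);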