<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Apr 30, 2014 at 8:17 AM, Justin Dong <span dir="ltr"><<a href="mailto:jsd1@rice.edu" target="_blank">jsd1@rice.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">Here are the results of one example where the solution is incorrect: </div></blockquote><div><br></div><div>I am skeptical of your conclusion. The entry in "true residual norm", 4.545226736905e-16, is generated</div>
<div>using just MatMult() and VecAXPY(). It is the definition of "correct". What do you get when you replicate</div><div>these steps with the solution that is returned?</div><div><br></div><div> Matt</div><div>
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>
<p> 0 KSP unpreconditioned resid norm 7.267616711036e-05 true resid norm 7.267616711036e-05 ||r(i)||/||b|| 1.000000000000e+00</p><p> 1 KSP unpreconditioned resid norm 4.081398605668e-16 true resid norm 4.017029301117e-16 ||r(i)||/||b|| 5.527299334618e-12</p>
<p>
</p><p> 2 KSP unpreconditioned resid norm 4.378737248697e-21 true resid norm 4.545226736905e-16 ||r(i)||/||b|| 6.254081520291e-12</p>
<p>KSP Object: 4 MPI processes</p>
<p> type: gmres</p>
<p> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement</p>
<p> GMRES: happy breakdown tolerance 1e-30</p>
<p> maximum iterations=10000, initial guess is zero</p>
<p> tolerances: relative=1e-13, absolute=1e-50, divergence=10000</p>
<p> right preconditioning</p>
<p> using UNPRECONDITIONED norm type for convergence test</p>
<p>PC Object: 4 MPI processes</p>
<p> type: lu</p>
<p> LU: out-of-place factorization</p>
<p> tolerance for zero pivot 2.22045e-14</p>
<p> matrix ordering: natural</p>
<p> factor fill ratio given 0, needed 0</p>
<p> Factored matrix follows:</p>
<p> Matrix Object: 4 MPI processes</p>
<p> type: mpiaij</p>
<p> rows=1536, cols=1536</p>
<p> package used to perform factorization: superlu_dist</p>
<p> total: nonzeros=0, allocated nonzeros=0</p>
<p> total number of mallocs used during MatSetValues calls =0</p>
<p> SuperLU_DIST run parameters:</p>
<p> Process grid nprow 2 x npcol 2 </p>
<p> Equilibrate matrix TRUE </p>
<p> Matrix input mode 1 </p>
<p> Replace tiny pivots TRUE </p>
<p> Use iterative refinement FALSE </p>
<p> Processors in row 2 col partition 2 </p>
<p> Row permutation LargeDiag </p>
<p> Column permutation METIS_AT_PLUS_A</p>
<p> Parallel symbolic factorization FALSE </p>
<p> Repeated factorization SamePattern_SameRowPerm</p>
<p> linear system matrix = precond matrix:</p>
<p> Matrix Object: 4 MPI processes</p>
<p> type: mpiaij</p>
<p> rows=1536, cols=1536</p>
<p> total: nonzeros=17856, allocated nonzeros=64512</p>
<p> total number of mallocs used during MatSetValues calls =0</p>
<p> using I-node (on process 0) routines: found 128 nodes, limit used is 5</p></div></div><div class=""><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 30, 2014 at 7:57 AM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><br>
On Apr 30, 2014, at 6:46 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
<br>
> On Wed, Apr 30, 2014 at 6:19 AM, Justin Dong <<a href="mailto:jsd1@rice.edu" target="_blank">jsd1@rice.edu</a>> wrote:<br>
> Thanks. If I turn on the Krylov solver, the issue still seems to persist though.<br>
><br>
> mpiexec -n 4 -ksp_type gmres -ksp_rtol 1.0e-13 -pc_type lu -pc_factor_mat_solver_package superlu_dist<br>
><br>
> I'm testing on a very small system now (24 ndofs), but if I go larger (around 20000 ndofs) then it gets worse.<br>
><br>
> For the small system, I exported the matrices to matlab to make sure they were being assembled correct in parallel, and I'm certain that that they are.<br>
><br>
> For convergence questions, always run using -ksp_monitor -ksp_view so that we can see exactly what you run.<br>
<br>
</div> Also run with -ksp_pc_side right<br>
<div><div><br>
<br>
><br>
> Thanks,<br>
><br>
> Matt<br>
><br>
><br>
> On Wed, Apr 30, 2014 at 5:32 AM, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>
> On Wed, Apr 30, 2014 at 3:02 AM, Justin Dong <<a href="mailto:jsd1@rice.edu" target="_blank">jsd1@rice.edu</a>> wrote:<br>
> I actually was able to solve my own problem...for some reason, I need to do<br>
><br>
> PCSetType(pc, PCLU);<br>
> PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);<br>
> KSPSetTolerances(ksp, 1.e-15, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);<br>
><br>
> 1) Before you do SetType(PCLU) the preconditioner has no type, so FactorSetMatSolverPackage() has no effect<br>
><br>
> 2) There is a larger issue here. Never ever ever ever code in this way. Hardcoding a solver is crazy. The solver you<br>
> use should depend on the equation, discretization, flow regime, and architecture. Recompiling for all those is<br>
> out of the question. You should just use<br>
><br>
> KSPCreate()<br>
> KSPSetOperators()<br>
> KSPSetFromOptions()<br>
> KSPSolve()<br>
><br>
> and then<br>
><br>
> -pc_type lu -pc_factor_mat_solver_package superlu_dist<br>
><br>
><br>
> instead of the ordering I initially had, though I'm not really clear on what the issue was. However, there seems to be some loss of accuracy as I increase the number of processes. Is this expected, or can I force a lower tolerance somehow? I am able to compare the solutions to a reference solution, and the error increases as I increase the processes. This is the solution in sequential:<br>
><br>
> Yes, this is unavoidable. However, just turn on the Krylov solver<br>
><br>
> -ksp_type gmres -ksp_rtol 1.0e-10<br>
><br>
> and you can get whatever residual tolerance you want. To get a specific error, you would need<br>
> a posteriori error estimation, which you could include in a custom convergence criterion.<br>
><br>
> Thanks,<br>
><br>
> Matt<br>
><br>
> superlu_1process = [<br>
> -6.8035811950925553e-06<br>
> 1.6324030474375778e-04<br>
> 5.4145340579614926e-02<br>
> 1.6640521936281516e-04<br>
> -1.7669374392923965e-04<br>
> -2.8099208957838207e-04<br>
> 5.3958133511222223e-02<br>
> -5.4077899123806263e-02<br>
> -5.3972905090366369e-02<br>
> -1.9485020474821160e-04<br>
> 5.4239813043824400e-02<br>
> 4.4883984259948430e-04];<br>
><br>
> superlu_2process = [<br>
> -6.8035811950509821e-06<br>
> 1.6324030474371623e-04<br>
> 5.4145340579605655e-02<br>
> 1.6640521936281687e-04<br>
> -1.7669374392923807e-04<br>
> -2.8099208957839834e-04<br>
> 5.3958133511212911e-02<br>
> -5.4077899123796964e-02<br>
> -5.3972905090357078e-02<br>
> -1.9485020474824480e-04<br>
> 5.4239813043815172e-02<br>
> 4.4883984259953320e-04];<br>
><br>
> superlu_4process= [<br>
> -6.8035811952565206e-06<br>
> 1.6324030474386164e-04<br>
> 5.4145340579691455e-02<br>
> 1.6640521936278326e-04<br>
> -1.7669374392921441e-04<br>
> -2.8099208957829171e-04<br>
> 5.3958133511299078e-02<br>
> -5.4077899123883062e-02<br>
> -5.3972905090443085e-02<br>
> -1.9485020474806352e-04<br>
> 5.4239813043900860e-02<br>
> 4.4883984259921287e-04];<br>
><br>
> This is some finite element solution and I can compute the error of the solution against an exact solution in the functional L2 norm:<br>
><br>
> error with 1 process: 1.71340e-02 (accepted value)<br>
> error with 2 processes: 2.65018e-02<br>
> error with 3 processes: 3.00164e-02<br>
> error with 4 processes: 3.14544e-02<br>
><br>
><br>
> Is there a way to remedy this?<br>
><br>
><br>
> On Wed, Apr 30, 2014 at 2:37 AM, Justin Dong <<a href="mailto:jsd1@rice.edu" target="_blank">jsd1@rice.edu</a>> wrote:<br>
> Hi,<br>
><br>
> I'm trying to solve a linear system in parallel using SuperLU but for some reason, it's not giving me the correct solution. I'm testing on a small example so I can compare the sequential and parallel cases manually. I'm absolutely sure that my routine for generating the matrix and right-hand side in parallel is working correctly.<br>
><br>
> Running with 1 process and PCLU gives the correct solution. Running with 2 processes and using SUPERLU_DIST does not give the correct solution (I tried with 1 process too but according to the superlu documentation, I would need SUPERLU for sequential?). This is the code for solving the system:<br>
><br>
> /* solve the system */<br>
> KSPCreate(PETSC_COMM_WORLD, &ksp);<br>
> KSPSetOperators(ksp, Aglobal, Aglobal, DIFFERENT_NONZERO_PATTERN);<br>
> KSPSetType(ksp,KSPPREONLY);<br>
><br>
> KSPGetPC(ksp, &pc);<br>
><br>
> KSPSetTolerances(ksp, 1.e-13, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);<br>
> PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);<br>
><br>
> KSPSolve(ksp, bglobal, bglobal);<br>
><br>
> Sincerely,<br>
> Justin<br>
><br>
><br>
><br>
><br>
><br>
><br>
> --<br>
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
> -- Norbert Wiener<br>
><br>
><br>
><br>
><br>
> --<br>
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
> -- Norbert Wiener<br>
<br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener
</div></div>