[petsc-users] MATSOLVERSUPERLU_DIST not giving the correct solution
Matthew Knepley
knepley at gmail.com
Wed Apr 30 19:26:34 CDT 2014
On Wed, Apr 30, 2014 at 5:28 PM, Justin Dong <jsd1 at rice.edu> wrote:
> Problem solved. It was user error on my part. The parallel solve was
> working correctly, but when I was computing the functional errors, I needed
> to extract an array from the solution vector. Not all of the processes had
> finished assembling yet, so I think that caused some problems with the
> array.
>
> I'm noticing, though, that superlu_dist is taking longer than just using
> PCLU in sequential. Using the time command in the Mac terminal:
>
> 34.59 real 8.12 user 7.76 sys
>
> 34.59 real 8.74 user 7.87 sys
>
> 34.60 real 8.06 user 7.80 sys
>
> 34.59 real 8.84 user 7.77 sys
>
>
> In sequential:
>
> 17.22 real 16.79 user 0.23 sys
>
>
> Is this at all expected? The rest of my code is around 2x faster in parallel
> (on a dual-core machine). I tried -pc_type redundant -redundant_pc_type lu but
> that didn't speed up the parallel case.
>
It's definitely possible; it depends on your machine:
http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
Your assembly is mostly compute-bound so it scales nicely.
Matt
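
A minimal way to check where the time actually goes is PETSc's logging; the
executable name ./myprog below is a placeholder, not from this thread:

    ./myprog -pc_type lu -log_summary
    mpiexec -n 2 ./myprog -pc_type lu -pc_factor_mat_solver_package superlu_dist -log_summary

Comparing the MatLUFactorNum and MatSolve event lines between the two runs
shows whether the factorization and triangular solves are what stop scaling;
both are largely memory-bandwidth bound, as the FAQ entry above explains.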
>
> On Wed, Apr 30, 2014 at 1:19 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>> Please send the same thing on one process.
>>
>>
>> On Apr 30, 2014, at 8:17 AM, Justin Dong <jsd1 at rice.edu> wrote:
>>
>> > Here are the results of one example where the solution is incorrect:
>> >
>> > 0 KSP unpreconditioned resid norm 7.267616711036e-05 true resid norm
>> 7.267616711036e-05 ||r(i)||/||b|| 1.000000000000e+00
>> >
>> > 1 KSP unpreconditioned resid norm 4.081398605668e-16 true resid norm
>> 4.017029301117e-16 ||r(i)||/||b|| 5.527299334618e-12
>> >
>> >
>> > 2 KSP unpreconditioned resid norm 4.378737248697e-21 true resid norm
>> 4.545226736905e-16 ||r(i)||/||b|| 6.254081520291e-12
>> >
>> > KSP Object: 4 MPI processes
>> >
>> > type: gmres
>> >
>> > GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> >
>> > GMRES: happy breakdown tolerance 1e-30
>> >
>> > maximum iterations=10000, initial guess is zero
>> >
>> > tolerances: relative=1e-13, absolute=1e-50, divergence=10000
>> >
>> > right preconditioning
>> >
>> > using UNPRECONDITIONED norm type for convergence test
>> >
>> > PC Object: 4 MPI processes
>> >
>> > type: lu
>> >
>> > LU: out-of-place factorization
>> >
>> > tolerance for zero pivot 2.22045e-14
>> >
>> > matrix ordering: natural
>> >
>> > factor fill ratio given 0, needed 0
>> >
>> > Factored matrix follows:
>> >
>> > Matrix Object: 4 MPI processes
>> >
>> > type: mpiaij
>> >
>> > rows=1536, cols=1536
>> >
>> > package used to perform factorization: superlu_dist
>> >
>> > total: nonzeros=0, allocated nonzeros=0
>> >
>> > total number of mallocs used during MatSetValues calls =0
>> >
>> > SuperLU_DIST run parameters:
>> >
>> > Process grid nprow 2 x npcol 2
>> >
>> > Equilibrate matrix TRUE
>> >
>> > Matrix input mode 1
>> >
>> > Replace tiny pivots TRUE
>> >
>> > Use iterative refinement FALSE
>> >
>> > Processors in row 2 col partition 2
>> >
>> > Row permutation LargeDiag
>> >
>> > Column permutation METIS_AT_PLUS_A
>> >
>> > Parallel symbolic factorization FALSE
>> >
>> > Repeated factorization SamePattern_SameRowPerm
>> >
>> > linear system matrix = precond matrix:
>> >
>> > Matrix Object: 4 MPI processes
>> >
>> > type: mpiaij
>> >
>> > rows=1536, cols=1536
>> >
>> > total: nonzeros=17856, allocated nonzeros=64512
>> >
>> > total number of mallocs used during MatSetValues calls =0
>> >
>> > using I-node (on process 0) routines: found 128 nodes, limit used
>> is 5
>> >
>> >
>> >
>> > On Wed, Apr 30, 2014 at 7:57 AM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > On Apr 30, 2014, at 6:46 AM, Matthew Knepley <knepley at gmail.com> wrote:
>> >
>> > > On Wed, Apr 30, 2014 at 6:19 AM, Justin Dong <jsd1 at rice.edu> wrote:
>> > > Thanks. If I turn on the Krylov solver, the issue still seems to
>> persist though.
>> > >
>> > > mpiexec -n 4 -ksp_type gmres -ksp_rtol 1.0e-13 -pc_type lu
>> -pc_factor_mat_solver_package superlu_dist
>> > >
>> > > I'm testing on a very small system now (24 ndofs), but if I go larger
>> (around 20000 ndofs) then it gets worse.
>> > >
>> > > For the small system, I exported the matrices to MATLAB to make sure
>> they were being assembled correctly in parallel, and I'm certain that
>> they are.
>> > >
>> > > For convergence questions, always run using -ksp_monitor -ksp_view so
>> that we can see exactly what you run.
>> >
>> > Also run with -ksp_pc_side right
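
Putting Matt's and Barry's suggestions together, the run would look roughly
like the following; ./myprog is a placeholder for the actual executable:

    mpiexec -n 4 ./myprog -ksp_type gmres -ksp_rtol 1.0e-13 \
        -pc_type lu -pc_factor_mat_solver_package superlu_dist \
        -ksp_pc_side right -ksp_monitor_true_residual -ksp_view

-ksp_monitor_true_residual prints both the preconditioned and true residual
norms at every iteration, which matches the monitor output shown earlier in
this message.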
>> >
>> >
>> > >
>> > > Thanks,
>> > >
>> > > Matt
>> > >
>> > >
>> > > On Wed, Apr 30, 2014 at 5:32 AM, Matthew Knepley <knepley at gmail.com>
>> wrote:
>> > > On Wed, Apr 30, 2014 at 3:02 AM, Justin Dong <jsd1 at rice.edu> wrote:
>> > > I actually was able to solve my own problem...for some reason, I need
>> to do
>> > >
>> > > PCSetType(pc, PCLU);
>> > > PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);
>> > > KSPSetTolerances(ksp, 1.e-15, PETSC_DEFAULT, PETSC_DEFAULT,
>> PETSC_DEFAULT);
>> > >
>> > > 1) Before you call PCSetType(pc, PCLU), the preconditioner has no type, so
>> PCFactorSetMatSolverPackage() has no effect
>> > >
>> > > 2) There is a larger issue here. Never ever ever ever code in this
>> way. Hardcoding a solver is crazy. The solver you
>> > > use should depend on the equation, discretization, flow regime,
>> and architecture. Recompiling for all those is
>> > > out of the question. You should just use
>> > >
>> > > KSPCreate()
>> > > KSPSetOperators()
>> > > KSPSetFromOptions()
>> > > KSPSolve()
>> > >
>> > > and then
>> > >
>> > > -pc_type lu -pc_factor_mat_solver_package superlu_dist
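
A minimal C sketch of that pattern, reusing the Aglobal/bglobal names from the
code later in this thread (error checking omitted; the separate solution
vector x is an assumption, since the original code solves in place):

    KSP ksp;
    Vec x;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, Aglobal, Aglobal, DIFFERENT_NONZERO_PATTERN);
    KSPSetFromOptions(ksp);   /* picks up -ksp_type, -pc_type, etc. from the command line */
    VecDuplicate(bglobal, &x);
    KSPSolve(ksp, bglobal, x);

With this, switching between PCLU, superlu_dist, or any other solver requires
no recompilation.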
>> > >
>> > >
>> > > instead of the ordering I initially had, though I'm not really clear
>> on what the issue was. However, there seems to be some loss of accuracy as
>> I increase the number of processes. Is this expected, or can I force a
>> lower tolerance somehow? I am able to compare the solutions to a reference
>> solution, and the error increases as I increase the number of processes.
>> This is the sequential solution:
>> > >
>> > > Yes, this is unavoidable. However, just turn on the Krylov solver
>> > >
>> > > -ksp_type gmres -ksp_rtol 1.0e-10
>> > >
>> > > and you can get whatever residual tolerance you want. To get a
>> specific error, you would need
>> > > a posteriori error estimation, which you could include in a custom
>> convergence criterion.
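
A rough sketch of how such a custom test could be hooked in; MyConverged and
the tolerance are hypothetical, and a real version would call an a posteriori
error estimator rather than checking the residual norm:

    PetscErrorCode MyConverged(KSP ksp, PetscInt it, PetscReal rnorm,
                               KSPConvergedReason *reason, void *ctx)
    {
      /* replace this residual check with an a posteriori error estimate */
      if (rnorm < 1.0e-10) *reason = KSP_CONVERGED_RTOL;
      else                 *reason = KSP_CONVERGED_ITERATING;
      return 0;
    }

    /* after KSPCreate()/KSPSetOperators(): */
    KSPSetConvergenceTest(ksp, MyConverged, NULL, NULL);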
>> > >
>> > > Thanks,
>> > >
>> > > Matt
>> > >
>> > > superlu_1process = [
>> > > -6.8035811950925553e-06
>> > > 1.6324030474375778e-04
>> > > 5.4145340579614926e-02
>> > > 1.6640521936281516e-04
>> > > -1.7669374392923965e-04
>> > > -2.8099208957838207e-04
>> > > 5.3958133511222223e-02
>> > > -5.4077899123806263e-02
>> > > -5.3972905090366369e-02
>> > > -1.9485020474821160e-04
>> > > 5.4239813043824400e-02
>> > > 4.4883984259948430e-04];
>> > >
>> > > superlu_2process = [
>> > > -6.8035811950509821e-06
>> > > 1.6324030474371623e-04
>> > > 5.4145340579605655e-02
>> > > 1.6640521936281687e-04
>> > > -1.7669374392923807e-04
>> > > -2.8099208957839834e-04
>> > > 5.3958133511212911e-02
>> > > -5.4077899123796964e-02
>> > > -5.3972905090357078e-02
>> > > -1.9485020474824480e-04
>> > > 5.4239813043815172e-02
>> > > 4.4883984259953320e-04];
>> > >
>> > > superlu_4process= [
>> > > -6.8035811952565206e-06
>> > > 1.6324030474386164e-04
>> > > 5.4145340579691455e-02
>> > > 1.6640521936278326e-04
>> > > -1.7669374392921441e-04
>> > > -2.8099208957829171e-04
>> > > 5.3958133511299078e-02
>> > > -5.4077899123883062e-02
>> > > -5.3972905090443085e-02
>> > > -1.9485020474806352e-04
>> > > 5.4239813043900860e-02
>> > > 4.4883984259921287e-04];
>> > >
>> > > This is a finite element solution, and I can compute the error of
>> the solution against an exact solution in the L2 norm:
>> > >
>> > > error with 1 process: 1.71340e-02 (accepted value)
>> > > error with 2 processes: 2.65018e-02
>> > > error with 3 processes: 3.00164e-02
>> > > error with 4 processes: 3.14544e-02
>> > >
>> > >
>> > > Is there a way to remedy this?
>> > >
>> > >
>> > > On Wed, Apr 30, 2014 at 2:37 AM, Justin Dong <jsd1 at rice.edu> wrote:
>> > > Hi,
>> > >
>> > > I'm trying to solve a linear system in parallel using SuperLU but for
>> some reason, it's not giving me the correct solution. I'm testing on a
>> small example so I can compare the sequential and parallel cases manually.
>> I'm absolutely sure that my routine for generating the matrix and
>> right-hand side in parallel is working correctly.
>> > >
>> > > Running with 1 process and PCLU gives the correct solution. Running
>> with 2 processes and using SUPERLU_DIST does not give the correct solution
>> (I tried with 1 process too, but according to the SuperLU documentation, I
>> would need SUPERLU for sequential?). This is the code for solving the
>> system:
>> > >
>> > > /* solve the system */
>> > > KSPCreate(PETSC_COMM_WORLD, &ksp);
>> > > KSPSetOperators(ksp, Aglobal, Aglobal,
>> DIFFERENT_NONZERO_PATTERN);
>> > > KSPSetType(ksp,KSPPREONLY);
>> > >
>> > > KSPGetPC(ksp, &pc);
>> > >
>> > > KSPSetTolerances(ksp, 1.e-13, PETSC_DEFAULT, PETSC_DEFAULT,
>> PETSC_DEFAULT);
>> > > PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);
>> > >
>> > > KSPSolve(ksp, bglobal, bglobal);
>> > >
>> > > Sincerely,
>> > > Justin
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > > -- Norbert Wiener
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > > What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> > > -- Norbert Wiener
>> >
>> >
>>
>>
>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener