[petsc-users] Fieldsplit with sub pc MUMPS in parallel

Fri Jan 6 10:52:11 CST 2017

  Great, you should now be able to remove the extra options I had you add:

> -fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right
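
[Editor's sketch] To make explicit which options remain once the diagnostic ones are dropped, here is a small Python sketch. The option strings are copied from this thread; the helper names (`parse_options`, `format_options`) are hypothetical, and the "last occurrence wins" rule is simply how this sketch merges options, stated here as an assumption rather than as a description of PETSc internals.

```python
import shlex

# Baseline options, copied from the original post in this thread.
BASELINE = """
-ksp_rtol 1.0e-5 -ksp_type fgmres -pc_type fieldsplit
-pc_fieldsplit_schur_factorization_type full -pc_fieldsplit_type schur
-pc_fieldsplit_schur_precondition selfp
-fieldsplit_0_pc_type lu -fieldsplit_0_pc_factor_mat_solver_package mumps
-fieldsplit_0_ksp_type preonly -fieldsplit_0_ksp_converged_reason
-fieldsplit_1_pc_type lu -fieldsplit_1_pc_factor_mat_solver_package mumps
-fieldsplit_1_ksp_type preonly -fieldsplit_1_ksp_converged_reason
"""

# Diagnostic options that were added during debugging.
DIAGNOSTIC = """
-fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right
-fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right
"""


def parse_options(text):
    """Parse '-name [value]' tokens into a dict; a repeated name overrides.

    Assumption of this sketch: a token that does not start with '-' is the
    value of the preceding option (so negative numeric values are not handled).
    """
    opts = {}
    tokens = shlex.split(text)
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            opts[name] = tokens[i + 1]
            i += 2
        else:
            opts[name] = None  # flag without a value
            i += 1
    return opts


def format_options(opts):
    """Turn the dict back into a single command-line string."""
    return " ".join(k if v is None else f"{k} {v}" for k, v in opts.items())


debug_run = parse_options(BASELINE + DIAGNOSTIC)  # what was run while debugging
final_run = parse_options(BASELINE)               # after removing the extras

print(format_options(final_run))
```

The final run goes back to `preonly` inner solves on both splits and no longer sets `ksp_pc_side`.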

> On Jan 6, 2017, at 5:17 AM, Karin&NiKo <niko.karin at gmail.com> wrote:
> 
> Barry,
> 
> you are goddamn right - there was something wrong with the numbering. I fixed it and look what I get. The residuals of outer iterations are exactly the same.
> 
> Thanks again for your insight and perseverance.
> 
> Nicolas
> 
> 2017-01-05 20:17 GMT+01:00 Barry Smith <bsmith at mcs.anl.gov>:
> 
>     This is not good. Something is out of whack.
> 
>      First, run on 1 and 2 processes with -ksp_view_mat binary -ksp_view_rhs binary. Each run will generate a file called binaryoutput. Send both files to petsc-maint at mcs.anl.gov; I want to confirm that the matrices are the same in both cases.
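
[Editor's sketch] The comparison requested above can also be done locally. The following pure-Python sketch reads two such files and reports the largest entry-wise difference. It rests on several assumptions that should be checked against your build: PETSc's big-endian binary layout with 32-bit indices and real double-precision scalars, and a file that holds the matrix first and the right-hand side second. The function names are hypothetical.

```python
import struct

MAT_CLASSID = 1211216  # PETSc binary tag for a sparse matrix
VEC_CLASSID = 1211214  # PETSc binary tag for a vector


def _read(fh, fmt, count):
    """Read `count` big-endian items of struct type `fmt` ('i' or 'd')."""
    size = struct.calcsize(">" + fmt) * count
    data = fh.read(size)
    if len(data) != size:
        raise EOFError("truncated PETSc binary file")
    return list(struct.unpack(f">{count}{fmt}", data))


def read_mat_and_rhs(path):
    """Return ({(row, col): value}, rhs) from a Mat followed by a Vec."""
    with open(path, "rb") as fh:
        classid, nrows, ncols, nnz = _read(fh, "i", 4)
        if classid != MAT_CLASSID:
            raise ValueError("expected a matrix at the start of the file")
        row_lengths = _read(fh, "i", nrows)  # stored entries per row
        cols = _read(fh, "i", nnz)           # column index of each entry
        vals = _read(fh, "d", nnz)           # value of each entry
        classid, n = _read(fh, "i", 2)
        if classid != VEC_CLASSID:
            raise ValueError("expected a vector after the matrix")
        rhs = _read(fh, "d", n)
    entries = {}
    k = 0
    for row, length in enumerate(row_lengths):
        for _ in range(length):
            key = (row, cols[k])
            entries[key] = entries.get(key, 0.0) + vals[k]
            k += 1
    return entries, rhs


def max_abs_difference(path_a, path_b):
    """Largest absolute difference between the two matrices and the two RHS."""
    mat_a, rhs_a = read_mat_and_rhs(path_a)
    mat_b, rhs_b = read_mat_and_rhs(path_b)
    if len(rhs_a) != len(rhs_b):
        raise ValueError("right-hand sides have different lengths")
    keys = set(mat_a) | set(mat_b)  # entries missing on one side count as 0
    dmat = max((abs(mat_a.get(k, 0.0) - mat_b.get(k, 0.0)) for k in keys),
               default=0.0)
    drhs = max((abs(x - y) for x, y in zip(rhs_a, rhs_b)), default=0.0)
    return dmat, drhs
```

Since every run writes to the same name, each binaryoutput has to be renamed before the next run, for instance to binaryoutput_np1 and binaryoutput_np2 (hypothetical names), and then compared with `max_abs_difference("binaryoutput_np1", "binaryoutput_np2")`. A difference in the global numbering between the two runs would show up here as a large mismatch even if the underlying problem is the same.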
> 
>     Barry
> 
> > On Jan 5, 2017, at 10:36 AM, Karin&NiKo <niko.karin at gmail.com> wrote:
> >
> > Dave,
> >
> > Indeed the residual histories differ. Concerning the IS's, I have checked them on small cases, so that I am quite sure they are OK.
> > What could I do with PETSc to evaluate the ill-conditioning of the system or of the sub-systems?
> >
> > Thanks again for your help,
> > Nicolas
> >
> > 2017-01-05 15:46 GMT+01:00 Barry Smith <bsmith at mcs.anl.gov>:
> >
> > > On Jan 5, 2017, at 5:58 AM, Dave May <dave.mayhem23 at gmail.com> wrote:
> > >
> > > Do you now see identical residual histories for a job using 1 rank and 4 ranks?
> >
> >    Please send the residual histories with the extra options; I'm curious too. A Krylov method should not be needed in the inner solve; I just asked for it so we can see what the residuals look like.
> >
> >    Barry
> >
> > >
> > > If not, I am inclined to believe that the IS's you are defining for the splits in the parallel case are incorrect. The operator created to approximate the Schur complement with selfp should not depend on  the number of ranks.
> > >
> > > Or possibly your problem is horribly ill-conditioned. If it is, then this could result in slightly different residual histories when using different numbers of ranks - even if the operators are in fact identical.
> > >
> > >
> > > Thanks,
> > >   Dave
> > >
> > >
> > >
> > >
> > > On Thu, 5 Jan 2017 at 12:14, Karin&NiKo <niko.karin at gmail.com> wrote:
> > > Dear Barry, dear Dave,
> > >
> > > THANK YOU!
> > > You two pointed out the right problem. By using the options you provided (-fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_pc_side right -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_pc_side right), the solver converges in 3 iterations whatever the size of the communicator.
> > > The whole trick lies in solving the Schur complement system precisely, by using a Krylov method (and not only preonly) *and* applying the preconditioner on the right (so that convergence is evaluated on the unpreconditioned residual).
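
[Editor's sketch] For reference, the block factorization behind -pc_fieldsplit_type schur with -pc_fieldsplit_schur_factorization_type full can be written as below. The block names $A_{00}$, $A_{01}$, $A_{10}$, $A_{11}$ follow the PETSc documentation convention and are an assumption here, since the thread itself does not name the blocks. $S$ is the exact Schur complement and $S_p$ is the approximation assembled by the selfp option.

```latex
A =
\begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ A_{10} A_{00}^{-1} & I \end{pmatrix}
\begin{pmatrix} A_{00} & 0 \\ 0 & S \end{pmatrix}
\begin{pmatrix} I & A_{00}^{-1} A_{01} \\ 0 & I \end{pmatrix},
\qquad
S = A_{11} - A_{10} A_{00}^{-1} A_{01},
\qquad
S_p = A_{11} - A_{10}\,\operatorname{diag}(A_{00})^{-1} A_{01}.
```

With preonly and LU on split 1, the Schur block is inverted as $S_p^{-1}$ rather than $S^{-1}$, so the preconditioner is only an approximation of $A^{-1}$ and the outer FGMRES has to make up the difference. With an inner GMRES on split 1, the solve is performed on $S$ itself, with $S_p$ acting only as its preconditioner.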
> > >
> > > @Barry : the difference you see on the nonzero allocations for the different runs is just an artefact : when using more than one proc, we slightly over-estimate the number of non-zero terms. If I run the same problem with the -info option, I get extra information :
> > > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 110; storage space: 0 unneeded,5048 used
> > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 271; storage space: 4249 unneeded,26167 used
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 307; storage space: 7988 unneeded,31093 used
> > > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 244; storage space: 0 unneeded,6194 used
> > > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 233; storage space: 823 unneeded,9975 used
> > > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 197; storage space: 823 unneeded,8263 used
> > > And 5048+26167+31093+6194+9975+8263=86740 which is the number of exactly estimated nonzero terms for 1 proc.
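
[Editor's sketch] The sum above can be checked mechanically. The sketch below parses the -info lines quoted in this message and adds up the "used" counts per rank; each rank appears twice, which is consistent with a parallel AIJ matrix storing a diagonal and an off-diagonal block per rank. The regular expression is an assumption tailored to the lines shown here, not a general -info parser.

```python
import re

# The -info lines quoted in the message above.
INFO_LINES = """\
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 110; storage space: 0 unneeded,5048 used
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 271; storage space: 4249 unneeded,26167 used
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 307; storage space: 7988 unneeded,31093 used
[2] MatAssemblyEnd_SeqAIJ(): Matrix size: 110 X 244; storage space: 0 unneeded,6194 used
[1] MatAssemblyEnd_SeqAIJ(): Matrix size: 271 X 233; storage space: 823 unneeded,9975 used
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 307 X 197; storage space: 823 unneeded,8263 used
"""

PATTERN = re.compile(
    r"\[(\d+)\] MatAssemblyEnd_SeqAIJ\(\): Matrix size: (\d+) X (\d+); "
    r"storage space: (\d+) unneeded,\s*(\d+) used"
)


def used_nonzeros_per_rank(text):
    """Sum the 'used' counts of all local blocks reported by each rank."""
    totals = {}
    for match in PATTERN.finditer(text):
        rank = int(match.group(1))
        used = int(match.group(5))
        totals[rank] = totals.get(rank, 0) + used
    return totals


per_rank = used_nonzeros_per_rank(INFO_LINES)
total = sum(per_rank.values())
print(per_rank)  # used nonzeros per rank
print(total)     # 86740, the nonzero count reported for the 1-process run
```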
> > >
> > >
> > > Thank you again!
> > >
> > > Best regards,
> > > Nicolas
> > >
> > >
> > > 2017-01-05 1:36 GMT+01:00 Barry Smith <bsmith at mcs.anl.gov>:
> > >
> > >    There is something wrong with your set up.
> > >
> > > 1 process
> > >
> > >            total: nonzeros=140616, allocated nonzeros=140616
> > >           total: nonzeros=68940, allocated nonzeros=68940
> > >                 total: nonzeros=3584, allocated nonzeros=3584
> > >                 total: nonzeros=1000, allocated nonzeros=1000
> > >                 total: nonzeros=8400, allocated nonzeros=8400
> > >
> > > 2 processes
> > >
> > >                 total: nonzeros=146498, allocated nonzeros=146498
> > >           total: nonzeros=73470, allocated nonzeros=73470
> > >                 total: nonzeros=3038, allocated nonzeros=3038
> > >                 total: nonzeros=1110, allocated nonzeros=1110
> > >                 total: nonzeros=6080, allocated nonzeros=6080
> > >                         total: nonzeros=146498, allocated nonzeros=146498
> > >                   total: nonzeros=73470, allocated nonzeros=73470
> > >                 total: nonzeros=6080, allocated nonzeros=6080
> > >           total: nonzeros=2846, allocated nonzeros=2846
> > >     total: nonzeros=86740, allocated nonzeros=94187
> > >
> > >   It looks like you are setting up the problem differently in parallel and sequentially. If it is supposed to be an identical problem, then the number of nonzeros should be the same in at least the first two matrices.
> > >
> > > > On Jan 4, 2017, at 3:39 PM, Karin&NiKo <niko.karin at gmail.com> wrote:
> > > >
> > > > Dear Petsc team,
> > > >
> > > > I am (still) trying to solve Biot's poroelasticity problem :
> > > >  <image.png>
> > > >
> > > > I am using a mixed P2-P1 finite element discretization. The matrix of the discretized system in binary format is attached to this email.
> > > >
> > > > I am using the fieldsplit framework to solve the linear system. Since I am facing some troubles, I have decided to go back to simple things. Here are the options I am using :
> > > >
> > > > -ksp_rtol 1.0e-5
> > > > -ksp_type fgmres
> > > > -pc_type fieldsplit
> > > > -pc_fieldsplit_schur_factorization_type full
> > > > -pc_fieldsplit_type schur
> > > > -pc_fieldsplit_schur_precondition selfp
> > > > -fieldsplit_0_pc_type lu
> > > > -fieldsplit_0_pc_factor_mat_solver_package mumps
> > > > -fieldsplit_0_ksp_type preonly
> > > > -fieldsplit_0_ksp_converged_reason
> > > > -fieldsplit_1_pc_type lu
> > > > -fieldsplit_1_pc_factor_mat_solver_package mumps
> > > > -fieldsplit_1_ksp_type preonly
> > > > -fieldsplit_1_ksp_converged_reason
> > > >
> > > > On a single proc, everything runs fine : the solver converges in 3 iterations, according to the theory (see Run-1-proc.txt [contains -log_view]).
> > > >
> > > > On 2 procs, the solver converges in 28 iterations (see Run-2-proc.txt).
> > > >
> > > > On 3 procs, the solver converges in 91 iterations (see Run-3-proc.txt).
> > > >
> > > > I do not understand this behavior : since MUMPS is a parallel direct solver, shouldn't the solver converge in max 3 iterations whatever the number of procs?
> > > >
> > > > Thanks for your precious help,
> > > > Nicolas
> > > >
> > > > <Run-1-proc.txt><Run-2-proc.txt><Run-3-proc.txt><1_Warning.txt>
> > >
> >
> >
> > <Run-1-proc.txt><Run-2-proc.txt><Run-3-proc.txt><Run-4-proc.txt>
> 
> 
> <Run-1-proc.txt><Run-2-proc.txt><Run-3-proc.txt><Run-4-proc.txt>