[petsc-users] Tuning the parallel performance of a 3D FEM CFD code

Matthew Knepley knepley at gmail.com
Mon May 16 21:54:17 CDT 2011


On Fri, May 13, 2011 at 10:16 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:

>
> On May 13, 2011, at 9:34 AM, Jed Brown wrote:
>
> > How small is the Reynolds number? There is a difference between 0.1 and
> 100, although both may be laminar.
> >
> > On Fri, May 13, 2011 at 15:50, Henning Sauerland <uerland at gmail.com>
> wrote:
> > The problem is discretized using FEM (more precisely XFEM) with
> stabilized, trilinear hexahedral elements. As the XFEM approximation space
> is time-dependent, as are the physical properties at the nodes, the
> resulting system may change quite significantly between time steps.
> >
> > I assume the XFEM basis is just resolving the jump across the interface.
> Does this mean the size of the matrix is changing as the interface moves? In
> any case, it looks like you can't amortize setup costs between time steps,
> so we need a solution short of a direct solve.
> >
> > ILU(2) requires less than half the number of KSP iterations, but it
> scales similarly to ILU(0) and takes about 1/3 more time.
>
>
>      I think this is a more important point than you and Jed may give it
> credit for. It is not clear to me that the worsening convergence (of the
> linear solve) with the number of subdomains is the chief issue you should
> be concerned about. An equally important issue is that ILU(0) is not a good
> preconditioner, and while using a more accurate factorization (LU or
> ILU(2)) helps the convergence, it is too expensive. Are you using the AIJ
> matrix format (and hence point ILU)? See below.


Anecdotally, I hear that XFEM can produce ill-conditioned matrices, where
the ill-conditioning arises from geometric factors. I don't think you can
rule out, a priori, differences in conditioning as the process count
increases that come from the changing geometry of the interface.

   Matt


> >
> > Eventually, we're going to need a coarse level to battle the poor scaling
> with more processes. Hopefully that will alleviate the need for these more
> expensive local solves.
> >
> > I guess you are talking about the nonlinear iterations? I was always
> referring to the KSP iterations, and I thought that the growth of the KSP
> iteration count with an increasing number of processors is more or less
> solely related to the iterative solver and preconditioner.
> >
> > I meant linear iterations. It is mostly dependent on the preconditioner.
> The increased iteration count (when a direct subdomain solver is used;
> inexact subdomain solves confuse things) is likely due to the fundamental
> scaling for elliptic problems. There are many ways to improve the
> constants, but you need a coarse level to fix the asymptotics.
> >
> > Unfortunately, multigrid methods for XFEM are a recent topic. Perhaps the
> best results I have seen (at conferences) use some geometric information in
> an otherwise algebraic framework. For this problem (unlike many fracture
> problems), the influence of the extended basis functions may be local enough
> that you can build a coarse level using the conventional part of the
> problem. The first thing I would try is probably to see if a direct solve
> with the conventional part makes an effective coarse level. If that works, I
> would see if ML or Hypre can do a reasonable job with that part of the
> problem.
> >
> > I have no great confidence that this will work; it's highly dependent on
> how local the influence of the extended basis functions is. Perhaps you have
> enough experience with the method to hypothesize.
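
To make that concrete, here is a rough, untested sketch of such a two-level
setup. It assumes the application can assemble the conventional-part
operator (call it Ac) and an embedding matrix P from the conventional space
into the XFEM space; the names Ac, P, and SetupConventionalCoarseLevel are
just placeholders, and note that KSPSetOperators() takes an extra
MatStructure argument in older PETSc releases.

    #include <petscksp.h>

    /* Sketch only: two-level MG whose coarse space is the conventional FE
       space.  P embeds the conventional space into the XFEM space and Ac is
       the conventional-part operator; both must be built by the application. */
    PetscErrorCode SetupConventionalCoarseLevel(KSP ksp, Mat P, Mat Ac)
    {
      PetscErrorCode ierr;
      PC             pc, cpc;
      KSP            cksp;

      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
      ierr = PCMGSetLevels(pc, 2, NULL);CHKERRQ(ierr);
      ierr = PCMGSetInterpolation(pc, 1, P);CHKERRQ(ierr); /* restriction defaults to P^T */
      ierr = PCMGGetCoarseSolve(pc, &cksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(cksp, Ac, Ac);CHKERRQ(ierr);  /* conventional-part operator */
      ierr = KSPSetType(cksp, KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(cksp, &cpc);CHKERRQ(ierr);
      ierr = PCSetType(cpc, PCLU);CHKERRQ(ierr);           /* direct coarse solve for now */
      return 0;
    }

If the direct coarse solve turns out to be effective, swapping PCLU for ML
or Hypre on the coarse level is then a one-line change (or a runtime option,
-mg_coarse_pc_type).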
> >
> > Note: for the conventional part of the problem, it is still
> incompressible flow. It sounds like you are using equal-order elements
> (Q1-Q1 stabilized; PSPG or Dohrmann&Bochev?). For those elements, you will
> want to interlace the velocity and pressure degrees of freedom, then use a
> smoother that respects the block size.
>
>     Is Jed correct on this? If you have equal-order elements, and hence all
> the degrees of freedom live at the same points, then you can switch to the
> BAIJ matrix format; ILU then automatically becomes point-block ILU and may
> work much better, as Jed indicates. You can make this change before mucking
> with coarse grid solves or multilevel methods. I'm guessing that the
> point-block ILU will be a good bit better, and it is easily checked: just
> create a BAIJ matrix, set the block size appropriately, and leave the rest
> of the code unchanged (well, interlace the variables if they are not
> currently interlaced).
>
>   Barry
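
For reference, a minimal sketch of what that change might look like. The
function name, nlocal (which should be a multiple of 4), and the
preallocation counts are placeholders for whatever the code already
computes; everything else stays as it is.

    #include <petscmat.h>

    /* Sketch only: build the Jacobian as BAIJ with block size 4 (interlaced
       u,v,w,p per node) so that ILU(0) automatically becomes point-block
       ILU(0).  The preallocation numbers below are rough guesses. */
    PetscErrorCode CreateBlockJacobian(MPI_Comm comm, PetscInt nlocal, Mat *A)
    {
      PetscErrorCode ierr;
      const PetscInt bs = 4;  /* three velocity components + pressure */

      ierr = MatCreate(comm, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATBAIJ);CHKERRQ(ierr);  /* SeqBAIJ or MPIBAIJ as appropriate */
      ierr = MatSeqBAIJSetPreallocation(*A, bs, 27, NULL);CHKERRQ(ierr);
      ierr = MatMPIBAIJSetPreallocation(*A, bs, 27, NULL, 14, NULL);CHKERRQ(ierr);
      /* Insert entries exactly as before (MatSetValues() with the interlaced
         ordering, or MatSetValuesBlocked()), then assemble as usual. */
      return 0;
    }

The incomplete factorization then works on 4x4 blocks instead of individual
entries, which is the point-block ILU Barry describes.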
>
>
> > PETSc's ML and Hypre interfaces forward this information if you set the
> block size on the matrix. When using ML, you'll probably have to make the
> smoothers stronger. There are also some "energy minimization" options that
> may help.
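
As a possible starting point along those lines (the exact option names
should be double-checked against -help for the installed PETSc/ML version),
something like

    -pc_type ml -pc_ml_EnergyMinimization 2 \
      -mg_levels_ksp_type richardson -mg_levels_pc_type sor -mg_levels_ksp_max_it 2

or, with Hypre,

    -pc_type hypre -pc_hypre_type boomeramg

could be tried, in both cases after the block size has been set on the
matrix (MatSetBlockSize(), or implicitly by using BAIJ) so that the
interface can forward it.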
> > ibcgs is slightly faster, requiring fewer KSP iterations than lgmres.
> Unfortunately, the iteration count scales very similarly to lgmres, and in
> my experience the general lack of robustness of BiCGStab-type solvers turns
> out to be problematic for tougher test cases.
> >
> > Yes, that suggestion was only an attempt to improve the constants a
> little.
>
>


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener