[petsc-users] Tuning the parallel performance of a 3D FEM CFD code

Barry Smith bsmith at mcs.anl.gov
Fri May 13 10:16:18 CDT 2011


On May 13, 2011, at 9:34 AM, Jed Brown wrote:

> How small is the Reynolds number? There is a difference between 0.1 and 100, although both may be laminar.
> 
> On Fri, May 13, 2011 at 15:50, Henning Sauerland <uerland at gmail.com> wrote:
> The problem is discretized using FEM (more precisely, XFEM) with stabilized, trilinear hexahedral elements. As the XFEM approximation space, as well as the physical properties at the nodes, is time-dependent, the resulting system may change quite significantly between time steps.
> 
> I assume the XFEM basis is just resolving the jump across the interface. Does this mean the size of the matrix is changing as the interface moves? In any case, it looks like you can't amortize setup costs between time steps, so we need a solution short of a direct solve. 
>  
> ILU(2) requires less than half the number of KSP iterations, but it scales similarly to ILU(0) and requires about 1/3 more time.


     I think this is a more important point than you and Jed may give it credit for. It is not clear to me that the worsening convergence (of the linear solve) with the number of subdomains is the chief issue you should be concerned about. An equally important issue is that ILU(0) is not a good preconditioner, and while using a more accurate factorization (LU or ILU(2)) helps the convergence, it is too expensive. Are you using the AIJ matrix format (and hence point ILU)? See below.
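
     (For what it is worth, assuming the usual parallel default of block Jacobi with one ILU factorization per process, these variants can be compared without touching the code, e.g. with options along the lines of

        -ksp_type lgmres -pc_type bjacobi -sub_pc_type ilu -sub_pc_factor_levels 0
        -ksp_monitor_true_residual -ksp_converged_reason -log_summary

     where changing -sub_pc_factor_levels to 2 gives the ILU(2) variant, -sub_pc_type lu gives exact subdomain solves, and -log_summary shows where the extra time goes. This is just a sketch of the relevant options, not something tuned for your problem.)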

> 
> Eventually, we're going to need a coarse level to battle the poor scaling with more processes. Hopefully that will alleviate the need for these more expensive local solves.
>  
> I guess you are talking about the nonlinear iterations? I was always referring to the KSP iterations, and I thought that the growth of the KSP iteration count with an increasing number of processors is more or less solely related to the iterative solver and preconditioner.
> 
> I meant linear iterations. It is mostly dependent on preconditioner. The increased iteration count (when a direct subdomain solver is used, inexact subdomain solves confuse things) is likely due to the fundamental scaling for elliptic problems. There are many ways to improve constants, but you need a coarse level to fix the asymptotics.
> 
> Unfortunately, multigrid methods for XFEM are a recent topic. Perhaps the best results I have seen (at conferences) use some geometric information in an otherwise algebraic framework. For this problem (unlike many fracture problems), the influence of the extended basis functions may be local enough that you can build a coarse level using the conventional part of the problem. The first thing I would try is probably to see if a direct solve with the conventional part makes an effective coarse level. If that works, I would see if ML or Hypre can do a reasonable job with that part of the problem.
> 
> I have no great confidence that this will work, it's highly dependent on how local the influence of the extended basis functions is. Perhaps you have enough experience with the method to hypothesize.
> 
> Note: for the conventional part of the problem, it is still incompressible flow. It sounds like you are using equal-order elements (Q1-Q1 stabilized; PSPG or Dohrmann&Bochev?). For those elements, you will want to interlace the velocity and pressure degrees of freedom, then use a smoother that respects the block size.

    Is Jed correct on this? If you have equal-order elements, hence all the degrees of freedom live at the same points, then you can switch to the BAIJ matrix format and ILU automatically becomes point-block ILU, which may work much better, as Jed indicates. You can make this change before mucking with coarse grid solves or multilevel methods. I'm guessing that the point-block ILU will be a good bit better, and it is easily checked: just create a BAIJ matrix, set the block size appropriately, and leave the rest of the code unchanged (well, interlace the variables if they are not currently interlaced); see the sketch below.
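
    Something along the following lines is an untested sketch of all that should need to change on the matrix creation side (nlocal, d_nz and o_nz are placeholders for your local row count and per-block-row preallocation estimates, and I am assuming 4 interlaced dof per node: u, v, w, p):

       #include <petscmat.h>

       /* Create an MPI BAIJ matrix with interlaced (u,v,w,p) unknowns, i.e.
          block size 4, so that block Jacobi/ILU become their point-block
          versions.  nlocal must be divisible by the block size. */
       PetscErrorCode CreateBlockedMatrix(PetscInt nlocal,PetscInt d_nz,PetscInt o_nz,Mat *A)
       {
         const PetscInt bs = 4;                     /* dof per node: u,v,w,p */
         PetscErrorCode ierr;

         ierr = MatCreate(PETSC_COMM_WORLD,A);CHKERRQ(ierr);
         ierr = MatSetSizes(*A,nlocal,nlocal,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
         ierr = MatSetType(*A,MATMPIBAIJ);CHKERRQ(ierr);      /* instead of MATMPIAIJ */
         ierr = MatMPIBAIJSetPreallocation(*A,bs,d_nz,PETSC_NULL,o_nz,PETSC_NULL);CHKERRQ(ierr);
         /* the existing MatSetValues()/MatAssemblyBegin/End assembly can stay as is;
            MatSetValuesBlocked() with block indices is an optional further optimization */
         return 0;
       }

    With that in place, the same -pc_type bjacobi -sub_pc_type ilu options operate on the 4x4 blocks automatically.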

   Barry


> PETSc's ML and Hypre interfaces forward this information if you set the block size on the matrix. When using ML, you'll probably have to make the smoothers stronger. There are also some "energy minimization" options that may help.
> ibcgs is slightly faster, requiring fewer KSP iterations than lgmres. Unfortunately, the iteration count scales very similarly to lgmres, and in my experience the lack of robustness of bcgs solvers generally turns out to be problematic for tougher test cases.
> 
> Yes, that suggestion was only an attempt to improve the constants a little.
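
    For reference (untested, and worth checking against the manual pages of the PETSc version you are using), the options Jed mentions above would look roughly like

        -ksp_type ibcgs
        -pc_type ml -pc_ml_EnergyMinimization 2
        -mg_levels_ksp_type richardson -mg_levels_pc_type sor -mg_levels_ksp_max_it 2
        -pc_type hypre -pc_hypre_type boomeramg

    where -ksp_type ibcgs replaces lgmres, the -mg_levels_ options are one way of making the ML smoothers stronger, and the ML and Hypre lines are alternatives to each other rather than something to combine.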


