[petsc-users] Re: Best workflow for different systems with different block sizes

Paolo Lampitella paololampitella at hotmail.com
Thu Feb 2 02:06:40 CST 2017


Barry,

thank you very much.

Hopefully, I will soon be able to come back with some real numbers on some real machine, to compare the two approaches.

Paolo

-----Original Message-----
From: Barry Smith [mailto:bsmith at mcs.anl.gov]
Sent: Wednesday, 1 February 2017 17:53
To: Paolo Lampitella <paololampitella at hotmail.com>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Best workflow for different systems with different block sizes


   Paolo,

     If the various matrices keep the same nonzero structure throughout the simulation, there is a slight performance gain in reusing them versus creating and destroying them in each outer iteration. Likely not more than, say, 10% of the run time if the linear solves are relatively difficult (i.e., most of the time is spent in the linear solves), but it could be a bit more, maybe 20%, if the linear solves do not dominate the time.

    I am not basing this on any quantitative information I have, so it really comes down to whether you want to run larger problems that require reusing the memory.

     Regarding creating the right hand side in your own memory and copying it to PETSc: creating the right hand side directly in PETSc vectors (with VecGetArrayF90()) will be a little faster and require slightly less memory, so it is a "good thing to do", but I suspect you will barely be able to measure the difference.
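
   Something along these lines (an untested sketch; the subroutine name and the dummy fill are only illustrative, and the include/use lines assume the module-based Fortran interface) is all that is needed:

      subroutine fill_rhs_in_place(b, ierr)
#include <petsc/finclude/petscvec.h>
      use petscvec
      implicit none
      Vec                  :: b
      PetscErrorCode       :: ierr
      PetscScalar, pointer :: bv(:)
      PetscInt             :: i, nloc

      ! expose the locally owned part of the PETSc vector and assemble
      ! the right hand side directly into it, with no intermediate array
      ! and no copy
      call VecGetArrayF90(b, bv, ierr)
      call VecGetLocalSize(b, nloc, ierr)
      do i = 1, nloc
         bv(i) = 0.0d0   ! placeholder for your own residual entry
      end do
      call VecRestoreArrayF90(b, bv, ierr)
      end subroutine fill_rhs_in_place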

   Barry



> On Feb 1, 2017, at 9:21 AM, Paolo Lampitella <paololampitella at hotmail.com> wrote:
> 
> Dear PETSc users and developers,
>  
> I successfully, and very satisfactorily, use PETSc for the linear 
> algebra part of a coupled, preconditioned, density-based, unstructured, cell-centered, finite volume CFD code. In short, the method is the one presented in the paper:
>  
> J.M. Weiss, J.P. Maruszewski, W.A. Smith: Implicit Solution of 
> Preconditioned Navier-Stokes Equations Using Algebraic Multigrid
> http://dx.doi.org/10.2514/2.689
>  
> except for the AMG part as, at the moment, I use GMRES + point Jacobi through PETSc (this will probably be the subject of another post).
> However, I also have a very simple, internally coded, symmetric block Gauss-Seidel solver.
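> (For reference, in PETSc terms the GMRES + point Jacobi choice corresponds to the usual runtime options, something like
> 
>     -ksp_type gmres -pc_type jacobi
> 
> or the equivalent KSPSetType/PCSetType calls in the setup routine.)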
>  
> The code, written in Fortran with MPI, manages all the aspects of the 
> solver, including outer iterations, with PETSc just handling the 
> solution of the linear systems at each outer iteration. In particular, note that, for certain combinations of models, the solver can end up having different systems of equations to solve, in sequence, at each outer iteration. For example, I might have:
>  
> - Main equations (with block size = 5 + n species)
> - Turbulence equations (with block size = number of equations implied by the turbulence model)
> - Additional transported scalar equations (with block size = number of required scalars)
> - Etc.
>  
> The way I manage the workflow with the internal GS solver is such that, for each block of equations, at each outer iteration, I do the following:
>  
> - allocate matrix, solution and rhs
> - fill the matrix and rhs
> - solve the system
> - update the independent variables (the system solution is in delta form)
> - deallocate matrix, solution and rhs
>  
> so that the allocated memory is kept as low as possible at any given
> time. However, for legacy reasons that are now obsolete, the PETSc
> workflow used in the code is different: all the required matrices and
> rhs are instead created in advance with the attached routine
> petsc_create.f90. Then iterations start and, at each iteration, each
> system is solved with the routine petsc_solve.f90, also attached (both
> are included in a dedicated petsc module).
> At the end of the iterations, before the finalization, a petsc_destroy
> subroutine (not attached) is obviously also called for each matrix/rhs
> allocated.
> So, in conclusion, I keep all the matrices in memory for the whole time
> (the matrix structure, of course, doesn’t change with the iterations).
> My first question then is: 
>  
> 1) Is this approach recommended? Wouldn’t calling my petsc_create
> inside my petsc_solve be a better option in my case instead?
> In that case I could avoid storing any PETSc matrix or rhs outside the
> petsc_solve routine.
> Would the overhead implied by the routines called in my petsc_create be
> acceptable if that subroutine were called at every outer iteration for
> every system?
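> 
> In PETSc terms, the per-iteration variant I have in mind would look
> roughly like the sketch below (not the attached routines; the block
> size bs, the preallocation counts and the solver settings are just
> placeholders, and the include/use lines assume the module-based
> Fortran interface):
> 
>   subroutine solve_one_system_sketch(bs, nloc, d_nz, o_nz, ierr)
> #include <petsc/finclude/petscksp.h>
>   use petscksp
>   implicit none
>   PetscInt       :: bs, nloc, d_nz, o_nz
>   PetscErrorCode :: ierr
>   Mat            :: A
>   Vec            :: x, b
>   KSP            :: ksp
> 
>   ! create the matrix and work vectors for this system only
>   call MatCreateBAIJ(PETSC_COMM_WORLD, bs, nloc*bs, nloc*bs,        &
>                      PETSC_DETERMINE, PETSC_DETERMINE,              &
>                      d_nz, PETSC_NULL_INTEGER,                      &
>                      o_nz, PETSC_NULL_INTEGER, A, ierr)
>   call MatCreateVecs(A, x, b, ierr)
> 
>   ! ... fill A and b here, then assemble ...
>   call MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY, ierr)
>   call MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY, ierr)
> 
>   ! solve and immediately release everything again
>   call KSPCreate(PETSC_COMM_WORLD, ksp, ierr)
>   call KSPSetOperators(ksp, A, A, ierr)
>   call KSPSetFromOptions(ksp, ierr)
>   call KSPSolve(ksp, b, x, ierr)
> 
>   ! ... update the independent variables from x (delta form) ...
> 
>   call KSPDestroy(ksp, ierr)
>   call VecDestroy(x, ierr)
>   call VecDestroy(b, ierr)
>   call MatDestroy(A, ierr)
>   end subroutine solve_one_system_sketch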
>  
> Also, note that the way I fill the RHS and the matrix of the systems
> for PETSc is different. For the RHS, I always allocate my own array in
> the code, which is then copied into the PETSc one in the petsc_solve
> routine. For the matrix, instead, I directly fill the PETSc one outside
> the subroutine, which is then passed, already filled, to petsc_solve.
> So, the second question is:
>  
> 2) Independently of the approach at question (1), which method do you
> suggest adopting? Storing the rhs and the matrix directly in the PETSc
> ones, or copying them at solve time? Is there a mainstream approach in
> such cases?
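> 
> To fix ideas, the direct fill of the PETSc matrix currently looks
> roughly like the sketch below (simplified, not the attached code; one
> bs x bs block per call, with 0-based global block indices, and the
> insert mode depending on how the assembly accumulates contributions):
> 
>   subroutine add_one_block_sketch(A, bs, irow, icol, blk, ierr)
> #include <petsc/finclude/petscmat.h>
>   use petscmat
>   implicit none
>   Mat            :: A
>   PetscInt       :: bs, irow(1), icol(1)   ! global block row/column
>   PetscScalar    :: blk(bs*bs)             ! one dense bs x bs block
>   PetscErrorCode :: ierr
> 
>   ! by default PETSc reads blk as row-oriented, even from Fortran
>   ! (see MatSetOption with MAT_ROW_ORIENTED)
>   call MatSetValuesBlocked(A, 1, irow, 1, icol, blk, ADD_VALUES, ierr)
>   end subroutine add_one_block_sketch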
>  
> I don’t know if this is relevant, but consider that, in my case, every process always writes only its own matrix and rhs entries.
>  
> Unfortunately, at the moment, I have no access to a system where a
> straightforward test would give a clear answer to this. Still, I guess
> the matter is more conceptual than practical.
>  
> Thank you and sorry for the long mail
> <petsc_solve.f90><petsc_create.f90>


