[petsc-users] Re: Question about a parallel implementation of PCFIELDSPLIT

Pantelis Moschopoulos pmoschopoulos at outlook.com
Wed Jan 24 00:29:59 CST 2024


Thanks very much Matt for the detailed explanations.

I was asking about the Schur complement because I have tried a "manual" version of this procedure without the field split. Ultimately, it requires the solution of three linear systems, each like the A^{-1} u solve. If you have the LU factorization of A, then everything is perfect. However, if you use iterative solvers, things slow down considerably, and I have found that it is faster to just add the constraint to matrix A and solve the whole thing with an iterative solver, despite the larger iteration count that the augmented matrix needs.

I will incorporate the suggestions and I will come back with the results.

Thanks,
Pantelis
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, January 23, 2024 6:15 PM
To: Pantelis Moschopoulos <pmoschopoulos at outlook.com>
Cc: Barry Smith <bsmith at petsc.dev>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question about a parallel implementation of PCFIELDSPLIT

On Tue, Jan 23, 2024 at 11:06 AM Pantelis Moschopoulos <pmoschopoulos at outlook.com> wrote:
Dear Matt,

Thank you for your explanation. The new methodology is straightforward to implement.
Still, I have one more question. When I use the option -pc_fieldsplit_schur_precondition full, PETSc computes the exact Schur complement matrix representation internally. Based on the example matrix that you sent, the Schur complement is S = -v^t (A^{-1}) u. How will PETSc calculate the vector (A^{-1}) u? Or does it calculate the exact Schur complement matrix differently?

FULL calls MatSchurComplementComputeExplicitOperator(), which calls KSPMatSolve() to compute A^{-1} u; this defaults to a KSPSolve() for each column if no specialized code is available. So it should just use your solver for the (0,0) block, and then take the dot product with v.
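
For a single constraint column, that amounts to one solve with the (0,0) block followed by a dot product. A minimal sketch of the idea (kspA, u, and v are hypothetical names for the (0,0)-block solver, the constraint column, and the constraint row; the actual internals are more general):

  #include <petscksp.h>

  Vec         w;
  PetscScalar s;

  VecDuplicate(u, &w);
  KSPSolve(kspA, u, w); /* w = A^{-1} u; one such solve per column of u in general */
  VecDot(w, v, &s);     /* s = v^t w, assuming real arithmetic */
  s = -s;               /* the 1x1 Schur complement S = -v^t A^{-1} u */
  VecDestroy(&w);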

  Thanks,

     Matt

Thanks,
Pantelis
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, January 23, 2024 5:21 PM
To: Pantelis Moschopoulos <pmoschopoulos at outlook.com>
Cc: Barry Smith <bsmith at petsc.dev>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question about a parallel implementation of PCFIELDSPLIT

On Tue, Jan 23, 2024 at 9:45 AM Pantelis Moschopoulos <pmoschopoulos at outlook.com> wrote:
Dear Matt,

I read about MATLRC. However, its correct usage is not clear to me, so I have the following questions:

  1.  Should the U and V input matrices be created as dense using MatCreateDense?

Yes. If you have one row, it looks like a vector, or a matrix with one column.

If you have 1 row on the bottom, then

  U = [0, 0, ..., 0, 1]
  V = [the row]
  C = [1]

will give you that. However, you have an extra row and column?
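
A sketch of that construction with MatCreateLRC() (assuming A is the assembled n x n operator and row[] is a hypothetical array holding the global values of the constraint row; passing NULL for the diagonal vector c makes C the identity, i.e. C = [1] here). Note that this adds the row into the existing last row of A, which is why the extra row/column question matters:

  #include <petscmat.h>

  Mat          U, V, Nlrc;
  PetscInt     n, rstart, rend, i;
  PetscScalar *Uarr, *Varr;

  MatGetSize(A, &n, NULL);
  MatGetOwnershipRange(A, &rstart, &rend);

  /* U and V are parallel dense matrices with a single column */
  MatCreateDense(PETSC_COMM_WORLD, rend - rstart, PETSC_DECIDE, n, 1, NULL, &U);
  MatCreateDense(PETSC_COMM_WORLD, rend - rstart, PETSC_DECIDE, n, 1, NULL, &V);
  MatDenseGetArray(U, &Uarr);
  MatDenseGetArray(V, &Varr);
  for (i = rstart; i < rend; i++) {
    Uarr[i - rstart] = (i == n - 1) ? 1.0 : 0.0; /* U = [0, 0, ..., 0, 1] */
    Varr[i - rstart] = row[i];                   /* V = the constraint row */
  }
  MatDenseRestoreArray(U, &Uarr);
  MatDenseRestoreArray(V, &Varr);
  MatAssemblyBegin(U, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(U, MAT_FINAL_ASSEMBLY);
  MatAssemblyBegin(V, MAT_FINAL_ASSEMBLY); MatAssemblyEnd(V, MAT_FINAL_ASSEMBLY);

  /* Nlrc behaves like A + U*C*V^t */
  MatCreateLRC(A, U, NULL, V, &Nlrc);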


  2.  Do I use the command MatCreateLRC just to declare the matrix, and then MatLRCSetMats to pass the values of the constituents?

You can use

  MatCreate(comm, &M)
  MatSetSizes(M, ...)
  MatSetType(M, MATLRC)
  MatLRCSetMats(M, ...)

However, you are right that it is a little more complicated, because A is not just the upper block here.


  3.  Then, how do I proceed? How do I apply the Sherman-Morrison-Woodbury step? I intend to use iterative solvers for A (the main matrix), so I will not have A^{-1} at hand, which I think is what the Sherman-Morrison-Woodbury formula needs.

I think I was wrong. MatLRC is not the best fit. We should use MatNest instead. Then you could have

  A   u
  v^t 0

as your matrix. We could still get an explicit Schur complement automatically using nested FieldSplit. So

  1) Make the MATNEST matrix as shown above

  2) Use PCFIELDSPLIT for it. The ISes should be easy to build, since there is only one row in the second split.

  3) Select the full Schur complement

    -pc_type fieldsplit
    -pc_fieldsplit_type schur
    -pc_fieldsplit_schur_fact_type full
    -pc_fieldsplit_schur_precondition full

  4) Use a recursive FieldSplit (might be able to use -fieldsplit_0_pc_fieldsplit_detect_saddle_point)

    -fieldsplit_0_pc_type fieldsplit
    -fieldsplit_0_pc_fieldsplit_0_fields 0,1,2
    -fieldsplit_0_pc_fieldsplit_1_fields 3

I think this does what you want, and should be much easier than getting MatLRC to do it.
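
A sketch of steps 1 and 2 (assuming A is the assembled Stokes operator, u is an n x 1 dense Mat holding the constraint column, vT is a 1 x n dense Mat holding the constraint row, and pc is the PCFIELDSPLIT; all hypothetical names):

  #include <petscksp.h>

  Mat blocks[4], M;
  IS  isrow[2], iscol[2];

  blocks[0] = A;    /* (0,0) block: the original operator */
  blocks[1] = u;    /* (0,1) block: the constraint column */
  blocks[2] = vT;   /* (1,0) block: the constraint row    */
  blocks[3] = NULL; /* (1,1) block: zero                  */
  MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &M);

  /* The ISes for the two splits come straight from the nest structure */
  MatNestGetISs(M, isrow, iscol);
  PCFieldSplitSetIS(pc, "0", isrow[0]);
  PCFieldSplitSetIS(pc, "1", isrow[1]);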

  Thanks,

     Matt

Thanks,
Pantelis
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, January 23, 2024 3:20 PM
To: Pantelis Moschopoulos <pmoschopoulos at outlook.com>
Cc: Barry Smith <bsmith at petsc.dev>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question about a parallel implementation of PCFIELDSPLIT

On Tue, Jan 23, 2024 at 8:16 AM Pantelis Moschopoulos <pmoschopoulos at outlook.com> wrote:
Dear Matt,

Thank you for your response. This is an idealized setup where I have only one row/column. Sometimes we might need two or even three constraints, depending on the application. Thus, I will pursue the user-defined IS.

For anything < 50 constraints, I would use MatLRC. The bottleneck is the inversion of a dense k x k matrix, where k is the number of constraints. Using an IS is definitely fine, but dense rows can degrade iterative convergence.

When I supply the IS using the command PCFieldSplitSetIS, I do not need to specify anything in the matrix setup, right?

You should just need to specify the rows for each field as an IS.
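
A minimal sketch for this case (assuming the interleaved ux,uy,uz,p layout with one trailing constraint row; A, N, and pc are hypothetical names for the assembled matrix, its global size, and the PCFIELDSPLIT preconditioner):

  #include <petscksp.h>

  PetscInt  rstart, rend, i, nu = 0, np = 0;
  PetscInt *uidx, *pidx;
  IS        isu, isp;

  MatGetOwnershipRange(A, &rstart, &rend);
  PetscMalloc1(rend - rstart, &uidx);
  PetscMalloc1(rend - rstart, &pidx);
  for (i = rstart; i < rend; i++) {
    if (i == N - 1 || i % 4 < 3) uidx[nu++] = i; /* velocities, plus the constraint row */
    else                         pidx[np++] = i; /* pressure */
  }
  ISCreateGeneral(PETSC_COMM_WORLD, nu, uidx, PETSC_OWN_POINTER, &isu);
  ISCreateGeneral(PETSC_COMM_WORLD, np, pidx, PETSC_OWN_POINTER, &isp);
  PCFieldSplitSetIS(pc, "0", isu);
  PCFieldSplitSetIS(pc, "1", isp);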

  Thanks,

     Matt

Thanks,
Pantelis
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Tuesday, January 23, 2024 2:51 PM
To: Pantelis Moschopoulos <pmoschopoulos at outlook.com>
Cc: Barry Smith <bsmith at petsc.dev>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question about a parallel implementation of PCFIELDSPLIT

On Tue, Jan 23, 2024 at 4:23 AM Pantelis Moschopoulos <pmoschopoulos at outlook.com> wrote:
Dear Matt and Dear Barry,

I have some follow-up questions regarding FieldSplit.
Let's assume that I solve the 3D Stokes flow again, but now I also have a global constraint that controls the flow rate at the inlet. The matrix has the same unknowns as before, i.e. ux0,uy0,uz0,p0//ux1,uy1,uz1,p1//..., but the last row (and the last column) corresponds to the contribution of the global constraint equation. I want to incorporate the last row (and last column) into the local block of velocities (split 0) and the pressure. The problem is how to do that. I have two questions:

  1.  Should the block size now be 5 in the matrix and vector creation for this problem?

No. Blocksize is only useful when the vector/matrix layout is completely regular, meaning _every_ block looks the same. Here you have a single row to be added in.

  2.  Do I have to rely entirely on PCFieldSplitSetIS to create the two blocks? Can I simply augment the previously defined block 0 with the last row of the matrix?

If you want to add in a single row, then you have to specify the IS yourself since we cannot generate it from the regular pattern.

However, if you know that you will only ever have a single constraint row (which I assume is fairly dense), then I would suggest instead using MatLRC, which Jose developed for SLEPc. This handles the last row/column as a low-rank correction. One step of Sherman-Morrison-Woodbury solves this exactly. It requires a solve with A, for which you can use FieldSplit as normal.
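
For reference, the Sherman-Morrison-Woodbury identity referred to here is, for M = A + U C V^t:

  M^{-1} b = A^{-1} b - A^{-1} U (C^{-1} + V^t A^{-1} U)^{-1} V^t A^{-1} b

so each application needs solves with A (your FieldSplit solver) plus the inverse of the small k x k matrix C^{-1} + V^t A^{-1} U.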

  Thanks,

     Matt

Up to this point, I use the following commands to create the field split:

PetscInt :: ufields(3), pfields(1)

ufields = [0, 1, 2]
pfields = [3]

call PCSetType(pc, PCFIELDSPLIT, ierr)
call PCFieldSplitSetBlockSize(pc, 4, ierr)
call PCFieldSplitSetFields(pc, "0", 3, ufields, ufields, ierr)
call PCFieldSplitSetFields(pc, "1", 1, pfields, pfields, ierr)

Thanks,
Pantelis


________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Friday, January 19, 2024 11:31 PM
To: Barry Smith <bsmith at petsc.dev>
Cc: Pantelis Moschopoulos <pmoschopoulos at outlook.com>; petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Question about a parallel implementation of PCFIELDSPLIT

On Fri, Jan 19, 2024 at 4:25 PM Barry Smith <bsmith at petsc.dev> wrote:

   Generally fieldsplit is used on problems that have a natural "split" of the variables into two or more subsets, for example u0,v0,u1,v1,u2,v2,u3,v3. This is often indicated in the vectors and matrices with the "blocksize" argument, 2 in this case. DM also often provides this information.

   When laying out a vector/matrix with a blocksize, one must ensure that an equal number of the subsets appears on each MPI process. So, for example, if the above vector is distributed over 3 MPI processes one could use u0,v0,u1,v1 | u2,v2 | u3,v3, but one cannot use u0,v0,u1 | v1,u2,v2 | u3,v3. Another way to think about it is that one must split up the vector as indexed by block among the processes. For most multicomponent problems this type of decomposition is very natural in the logic of the code.

This blocking is only convenient, not necessary. You can specify your own field division using PCFieldSplitSetIS().

  Thanks,

     Matt

  Barry


On Jan 19, 2024, at 3:19 AM, Pantelis Moschopoulos <pmoschopoulos at outlook.com> wrote:

Dear all,

When I use PCFIELDSPLIT with pc type "schur" in serial mode, everything works fine. When I turn to parallel, however, I observe that the number of ranks I use must divide N, the number of unknowns, without any remainder. Otherwise, an error of the following form emerges: "Local columns of A10 3473 do not equal local rows of A00 3471".

Can I do something to overcome this?

Thanks,
Pantelis





--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/