[petsc-users] Regarding help with MatGetSubMatrix parallel use
Krishnan, Gautham
gautham3 at illinois.edu
Mon Sep 28 10:05:07 CDT 2020
Thank you; the issue was that I did not provide the ordering as used by the DMDA.
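For reference, a minimal sketch of the conversion involved (a hypothetical fragment: it reuses grid_pressure, petsc_xyzcom, vecsx, veclo, vechi, irow and ierr from the code quoted below, and maps each rank's natural indices through the DMDA's application ordering before building the IS):

AO :: ao
PetscInt :: ix(vecsx), j
! natural (lexicographic) indices 0..n-2 owned by this rank
ix(1:vecsx) = (/ (j, j=veclo,vechi-1) /)
! convert them to the PETSc global ordering of the DMDA
call DMDAGetAO(grid_pressure, ao, ierr)
call AOApplicationToPetsc(ao, vecsx, ix, ierr)
call ISCreateGeneral(petsc_xyzcom, vecsx, ix, PETSC_COPY_VALUES, irow, ierr)

The column index set needs the same conversion; the AO returned by DMDAGetAO belongs to the DM and is not destroyed by the caller.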
Regards,
Gautham
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Monday, September 28, 2020 8:43 AM
To: Krishnan, Gautham <gautham3 at illinois.edu>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use
On Sat, Sep 26, 2020 at 4:13 PM Krishnan, Gautham <gautham3 at illinois.edu> wrote:
In parallel, I have passed the global indices that I want on each PE. However, the result I obtain changes depending on whether I divide the initial domain between PEs along the x axis or the y axis. This is what I am trying to resolve; for example:
You mean that you have a DMDA, and you change the pattern of division across processes? This will change the global numbers of given vertices. Remember that these are not
"natural numbers", meaning a lexicographic ordering of vertices, but rather "global numbers", meaning the PETSc ordering of unknowns which is contiguous across processes. There
is a discussion of this in the manual.
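For instance, on the 4x4 grid below, splitting only in y happens to make the two orderings coincide (each process owns whole consecutive grid rows), while splitting in x does not. Assuming the usual DMDA conventions (each process's unknowns numbered contiguously, x fastest within its subdomain), the two orderings look roughly like:

natural (lexicographic)    global, nprocx=2, nprocy=1
12 13 14 15                 6  7 | 14 15
 8  9 10 11                 4  5 | 12 13
 4  5  6  7                 2  3 | 10 11
 0  1  2  3                 0  1 |  8  9

so, for example, index 2 means grid point (2,0) in the natural ordering but grid point (0,1) in the global one, and an IS built from 0..n-2 extracts a correspondingly permuted submatrix.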
Thanks,
Matt
The Fortran code I use is basically:
PetscInt :: idx(ngx*ngy*ngz)
Mat :: D2_t, lapl, lapl_t
IS :: irow, irow1, icol1
call DMCreateMatrix(grid_pressure, lapl, ierr)
call MatSetOption(lapl, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr)
call MatSetOption(lapl, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr)
call DMCreateMatrix(grid_pressure, lapl_t, ierr)
call MatSetOption(lapl_t, MAT_NEW_NONZERO_LOCATIONS,PETSC_TRUE,ierr)
call MatSetOption(lapl_t, MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr)
.
.
. !(Assembling matrix lapl)
call VecCreate(petsc_xyzcom, vec_b, ierr)
call VecSetSizes(vec_b, PETSC_DECIDE, ngx*ngy*ngz-1, ierr)
call VecSetFromOptions(vec_b, ierr)
call VecSetUp(vec_b, ierr)
call VecGetLocalSize(vec_b,vecsx,ierr)
call VecGetOwnershipRange(vec_b,veclo,vechi,ierr)
call ISCreateStride(petsc_xyzcom, vecsx,veclo,1,icol1,ierr)
idx(1:vecsx) = (/ (j, j=veclo,vechi-1) /) ! this rank's contiguous slice of 0..n-2
call ISCreateGeneral(petsc_xyzcom, vecsx,idx, PETSC_COPY_VALUES,irow,ierr)
call MatTranspose(lapl, MAT_INPLACE_MATRIX, lapl_t,ierr) !transpose the laplacian
call MatGetSubMatrix(lapl_t, irow, icol1, MAT_INITIAL_MATRIX, D2_t,ierr)
call MatView(lapl_t, PETSC_VIEWER_STDOUT_WORLD, ierr)
call MatView(D2_t, PETSC_VIEWER_STDOUT_WORLD, ierr)
Output:
ngx=ngy=4; ngz=1; such that n=4*4*1=16
lapl_t:
row 0: (0, -3.)
row 1: (0, 3.) (1, -3.) (5, -9.)
row 2: (2, -3.) (3, -3.) (6, -9.)
row 3: (3, 3.)
row 4: (4, -3.) (5, -9.)
row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.)
row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.)
row 7: (6, -9.) (7, 3.)
row 8: (8, -3.) (9, -9.)
row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.)
row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.)
row 11: (10, -9.) (11, 3.)
row 12: (12, -3.)
row 13: (9, -9.) (12, 3.) (13, 3.)
row 14: (10, -9.) (14, 3.) (15, -3.)
row 15: (15, 3.)
Case 1:
nprocx = 1; nprocy = 2 ! number of processors in x and y among which to divide the domain
Here, the (n-1 x n-1) submatrix is extracted correctly as
D2_t:
row 0: (0, -3.)
row 1: (0, 3.) (1, -3.) (5, -9.)
row 2: (2, -3.) (3, -3.) (6, -9.)
row 3: (3, 3.)
row 4: (4, -3.) (5, -9.)
row 5: (1, 3.) (4, 3.) (5, 36.) (6, -9.) (9, -9.)
row 6: (2, 3.) (5, -9.) (6, 36.) (7, -3.) (10, -9.)
row 7: (6, -9.) (7, 3.)
row 8: (8, -3.) (9, -9.)
row 9: (5, -9.) (8, 3.) (9, 36.) (10, -9.) (13, -3.)
row 10: (6, -9.) (9, -9.) (10, 36.) (11, -3.) (14, -3.)
row 11: (10, -9.) (11, 3.)
row 12: (12, -3.)
row 13: (9, -9.) (12, 3.) (13, 3.)
row 14: (10, -9.) (14, 3.)
However, for
Case 2:
nprocx = 2; nprocy = 1
lapl_t is correctly assembled and transposed, but the (n-1 x n-1) submatrix is extracted incorrectly as
D2_t:
row 0: (0, -3.)
row 1: (0, 3.) (1, -3.) (3, -9.)
row 2: (2, -3.) (3, -9.)
row 3: (1, 3.) (2, 3.) (3, 36.) (5, -9.) (10, -9.)
row 4: (4, -3.) (5, -9.)
row 5: (3, -9.) (4, 3.) (5, 36.) (7, -3.) (12, -9.)
row 6: (6, -3.)
row 7: (5, -9.) (6, 3.) (7, 3.)
row 8: (8, -3.) (9, -3.) (10, -9.)
row 9: (9, 3.)
row 10: (3, -9.) (8, 3.) (10, 36.) (11, -3.) (12, -9.)
row 11: (10, -9.) (11, 3.)
row 12: (5, -9.) (10, -9.) (12, 36.) (13, -3.) (14, -3.)
row 13: (12, -9.) (13, 3.)
row 14: (12, -9.) (14, 3.)
I am unable to understand why the extracted submatrix is incorrect when nprocx>1 but works when nprocx=1 and nprocy>=1.
P.S. The parallel index sets in Cases 1 and 2 are identical and are as follows:
irow:
[0] Number of indices in set 8
[0] 0 0
[0] 1 1
[0] 2 2
[0] 3 3
[0] 4 4
[0] 5 5
[0] 6 6
[0] 7 7
[1] Number of indices in set 7
[1] 0 8
[1] 1 9
[1] 2 10
[1] 3 11
[1] 4 12
[1] 5 13
[1] 6 14
icol1:
[0] Index set is permutation
[0] Number of indices in (stride) set 8
[0] 0 0
[0] 1 1
[0] 2 2
[0] 3 3
[0] 4 4
[0] 5 5
[0] 6 6
[0] 7 7
[1] Number of indices in (stride) set 7
[1] 0 8
[1] 1 9
[1] 2 10
[1] 3 11
[1] 4 12
[1] 5 13
[1] 6 14
Could you please help me find out what is going wrong here?
Regards,
Gautham
________________________________
From: Matthew Knepley <knepley at gmail.com>
Sent: Wednesday, September 23, 2020 3:55 PM
To: Krishnan, Gautham <gautham3 at illinois.edu>
Cc: petsc-users at mcs.anl.gov
Subject: Re: [petsc-users] Regarding help with MatGetSubMatrix parallel use
On Wed, Sep 23, 2020 at 4:12 PM Krishnan, Gautham <gautham3 at illinois.edu> wrote:
Hello,
For a CFD code being developed in Fortran with MPI, I am using PETSc matrices, and for a particular operation I need to extract an (n-1 x n-1) submatrix of an (n x n) matrix. MatGetSubMatrix works for serial runs but fails when the domain is split up over PEs; I suspect the indexing changes for parallel runs, so the global indexing that worked in the serial case just shuffles matrix entries around undesirably in parallel. Could anybody offer some guidance on this? I would like to note that the 2D domain is split along both the x and y axes for parallel runs on multiple PEs.
In parallel, you pass MatGetSubMatrix() the global indices that you want on your process.
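A minimal sketch of the calling pattern (hypothetical names: A is the assembled parallel matrix; on each rank, nlocal and rstart describe the slice of global rows and columns that rank should own in the result):

IS :: rows, cols
Mat :: B
! each rank lists the global indices that will form its local block of B
call ISCreateStride(PETSC_COMM_WORLD, nlocal, rstart, 1, rows, ierr)
call ISCreateStride(PETSC_COMM_WORLD, nlocal, rstart, 1, cols, ierr)
call MatGetSubMatrix(A, rows, cols, MAT_INITIAL_MATRIX, B, ierr)
call ISDestroy(rows, ierr)
call ISDestroy(cols, ierr)

Taken together, the per-rank index sets define the rows and columns of the whole submatrix.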
Thanks,
Matt
Regards,
Gautham Krishnan,
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/