[petsc-users] Create a nest not aligned by processors

Matthew Knepley knepley at gmail.com
Mon Mar 20 08:58:59 CDT 2023


On Mon, Mar 20, 2023 at 9:35 AM Berger Clement <clement.berger at ens-lyon.fr>
wrote:

> 1) Yes, in my program I use PETSC_DETERMINE, but I don't see what the
> issue is there. From what I understand, it just lets PETSc set the total
> size from the local sizes provided; am I mistaken?
>
> 2) I attached a small script; when I run it with 1 proc, the output vector
> is not the same as when I run it with 2 procs, and I don't know what I
> should do to make them match.
>
> PS: To be clear, I am not trying to point out a bug here; I realize that
> my usage is wrong somehow, I just can't determine why. Sorry if I gave you
> the wrong impression!
>
I think I can now explain this clearly. Thank you for the nice, simple
example. I attach my slightly changed version (I think better in C). Here
is a run on one process:

master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 1
./nestTest -left_view -right_view -nest_view -full_view
Mat Object: 1 MPI process
  type: nest
  Matrix object:
    type=nest, rows=2, cols=2
    MatNest structure:
    (0,0) : type=constantdiagonal, rows=4, cols=4
    (0,1) : NULL
    (1,0) : NULL
    (1,1) : type=constantdiagonal, rows=4, cols=4
Mat Object: 1 MPI process
  type: seqaij
row 0: (0, 2.)
row 1: (1, 2.)
row 2: (2, 2.)
row 3: (3, 2.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
Vec Object: 1 MPI process
  type: seq
0.
1.
2.
3.
4.
5.
6.
7.
Vec Object: 1 MPI process
  type: seq
0.
2.
4.
6.
4.
5.
6.
7.

This looks like what you expect: the first four entries are doubled and the
last four are reproduced. Now let's run on two processes:

master *:~/Downloads/tmp/Berger$ /PETSc3/petsc/apple/bin/mpiexec -n 2
./nestTest -left_view -right_view -nest_view -full_view
Mat Object: 2 MPI processes
  type: nest
  Matrix object:
    type=nest, rows=2, cols=2
    MatNest structure:
    (0,0) : type=constantdiagonal, rows=4, cols=4
    (0,1) : NULL
    (1,0) : NULL
    (1,1) : type=constantdiagonal, rows=4, cols=4
Mat Object: 2 MPI processes
  type: mpiaij
row 0: (0, 2.)
row 1: (1, 2.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 2.)
row 5: (5, 2.)
row 6: (6, 1.)
row 7: (7, 1.)
Vec Object: 2 MPI processes
  type: mpi
Process [0]
0.
1.
2.
3.
Process [1]
4.
5.
6.
7.
Vec Object: 2 MPI processes
  type: mpi
Process [0]
0.
2.
2.
3.
Process [1]
8.
10.
6.
7.

Let me describe what has changed. The matrices A and B are parallel, so
each has two rows on process 0 and two rows on process 1. In the MatNest
they are interleaved because we asked for contiguous numbering (by giving
NULL for the IS of global row numbers). If we want to reproduce the same
output, we would need to produce our input vector with the same interleaved
numbering.
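
For concreteness, here is one way to do that in C (a sketch, not taken from
the attached nestTest.c; the helper name BuildNestOrderedInput is
illustrative and PetscCall error checking is assumed): query the column
index sets the MatNest chose with MatNestGetISs() and place each logical
entry at the nest-global position they report.

#include <petsc.h>

/* Sketch: build a flat input vector x whose entries follow the nest's
   interleaved global numbering, so that multiplying by the nest (or its
   AIJ conversion) reproduces the one-process values.  C is the assembled
   MATNEST, A and B its diagonal blocks; names are illustrative. */
static PetscErrorCode BuildNestOrderedInput(Mat C, Mat A, Mat B, Vec *x)
{
  IS              iscol[2];
  const PetscInt *idx;
  PetscInt        nlocal, nloc, cstartA, cstartB, NA;

  PetscFunctionBeginUser;
  PetscCall(MatGetLocalSize(C, NULL, &nlocal));
  PetscCall(VecCreateMPI(PetscObjectComm((PetscObject)C), nlocal, PETSC_DETERMINE, x));
  PetscCall(MatNestGetISs(C, NULL, iscol));                  /* column index sets of the blocks */
  PetscCall(MatGetOwnershipRangeColumn(A, &cstartA, NULL));
  PetscCall(MatGetOwnershipRangeColumn(B, &cstartB, NULL));
  PetscCall(MatGetSize(A, NULL, &NA));

  /* Block 0: its k-th locally owned column sits at nest-global index idx[k];
     give it the value it has in the sequential ordering (0..3 here). */
  PetscCall(ISGetLocalSize(iscol[0], &nloc));
  PetscCall(ISGetIndices(iscol[0], &idx));
  for (PetscInt k = 0; k < nloc; ++k) PetscCall(VecSetValue(*x, idx[k], (PetscScalar)(cstartA + k), INSERT_VALUES));
  PetscCall(ISRestoreIndices(iscol[0], &idx));

  /* Block 1: sequential values NA + 0..3, i.e. 4..7 in the example. */
  PetscCall(ISGetLocalSize(iscol[1], &nloc));
  PetscCall(ISGetIndices(iscol[1], &idx));
  for (PetscInt k = 0; k < nloc; ++k) PetscCall(VecSetValue(*x, idx[k], (PetscScalar)(NA + cstartB + k), INSERT_VALUES));
  PetscCall(ISRestoreIndices(iscol[1], &idx));

  PetscCall(VecAssemblyBegin(*x));
  PetscCall(VecAssemblyEnd(*x));
  PetscFunctionReturn(0);
}

Multiplying by the nest (or its AIJ conversion) with an input built this way
gives the same values as the one-process run, just stored in the interleaved
ordering.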

  Thanks,

     Matt

> Thank you,
> ---
> Clément BERGER
> ENS de Lyon
>
>
> On 2023-03-20 12:53, Matthew Knepley wrote:
>
> On Mon, Mar 20, 2023 at 6:18 AM Berger Clement <clement.berger at ens-lyon.fr>
> wrote:
>
>> I simplified the problem with the initial test I talked about because I
>> thought I had identified the issue, so I will walk you through my whole
>> problem:
>>
>> - first, the solve doesn't produce the same results, as mentioned
>>
>> - I noticed that the duration of the factorization step of the matrix was
>> also not consistent with the number of processors used (it is longer with 3
>> processes than with 1). I didn't think much of it, but I now realize that,
>> for instance with 4 processes, MUMPS crashes during factorization
>>
>> - I thought my matrices were wrong, but it's hard for me to use MatView
>> to compare them with 1 or 2 procs because I work with a quite specific
>> geometry, so in order not to fall into some weird particular case I need to
>> use at least roughly 100 points, and looking at 100x100 matrices is not
>> really nice... Instead I tried to multiply them by a vector full of ones
>> (and afterwards by the vector v such that v(i)=i). I tried it on two
>> matrices, and the results didn't depend on the number of procs, but when I
>> multiplied by the nest of these two matrices (a 2x2 block-diagonal nest),
>> the result changed depending on the number of processors used
>>
>> - that's why I tried the toy problem I wrote to you in the first place
>>
>> I hope it's clearer now.
>>
>
> Unfortunately, it is not clear to me. There is nothing attached to this
> email. I will try to describe things from my end.
>
> 1) There are lots of tests. Internally, Nest does not depend on the number
> of processes unless you make it so. This leads
>      me to believe that your construction of the matrix changes with the
> number of processes. For example, using PETSC_DETERMINE
>      for sizes will do this.
>
> 2) In order to understand what you want to achieve, we need to have
> something running in two cases, one with "correct" output and one
>     with something different. It sounds like you have such a small
> example, but I have missed it.
>
> Can you attach this example? Then I can run it, look at the matrices, and
> see what is different.
>
>   Thanks,
>
>       Matt
>
>
>> Thank you
>> ---
>> Clément BERGER
>> ENS de Lyon
>>
>>
>> On 2023-03-17 21:57, Barry Smith wrote:
>>
>>
>>   Yes, you would benefit from a VecConvert() to produce a standard
>> vector. But you should be able to use VecGetArray() on the nest vector and
>> on the standard vector and copy the values between the two arrays any way
>> you like. You don't need to do any reordering when you copy. Is that not
>> working, and what are the symptoms (more than just the answers to the
>> linear solve being different)? Again, you can run on one and two MPI
>> processes with a tiny problem to see if things are not in the correct order
>> in the vectors and matrices.
>>
>>   Barry
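
For reference, a minimal C sketch of the copy described here (the helper
name is illustrative; it assumes the flat vector already has the same
parallel layout as the converted AIJ matrix, in which case the local pieces
simply concatenate and no reordering is needed):

#include <petsc.h>

/* Sketch: copy a VECNEST into a flat vector with the same parallel layout
   (e.g. one created from the MatConvert'ed AIJ matrix).  Locally the flat
   vector is just the sub-vectors' local parts concatenated. */
static PetscErrorCode NestToFlat(Vec xnest, Vec xflat)
{
  Vec               *sub;
  PetscInt           nb, nloc, offset = 0;
  const PetscScalar *src;
  PetscScalar       *dst;

  PetscFunctionBeginUser;
  PetscCall(VecNestGetSubVecs(xnest, &nb, &sub));
  PetscCall(VecGetArray(xflat, &dst));
  for (PetscInt b = 0; b < nb; ++b) {
    PetscCall(VecGetLocalSize(sub[b], &nloc));
    PetscCall(VecGetArrayRead(sub[b], &src));
    for (PetscInt i = 0; i < nloc; ++i) dst[offset + i] = src[i];
    PetscCall(VecRestoreArrayRead(sub[b], &src));
    offset += nloc;
  }
  PetscCall(VecRestoreArray(xflat, &dst));
  PetscFunctionReturn(0);
}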
>>
>>
>> On Mar 17, 2023, at 3:22 PM, Berger Clement <clement.berger at ens-lyon.fr>
>> wrote:
>>
>> To use MUMPS I need to convert my matrix to MATAIJ format (or at least
>> away from MATNEST). After that, if I use a VECNEST for the left- and
>> right-hand sides, I get an error during the solve procedure; it goes away
>> if I copy my data into a vector with the standard format. I couldn't find
>> any other way.
>> ---
>> Clément BERGER
>> ENS de Lyon
>>
>>
>> On 2023-03-17 19:53, Matthew Knepley wrote:
>>
>> On Fri, Mar 17, 2023 at 2:53 PM Berger Clement <
>> clement.berger at ens-lyon.fr> wrote:
>>
>>> But this is to properly fill up the VecNest, am I right? Because that
>>> one is correct, but I can't use it directly in KSPSolve; I need to copy
>>> it into a standard vector.
>>>
>>>
>> I do not understand what you mean here. You can definitely use a
>> VecNest in a KSP.
>>
>>   Thanks,
>>
>>     Matt
>>
>>
>>
>>> ---
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>> On 2023-03-17 19:39, Barry Smith wrote:
>>>
>>>
>>>   I think the intention is that you use VecNestGetSubVecs()
>>> or VecNestGetSubVec() and fill up the sub-vectors in the same style as the
>>> matrices; this decreases the chance of a reordering mistake when trying to
>>> do it by hand in your code.
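
For reference, a minimal sketch of that pattern in C (it assumes rhs is an
existing VECNEST, e.g. created with VecCreateNest(), and reuses the includes
and PetscCall error checking from the sketches above; names are illustrative):

Vec      *sub;
PetscInt  nb, rstart, rend;

PetscCall(VecNestGetSubVecs(rhs, &nb, &sub));
PetscCall(VecGetOwnershipRange(sub[0], &rstart, &rend));   /* same layout as block (0,0) */
for (PetscInt i = rstart; i < rend; ++i) PetscCall(VecSetValue(sub[0], i, (PetscScalar)i, INSERT_VALUES));
PetscCall(VecAssemblyBegin(sub[0]));
PetscCall(VecAssemblyEnd(sub[0]));
/* ... fill sub[1] the same way ... */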
>>>
>>>
>>>
>>> On Mar 17, 2023, at 2:35 PM, Berger Clement <clement.berger at ens-lyon.fr>
>>> wrote:
>>>
>>> That might be it. I didn't find the equivalent of MatConvert for
>>> vectors, so when I need to solve my linear system, with my right-hand side
>>> properly computed in nest format, I create a new vector using VecDuplicate
>>> and then copy my data into it using VecGetArrayF90, copying each element by
>>> hand. Does this create an incorrect ordering? If so, how can I get the
>>> correct one?
>>> ---
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>> On 2023-03-17 19:27, Barry Smith wrote:
>>>
>>>
>>>   I would run your code with small sizes on 1, 2, 3 MPI ranks and use
>>> MatView() to examine the matrices. They will definitely be ordered
>>> differently but should otherwise be the same. My guess is that the right
>>> hand side may not have the correct ordering with respect to the matrix
>>> ordering in parallel. Note also that when the right hand side does have the
>>> correct ordering, the solution will have a different ordering for each
>>> different number of MPI ranks when printed (but changing the ordering
>>> should give the same results up to machine precision).
>>>
>>>   Barry
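
For a tiny problem, that inspection could look like this (a sketch; Caij and
the viewer choice are illustrative, with the same includes and error checking
as above):

PetscCall(MatView(C, PETSC_VIEWER_STDOUT_WORLD));              /* the MATNEST structure */
PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
PetscCall(MatView(Caij, PETSC_VIEWER_STDOUT_WORLD));           /* explicit entries */
PetscCall(VecView(b, PETSC_VIEWER_STDOUT_WORLD));              /* right-hand side */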
>>>
>>>
>>> On Mar 17, 2023, at 2:23 PM, Berger Clement <clement.berger at ens-lyon.fr>
>>> wrote:
>>>
>>> My issue is that something seems to behave improperly at some step of my
>>> process: the solve step doesn't give the same result depending on the
>>> number of processors I use. I manually tried to multiply one of the
>>> matrices I defined as a nest by a vector, and the result is not the same
>>> with e.g. 1 and 3 processors. That's why I tried the toy program I wrote
>>> in the first place, which highlights the misplacement of elements.
>>> ---
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>> On 2023-03-17 19:14, Barry Smith wrote:
>>>
>>>
>>>    This sounds like a fine use of MATNEST. Now back to the original
>>> question:
>>>
>>>
>>> I want to construct a matrix by blocs, each block having different sizes
>>> and partially stored by multiple processors. If I am not mistaken, the
>>> right way to do so is by using the MATNEST type. However, the following code
>>>
>>> Call
>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr)
>>> Call
>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr)
>>> Call
>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr)
>>>
>>> does not generate the same matrix depending on the number of processors.
>>> It seems that it starts with everything owned by the first proc for A and
>>> B, then goes on to the second proc, and so on (I hope I am being clear).
>>>
>>> Is it possible to change that ?
>>>
>>>   If I understand correctly, it is behaving as expected. It is the same
>>> matrix on 1 and 2 MPI processes; the only difference is the ordering of
>>> the rows and columns.
>>>
>>>   Both matrix blocks are split among the two MPI processes. This is how
>>> MATNEST works and likely what you want in practice.
>>>
>>> On Mar 17, 2023, at 1:19 PM, Berger Clement <clement.berger at ens-lyon.fr>
>>> wrote:
>>>
>>> I have a matrix with four different blocks (2 rows x 2 columns). The
>>> block sizes differ from one another because they correspond to different
>>> physical variables. One of the blocks has the particularity that it has to
>>> be updated at each iteration. This update is performed by replacing it with
>>> a product of multiple matrices that depend on the result of the previous
>>> iteration. Note that these intermediate matrices are not square (because
>>> they also correspond to other types of variables), and that they must be
>>> completely refilled by hand (i.e. they are not the result of some simple
>>> linear operations). Finally, I use this final block matrix to solve
>>> multiple linear systems (with different right-hand sides), so for now I use
>>> MUMPS, as only the first solve takes time (but I might change that).
>>>
>>> Considering this setting, I created each type of variable separately,
>>> filled the different matrices, and created different nests of vectors /
>>> matrices for my operations. When the time comes to use KSPSolve, I use
>>> MatConvert on my matrix to get a MATAIJ compatible with MUMPS, and I also
>>> copy the vector data I need from my nests into a regular Vec. I solve, get
>>> my data back into my nests, and carry on with the operations needed for my
>>> updates.
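
For reference, the conversion-and-solve step described here would look
roughly as follows in C (a sketch; it assumes PETSc was configured with
MUMPS, that C is the assembled MATNEST, and that b and x are flat vectors
with the matching layout; same includes and PetscCall error checking as
above):

Mat Caij;
KSP ksp;
PC  pc;

PetscCall(MatConvert(C, MATAIJ, MAT_INITIAL_MATRIX, &Caij));
PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
PetscCall(KSPSetOperators(ksp, Caij, Caij));
PetscCall(KSPSetType(ksp, KSPPREONLY));              /* factor once, then back-solve */
PetscCall(KSPGetPC(ksp, &pc));
PetscCall(PCSetType(pc, PCLU));
PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
PetscCall(KSPSetFromOptions(ksp));
PetscCall(KSPSolve(ksp, b, x));                      /* reuse ksp for the other right-hand sides */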
>>>
>>> Is that clear? I don't know if I provided too many or too few
>>> details.
>>>
>>> Thank you
>>> ---
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>> On 2023-03-17 17:34, Barry Smith wrote:
>>>
>>>
>>>    Perhaps you could provide a brief summary of what you would like to
>>> do, and we may have ideas on how to achieve it.
>>>
>>>    Barry
>>>
>>> Note that MATNEST does require that all matrices live on all the MPI
>>> processes within the original communicator. That is, if the original
>>> communicator has ranks 0, 1, and 2, you cannot have a matrix inside the
>>> MATNEST that lives only on ranks 1 and 2, but you could give it 0 rows on
>>> rank 0 so that it effectively lives only on ranks 1 and 2 (though its
>>> communicator is all three ranks).
>>>
>>> On Mar 17, 2023, at 12:14 PM, Berger Clement <clement.berger at ens-lyon.fr>
>>> wrote:
>>>
>>> It would be possible in the case I showed you, but in mine that would
>>> actually be quite complicated; isn't there any other workaround? I should
>>> add that I am not tied to the MATNEST format, it's just that I think the
>>> other ones wouldn't work.
>>> ---
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>> On 2023-03-17 15:48, Barry Smith wrote:
>>>
>>>
>>>    You may be able to mimic what you want by not using PETSC_DECIDE but
>>> instead computing up front how many rows of each matrix you want stored on
>>> each MPI process. You can use 0 on certain MPI processes for certain
>>> matrices if you don't want any rows of that particular matrix stored on
>>> that particular MPI process.
>>>
>>>   Barry
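
For reference, a sketch of that idea for the two 4x4 blocks from the example
(the two-rank split below is only for illustration; same includes and
PetscCall error checking as above):

PetscMPIInt rank;
PetscInt    mA, mB;
Mat         A, B;

PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
mA = (rank == 0) ? 4 : 0;   /* all rows of A on rank 0 */
mB = (rank == 1) ? 4 : 0;   /* all rows of B on rank 1 */
PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, mA, mA, 4, 4, 2.0, &A));
PetscCall(MatCreateConstantDiagonal(PETSC_COMM_WORLD, mB, mB, 4, 4, 1.0, &B));

With this split on exactly two ranks, the nest's global ordering coincides
with the sequential one, at the cost of an unbalanced distribution.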
>>>
>>>
>>> On Mar 17, 2023, at 10:10 AM, Berger Clement <clement.berger at ens-lyon.fr>
>>> wrote:
>>>
>>> Dear all,
>>>
>>> I want to construct a matrix by blocs, each block having different sizes
>>> and partially stored by multiple processors. If I am not mistaken, the
>>> right way to do so is by using the MATNEST type. However, the following code
>>>
>>> Call
>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,2.0E0_wp,A,ierr)
>>> Call
>>> MatCreateConstantDiagonal(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1.0E0_wp,B,ierr)
>>> Call
>>> MatCreateNest(PETSC_COMM_WORLD,2,PETSC_NULL_INTEGER,2,PETSC_NULL_INTEGER,(/A,PETSC_NULL_MAT,PETSC_NULL_MAT,B/),C,ierr)
>>>
>>> does not generate the same matrix depending on the number of processors.
>>> It seems that it starts with everything owned by the first proc for A and
>>> B, then goes on to the second proc, and so on (I hope I am being clear).
>>>
>>> Is it possible to change that ?
>>>
>>> Note that I am coding in Fortran, if that has any consequence.
>>>
>>> Thank you,
>>>
>>> Sincerely,
>>> --
>>> Clément BERGER
>>> ENS de Lyon
>>>
>>>
>>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nestTest.c
Type: application/octet-stream
Size: 1352 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20230320/659e6fd3/attachment-0001.obj>

