[petsc-users] Scattering a vector to/from a subset of processors

Junchao Zhang junchao.zhang at gmail.com
Wed Dec 6 11:12:04 CST 2023


Hi, Sreeram,
  I made an example with your approach. It worked fine, as you can see from
the output at the end.

#include <petscvec.h>

int main(int argc, char **argv)
{
  PetscInt    i, j, rstart, rend, n, N, *indices;
  PetscMPIInt size, rank;
  IS          ix;
  VecScatter  vscat;
  Vec         x, y;

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
  PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &size));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  /* X: local size 9 on the first 4 ranks and 0 elsewhere (global size 36 with 12 ranks) */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetFromOptions(x));
  PetscCall(PetscObjectSetName((PetscObject)x, "Vec X"));
  n = (rank < 4) ? 9 : 0;
  PetscCall(VecSetSizes(x, n, PETSC_DECIDE));

  PetscCall(VecGetOwnershipRange(x, &rstart, &rend));
  for (i = rstart; i < rend; i++) PetscCall(VecSetValue(x, i, (PetscScalar)i, INSERT_VALUES));
  PetscCall(VecAssemblyBegin(x));
  PetscCall(VecAssemblyEnd(x));
  PetscCall(VecGetSize(x, &N));

  /* Y: same global size, evenly distributed over all ranks (3 entries each) */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &y));
  PetscCall(VecSetFromOptions(y));
  PetscCall(PetscObjectSetName((PetscObject)y, "Vec Y"));
  PetscCall(VecSetSizes(y, PETSC_DECIDE, N));

  /* For each locally owned entry of Y, list the global index of X it should
     receive: local entry j on this rank pulls x[rank + size*j] */
  PetscCall(VecGetOwnershipRange(y, &rstart, &rend));
  PetscCall(PetscMalloc1(rend - rstart, &indices));
  for (i = rstart, j = 0; i < rend; i++, j++) indices[j] = rank + size * j;

  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, rend - rstart, indices, PETSC_OWN_POINTER, &ix));
  /* iy = NULL means Y is filled in the order of its own ownership range */
  PetscCall(VecScatterCreate(x, ix, y, NULL, &vscat));

  PetscCall(VecScatterBegin(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(vscat, x, y, INSERT_VALUES, SCATTER_FORWARD));

  PetscCall(ISView(ix, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(VecView(x, PETSC_VIEWER_STDERR_WORLD));
  PetscCall(VecView(y, PETSC_VIEWER_STDERR_WORLD));

  PetscCall(VecScatterDestroy(&vscat));
  PetscCall(ISDestroy(&ix));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(PetscFinalize());
  return 0;
}
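
By the way, if you later need to move the data back from y's layout to x's
layout (the "from" direction in your subject), the same vscat can be reused
in the reverse direction. A minimal sketch, not exercised in the run below:

  /* reverse scatter: pulls the values of y back into x; note the two
     vectors are passed in the opposite order from the forward scatter */
  PetscCall(VecScatterBegin(vscat, y, x, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterEnd(vscat, y, x, INSERT_VALUES, SCATTER_REVERSE));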

$ mpirun -n 12 ./ex100
IS Object: 12 MPI processes
  type: general
[0] Number of indices in set 3
[0] 0 0
[0] 1 12
[0] 2 24
[1] Number of indices in set 3
[1] 0 1
[1] 1 13
[1] 2 25
[2] Number of indices in set 3
[2] 0 2
[2] 1 14
[2] 2 26
[3] Number of indices in set 3
[3] 0 3
[3] 1 15
[3] 2 27
[4] Number of indices in set 3
[4] 0 4
[4] 1 16
[4] 2 28
[5] Number of indices in set 3
[5] 0 5
[5] 1 17
[5] 2 29
[6] Number of indices in set 3
[6] 0 6
[6] 1 18
[6] 2 30
[7] Number of indices in set 3
[7] 0 7
[7] 1 19
[7] 2 31
[8] Number of indices in set 3
[8] 0 8
[8] 1 20
[8] 2 32
[9] Number of indices in set 3
[9] 0 9
[9] 1 21
[9] 2 33
[10] Number of indices in set 3
[10] 0 10
[10] 1 22
[10] 2 34
[11] Number of indices in set 3
[11] 0 11
[11] 1 23
[11] 2 35
Vec Object: Vec X 12 MPI processes
  type: mpi
Process [0]
0.
1.
2.
3.
4.
5.
6.
7.
8.
Process [1]
9.
10.
11.
12.
13.
14.
15.
16.
17.
Process [2]
18.
19.
20.
21.
22.
23.
24.
25.
26.
Process [3]
27.
28.
29.
30.
31.
32.
33.
34.
35.
Process [4]
Process [5]
Process [6]
Process [7]
Process [8]
Process [9]
Process [10]
Process [11]
Vec Object: Vec Y 12 MPI processes
  type: mpi
Process [0]
0.
12.
24.
Process [1]
1.
13.
25.
Process [2]
2.
14.
26.
Process [3]
3.
15.
27.
Process [4]
4.
16.
28.
Process [5]
5.
17.
29.
Process [6]
6.
18.
30.
Process [7]
7.
19.
31.
Process [8]
8.
20.
32.
Process [9]
9.
21.
33.
Process [10]
10.
22.
34.
Process [11]
11.
23.
35.
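
For the actual use case you described below (Start indexed space->time with
n_m blocks of size n_t, Finish indexed time->space with n_t blocks of size
n_m), the same recipe applies: each rank lists, for its locally owned rows of
the destination vector, the global index of the source entry it wants to
pull. A rough sketch of that index computation, assuming a Start global index
of m*n_t + t, a Finish global index of t*n_m + m, that each rank owns a
contiguous run of whole time blocks of the Finish vector, and that n_m and
n_t are the PetscInt values you read from -nm and -nt (I have not run this
against your repo, so treat it only as a starting point):

  PetscInt   rstart, rend, i, j, *indices;
  IS         ix;
  VecScatter vscat;

  /* y is the Finish-layout vector, x the Start-layout vector */
  PetscCall(VecGetOwnershipRange(y, &rstart, &rend));
  PetscCall(PetscMalloc1(rend - rstart, &indices));
  for (i = rstart, j = 0; i < rend; i++, j++) {
    PetscInt t = i / n_m, m = i % n_m; /* Finish global index i = t*n_m + m */
    indices[j] = m * n_t + t;          /* pull from Start global index m*n_t + t */
  }
  PetscCall(ISCreateGeneral(PETSC_COMM_WORLD, rend - rstart, indices, PETSC_OWN_POINTER, &ix));
  PetscCall(VecScatterCreate(x, ix, y, NULL, &vscat));

Interpreted with n_m = 3 and n_t = 12, this reproduces the [0, 12, 24],
[1, 13, 25], ... pattern of your simplified 12-process example.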

--Junchao Zhang


On Tue, Dec 5, 2023 at 10:09 PM Sreeram R Venkat <srvenkat at utexas.edu>
wrote:

> Yes, I have an example code at github.com/s769/petsc-test. One thing to
> note: when I described the example before, I simplified the actual use
> case in the code. Here are the extra details relevant to this code:
>
>    - We assume a 2D processor grid, given by the command-line args
>    -proc_rows and -proc_cols.
>    - The total length of the vector is n_m*n_t, given by the command-line
>    args -nm and -nt. n_m corresponds to a space index and n_t to a time
>    index.
>    - In the "Start" phase, the vector is divided into n_m blocks, each of
>    size n_t (indexed space->time). The blocks are partitioned over the
>    first row of processors: each processor in the first row gets n_m_local
>    blocks of size n_t, where the n_m_local values over the first row sum
>    to n_m. For example, if -nm = 4 and -proc_cols = 4, each processor in
>    the first row gets one block of size n_t.
>    - In the "Finish" phase, the vector is divided into n_t blocks, each of
>    size n_m (indexed time->space; this is the reason for the permutation
>    of indices). The blocks are partitioned over all processors: each
>    processor gets n_t_local blocks of size n_m, where the n_t_local values
>    over all processors sum to n_t.
>
> I think the basic idea is similar to the previous example, but these
> details make things a bit more complicated. Please let me know if anything
> is unclear, and I can try to explain more.
>
> Thanks for your help,
> Sreeram
>
> On Tue, Dec 5, 2023 at 9:30 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> I think your approach is correct.  Do you have an example code?
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Dec 5, 2023 at 5:15 PM Sreeram R Venkat <srvenkat at utexas.edu>
>> wrote:
>>
>>> Hi, I have a follow up question on this.
>>>
>>> Now, I'm trying to do a scatter and permutation of the vector. Under the
>>> same setup as the original example, here are the new Start and Finish
>>> states I want to achieve:
>>> Proc | Start entries | Finish entries
>>>   0  | 0,...,8       | 0, 12, 24
>>>   1  | 9,...,17      | 1, 13, 25
>>>   2  | 18,...,26     | 2, 14, 26
>>>   3  | 27,...,35     | 3, 15, 27
>>>   4  | None          | 4, 16, 28
>>>   5  | None          | 5, 17, 29
>>>   6  | None          | 6, 18, 30
>>>   7  | None          | 7, 19, 31
>>>   8  | None          | 8, 20, 32
>>>   9  | None          | 9, 21, 33
>>>  10  | None          | 10, 22, 34
>>>  11  | None          | 11, 23, 35
>>>
>>> So far, I've tried to use ISCreateGeneral(), with each process giving an
>>> idx array corresponding to the indices it wants (i.e., idx on P0 looks
>>> like [0, 12, 24], on P1 [1, 13, 25], and so on).
>>> Then I use this to create the VecScatter with
>>> VecScatterCreate(x, is, y, NULL, &scatter).
>>>
>>> However, when I try to do the scatter, I get some illegal memory access
>>> errors.
>>>
>>> Is there something wrong with how I define the index sets?
>>>
>>> Thanks,
>>> Sreeram
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Oct 5, 2023 at 12:57 PM Sreeram R Venkat <srvenkat at utexas.edu>
>>> wrote:
>>>
>>>> Thank you. This works for me.
>>>>
>>>> Sreeram
>>>>
>>>> On Wed, Oct 4, 2023 at 6:41 PM Junchao Zhang <junchao.zhang at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, Sreeram,
>>>>> You can try this code. Since x, y are both MPI vectors, we just need
>>>>> to say we want to scatter x[0:N] to y[0:N]. The 12 index sets with your
>>>>> example on the 12 processes would be [0..8], [9..17], [18..26], [27..35],
>>>>> [], ..., [].  Actually, you can do it arbitrarily, say, with 12 index sets
>>>>> [0..17], [18..35], .., [].  PETSc will figure out how to do the
>>>>> communication.
>>>>>
>>>>> PetscInt rstart, rend, N;
>>>>> IS ix;
>>>>> VecScatter vscat;
>>>>> Vec y;
>>>>> MPI_Comm comm;
>>>>> VecType type;
>>>>>
>>>>> PetscObjectGetComm((PetscObject)x, &comm);
>>>>> VecGetType(x, &type);
>>>>> VecGetSize(x, &N);
>>>>> VecGetOwnershipRange(x, &rstart, &rend);
>>>>>
>>>>> VecCreate(comm, &y);
>>>>> VecSetSizes(y, PETSC_DECIDE, N);
>>>>> VecSetType(y, type);
>>>>>
>>>>> ISCreateStride(PetscObjectComm((PetscObject)x), rend - rstart, rstart, 1, &ix);
>>>>> VecScatterCreate(x, ix, y, ix, &vscat);
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Wed, Oct 4, 2023 at 6:03 PM Sreeram R Venkat <srvenkat at utexas.edu>
>>>>> wrote:
>>>>>
>>>>>> Suppose I am running on 12 processors, and I have a vector "v" of
>>>>>> size 36 partitioned over the first 4. v still uses PETSC_COMM_WORLD,
>>>>>> so it has a layout of (9, 9, 9, 9, 0, 0, ..., 0). Now, I would like
>>>>>> to repartition it over all 12 processors, so that the layout becomes
>>>>>> (3, 3, 3, ..., 3). I've been trying to use VecScatter to do this, but
>>>>>> I'm not sure what index sets to use for the sender and receiver.
>>>>>>
>>>>>> The result I am trying to achieve is this:
>>>>>>
>>>>>> Assume the vector is v = <0, 1, 2, ..., 35>
>>>>>>
>>>>>> Proc | Start entries | Finish entries
>>>>>>   0  | 0,...,8       | 0, 1, 2
>>>>>>   1  | 9,...,17      | 3, 4, 5
>>>>>>   2  | 18,...,26     | 6, 7, 8
>>>>>>   3  | 27,...,35     | 9, 10, 11
>>>>>>   4  | None          | 12, 13, 14
>>>>>>   5  | None          | 15, 16, 17
>>>>>>   6  | None          | 18, 19, 20
>>>>>>   7  | None          | 21, 22, 23
>>>>>>   8  | None          | 24, 25, 26
>>>>>>   9  | None          | 27, 28, 29
>>>>>>  10  | None          | 30, 31, 32
>>>>>>  11  | None          | 33, 34, 35
>>>>>>
>>>>>> Appreciate any help you can provide on this.
>>>>>>
>>>>>> Thanks,
>>>>>> Sreeram
>>>>>>
>>>>>