[petsc-users] 2D Partitioning matrix-shell and KSP

Sreeram R Venkat srvenkat at utexas.edu
Tue Sep 19 19:44:46 CDT 2023


Thank you for your reply.

Let's call this matrix *M*:
( A B C D )
( E F G H )
( I J K L )

Now, instead of doing KSP with just *M*, what if I want *M^T M*? In this
case, the matvec implementation would be as follows:


   - same partitioning of blocks A, B, ..., L among the 12 MPI ranks
   - matvec looks like:

( a )             ( w )
( b )  =  M^T M   ( x )
( c )             ( y )
( d )             ( z )

   - w, x, y, z stored on ranks A, B, C, D (as before)
   - a, b, c, d now also stored on ranks A, B, C, D

Based on your message, I believe using a PetscLayout with local sizes
(number of columns of A, number of columns of B, number of columns of C,
number of columns of D, 0, 0, 0, 0, 0, 0, 0, 0) for both the (a,b,c,d) and
(w,x,y,z) vectors should work.
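
Concretely, here is a rough sketch of how I think that layout would be
built (the rank test and "ncols_of_my_block" are just placeholders for
however each rank identifies its block; please correct me if this is not
what you meant):

    /* Sketch: one layout shared by the input (w,x,y,z) and output (a,b,c,d)
       of M^T M.  On ranks A-D (world ranks 0-3 here) the local size is the
       number of columns of that rank's block; on ranks E-L it is 0. */
    PetscLayout layout;
    PetscMPIInt rank;
    PetscInt    local_cols;

    PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
    local_cols = (rank < 4) ? ncols_of_my_block : 0;  /* placeholder */

    PetscCall(PetscLayoutCreate(PETSC_COMM_WORLD, &layout));
    PetscCall(PetscLayoutSetLocalSize(layout, local_cols));
    PetscCall(PetscLayoutSetUp(layout));  /* global size summed by PETSc */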


I see there are functions "VecSetLayout" and "MatSetLayouts" to set the
PetscLayouts of the matrix and vectors. When I create the vectors (I need
VecCreateMPICUDA) or matrix shell (with MatCreateShell), I need to pass the
local and global sizes. I'm not sure what to do there.
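
My best guess is that I can skip VecSetLayout/MatSetLayouts entirely and
just pass the same per-rank local size (local_cols from the sketch above)
as both the local row and column size, something like the following
(MyMatMult_MTM and ctx are placeholders for my matvec callback and
application context):

    extern PetscErrorCode MyMatMult_MTM(Mat, Vec, Vec); /* my custom M^T M matvec */
    Mat   MtM;
    Vec   rhs, sol;
    KSP   ksp;
    PC    pc;
    void *ctx = NULL;                                    /* my application context */

    /* Both the row (left) and column (right) local sizes are local_cols, so
       the left and right layouts of the shell match, as KSP requires. */
    PetscCall(MatCreateShell(PETSC_COMM_WORLD, local_cols, local_cols,
                             PETSC_DETERMINE, PETSC_DETERMINE, ctx, &MtM));
    PetscCall(MatShellSetOperation(MtM, MATOP_MULT, (void (*)(void))MyMatMult_MTM));

    /* VecCreateMPICUDA takes the same local/global size arguments. */
    PetscCall(VecCreateMPICUDA(PETSC_COMM_WORLD, local_cols, PETSC_DETERMINE, &rhs));
    PetscCall(VecDuplicate(rhs, &sol));

    PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
    PetscCall(KSPSetType(ksp, KSPCG));
    PetscCall(KSPSetOperators(ksp, MtM, MtM));
    PetscCall(KSPGetPC(ksp, &pc));
    PetscCall(PCSetType(pc, PCNONE));  /* no preconditioner for the shell operator */
    PetscCall(KSPSolve(ksp, rhs, sol));

Does that look right? Since the left and right local sizes are identical on
every rank, I am hoping the dot products in CG no longer run into the
mismatch you described.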


Thanks,
Sreeram

On Tue, Sep 19, 2023, 7:13 PM Barry Smith <bsmith at petsc.dev> wrote:

>
>    The PetscLayout local sizes for PETSc (a,b,c) vector (0,0,0,number of
> rows of D, 0,0,0, number of rows of H, 0,0,0,number of rows of L)
>
>
>    The PetscLayout local sizes for PETSc (w,x,y,z) vector (number of
> columns of A, number of columns of B, number of columns of C, number of
> columns of D, 0, 0, 0, 0, 0, 0, 0, 0)
>
>    The left and right layouts of the shell matrix need to match the two
> above.
>
>    There is a huge problem. KSP is written assuming that the left vector
> layout is the same as the right vector layout. So it can do dot products
> MPI rank by MPI rank without needing to send individual vector values
> around.
>
>    I don't think it makes sense to use PETSc with such vector
> decompositions as you would like.
>
>   Barry
>
>
>
> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat <srvenkat at utexas.edu> wrote:
>
> With the example you have given, here is what I would like to do:
>
>    - 12 MPI ranks
>    - Each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11
>    has L) - to make the rest of this easier I'll refer to the rank containing
>    block A as "rank A", and so on
>    - rank A, rank B, rank C, and rank D have w, x, y, z respectively -
>    the first step of the custom matvec implementation broadcasts w to rank E
>    and rank I (similarly x is broadcast to rank F and rank J ...)
>    - at the end of the matvec computation, ranks D, H, and L have a, b,
>    and c respectively
>
> Thanks,
> Sreeram
>
>
> On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <bsmith at petsc.dev> wrote:
>
>>
>>  ( a )     ( A B C D ) ( w )
>>  ( b )  =  ( E F G H ) ( x )
>>  ( c )     ( I J K L ) ( y )
>>                        ( z )
>>
>> I have no idea what "The input vector is partitioned across each row,
>> and the output vector is partitioned across each column" means.
>>
>> Anyways the shell matrix needs to live on MPI_COMM_WORLD, as do both the
>> (a,b,c) and (w,x,y,z) vector.
>>
>> Now how many MPI ranks do you want to do the computation on? 12?
>> Do you want one matrix A .. L on each rank?
>>
>> Do you want the (a,b,c) vector spread over all ranks? What about the (w,x,y,z)
>> vector?
>>
>>   Barry
>>
>>
>>
>> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <srvenkat at utexas.edu>
>> wrote:
>>
>> I have a custom implementation of a matrix-vector product that inherently
>> relies on a 2D processor partitioning of the matrix. That is, if the matrix
>> looks like:
>>
>> A B C D
>> E F G H
>> I J K L
>>
>> in block form, we use 12 processors, each having one block. The input
>> vector is partitioned across each row, and the output vector is partitioned
>> across each column.
>>
>> Each processor has 3 communicators: the WORLD_COMM, a ROW_COMM, and a
>> COL_COMM. The ROW/COL communicators are used to do reductions over
>> rows/columns of processors.
>>
>> With this setup, I am a bit confused about how to set up the matrix
>> shell. The "MatCreateShell" function only accepts one communicator. If I
>> give the WORLD_COMM, the local/global sizes won't match since PETSc will
>> try to multiply local_size * total_processors instead of local_size *
>> processors_per_row (or col). I have gotten around this temporarily by
>> giving ROW_COMM here instead. What I think happens is a different MatShell
>> is created on each row, but when computing the matvec, they all work
>> together.
>>
>> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as the
>> communicator), the process hangs. I believe this is due to the partitioning
>> of the input/output vectors. The matvec itself is fine, but the inner
>> products and other steps of CG fail. In fact, if I restrict to the case
>> where I only have one row of processors, I am able to successfully use KSP.
>>
>> Is there a way to use KSP with this 2D partitioning setup when there are
>> multiple rows of processors? I'd also prefer to work with one global
>> MatShell object instead of this one object per row thing that I'm doing
>> right now.
>>
>> Thanks for your help,
>> Sreeram
>>
>>
>>
>