[petsc-users] 2D Partitioning matrix-shell and KSP

Barry Smith bsmith at petsc.dev
Wed Sep 20 09:23:50 CDT 2023


  Use VecCreate(), VecSetSizes(), VecSetType() and MatCreate(), MatSetSizes(), and MatSetType() instead of the convenience functions VecCreateMPICUDA() and MatCreateShell().
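
  For example, a rough (untested) sketch of that approach, with error checking omitted; m_local and n_local stand for this rank's local row and column counts, and MyMatMult/ctx are placeholders for your matvec callback and its context:

      Vec x, y;
      Mat M;

      /* input (right-hand) vector: local size = this rank's share of the columns */
      VecCreate(PETSC_COMM_WORLD, &x);
      VecSetSizes(x, n_local, PETSC_DETERMINE);
      VecSetType(x, VECCUDA);                  /* replaces VecCreateMPICUDA() */

      /* output (left-hand) vector: local size = this rank's share of the rows */
      VecCreate(PETSC_COMM_WORLD, &y);
      VecSetSizes(y, m_local, PETSC_DETERMINE);
      VecSetType(y, VECCUDA);

      /* shell matrix whose row and column layouts then match y and x */
      MatCreate(PETSC_COMM_WORLD, &M);
      MatSetSizes(M, m_local, n_local, PETSC_DETERMINE, PETSC_DETERMINE);
      MatSetType(M, MATSHELL);
      MatShellSetContext(M, ctx);
      MatShellSetOperation(M, MATOP_MULT, (void (*)(void)) MyMatMult);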


> On Sep 19, 2023, at 8:44 PM, Sreeram R Venkat <srvenkat at utexas.edu> wrote:
> 
> Thank you for your reply.
> 
> Let's call this matrix M: 
> (A B C D)
> (E F G H)
> (I J K L)
> 
> Now, instead of doing KSP with just M, what if I want M^TM? In this case, the matvec implementation would be as follows:
> 
> same partitioning of blocks A, B, ..., L among the 12 MPI ranks
> matvec looks like:
> (a)             (w)
> (b) = (M^T M)   (x)
> (c)             (y)
> (d)             (z)
> w, x, y, z stored on ranks A, B, C, D (as before)
> a, b, c, d now also stored on ranks A, B, C, D
> Based on your message, I believe a PetscLayout with local sizes (number of columns of A, number of columns of B, number of columns of C, number of columns of D, 0, 0, 0, 0, 0, 0, 0, 0) should work for both the (a,b,c,d) and (w,x,y,z) vectors.
> 
> 
> I see there are functions "VecSetLayout" and "MatSetLayouts" to set the PetscLayouts of the matrix and vectors. When I create the vectors (I need VecCreateMPICUDA) or matrix shell (with MatCreateShell), I need to pass the local and global sizes. I'm not sure what to do there.
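> 
> For concreteness, the kind of thing I'm imagining with those functions (just a sketch, untested; "my_cols" stands for the number of columns of this rank's block, or 0 on ranks that hold no part of these vectors, and "MtM_shell", "in_vec", "out_vec" are placeholders):
> 
>     PetscLayout map;
>     PetscLayoutCreate(PETSC_COMM_WORLD, &map);
>     PetscLayoutSetLocalSize(map, my_cols);    /* cols of A..D on ranks 0-3, 0 elsewhere */
>     PetscLayoutSetUp(map);
>     VecSetLayout(in_vec, map);                /* same layout for (w,x,y,z) ...          */
>     VecSetLayout(out_vec, map);               /* ... and (a,b,c,d), since it is M^T M   */
>     MatSetLayouts(MtM_shell, map, map);       /* left and right layouts of the shell    */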
> 
> 
> Thanks,
> Sreeram
> 
> On Tue, Sep 19, 2023, 7:13 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>> 
>>    The PetscLayout local sizes for the PETSc (a,b,c) vector are (0, 0, 0, number of rows of D, 0, 0, 0, number of rows of H, 0, 0, 0, number of rows of L).
>> 
>>    
>>    The PetscLayout local sizes for the PETSc (w,x,y,z) vector are (number of columns of A, number of columns of B, number of columns of C, number of columns of D, 0, 0, 0, 0, 0, 0, 0, 0).
>> 
>>    The left and right layouts of the shell matrix need to match the two above. 
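>> 
>>    In code, assuming the row-major block-to-rank assignment from your message (rank 0 = A, ..., rank 11 = L), the local sizes would be something like this (sketch only; "shell" is the shell matrix, nrows_block and ncols_block are the dimensions of this rank's block):
>> 
>>       PetscInt m_local = (rank % 4 == 3) ? nrows_block : 0;  /* rows of D, H, L live on ranks 3, 7, 11 */
>>       PetscInt n_local = (rank / 4 == 0) ? ncols_block : 0;  /* cols of A..D live on ranks 0, 1, 2, 3  */
>>       MatSetSizes(shell, m_local, n_local, PETSC_DETERMINE, PETSC_DETERMINE);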
>> 
>>    There is a huge problem. KSP is written assuming that the left vector layout is the same as the right vector layout. So it can do dot products MPI rank by MPI rank without needing to send individual vector values around.
>> 
>>    I don't think it makes sense to use PETSc with such vector decompositions as you would like.
>> 
>>   Barry
>> 
>> 
>> 
>>> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat <srvenkat at utexas.edu <mailto:srvenkat at utexas.edu>> wrote:
>>> 
>>> With the example you have given, here is what I would like to do:
>>> 12 MPI ranks
>>> Each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11 has L) - to make the rest of this easier I'll refer to the rank containing block A as "rank A", and so on
>>> rank A, rank B, rank C, and rank D have w, x, y, z respectively - the first step of the custom matvec implementation broadcasts w to rank E and rank I (similarly x is broadcast to rank F and rank J ...)
>>> at the end of the matvec computation, ranks D, H, and L have a, b, and c respectively
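>>> 
>>> In rough, untested pseudocode (using the row/column communicators from my earlier message below, assuming their ranks are ordered by block column/row and PetscScalar is double; w_local, my_block, partial, out_local, and local_gemv are placeholders):
>>> 
>>>   /* column broadcast: rank A sends w down to ranks E and I (root = block row 0) */
>>>   MPI_Bcast(w_local, n_local, MPI_DOUBLE, 0, col_comm);
>>>   /* each rank multiplies its own block by its piece of the input */
>>>   local_gemv(my_block, w_local, partial);
>>>   /* row reduction: sum the partials so the last rank in each row (D, H, L) ends up with a, b, c */
>>>   MPI_Reduce(partial, out_local, m_local, MPI_DOUBLE, MPI_SUM, 3, row_comm);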
>>> Thanks,
>>> Sreeram
>>> 
>>> 
>>> On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <bsmith at petsc.dev <mailto:bsmith at petsc.dev>> wrote:
>>>> 
>>>>   ( a )     ( A B C D ) ( w )
>>>>   ( b )  =  ( E F G H ) ( x )
>>>>   ( c )     ( I J K L ) ( y )
>>>>                         ( z )
>>>> 
>>>> I have no idea what "The input vector is partitioned across each row, and the output vector is partitioned across each column" means.
>>>> 
>>>> Anyway, the shell matrix needs to live on MPI_COMM_WORLD, as do both the (a,b,c) and (w,x,y,z) vectors. 
>>>> 
>>>> Now, how many MPI ranks do you want to do the computation on? 12?
>>>> Do you want one block A .. L on each rank?
>>>> 
>>>> Do you want the (a,b,c) vector spread over all ranks? What about the (w,x,y,z) vector?
>>>> 
>>>>   Barry
>>>> 
>>>> 
>>>> 
>>>>> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <srvenkat at utexas.edu <mailto:srvenkat at utexas.edu>> wrote:
>>>>> 
>>>>> I have a custom implementation of a matrix-vector product that inherently relies on a 2D processor partitioning of the matrix. That is, if the matrix looks like:
>>>>> 
>>>>> A B C D
>>>>> E F G H
>>>>> I J K L
>>>>> in block form, we use 12 processors, each having one block. The input vector is partitioned across each row, and the output vector is partitioned across each column.
>>>>> 
>>>>> Each processor has 3 communicators: the WORLD_COMM, a ROW_COMM, and a COL_COMM. The ROW/COL communicators are used to do reductions over rows/columns of processors.
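>>>>> 
>>>>> For reference, they are created roughly like this (a sketch, assuming a row-major rank-to-block assignment, rank 0 = A through rank 11 = L, with 4 blocks per row):
>>>>> 
>>>>>   int rank;
>>>>>   MPI_Comm row_comm, col_comm;
>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>   MPI_Comm_split(MPI_COMM_WORLD, rank / 4, rank % 4, &row_comm);  /* ranks holding the same block row    */
>>>>>   MPI_Comm_split(MPI_COMM_WORLD, rank % 4, rank / 4, &col_comm);  /* ranks holding the same block column */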
>>>>> 
>>>>> With this setup, I am a bit confused about how to set up the matrix shell. The "MatCreateShell" function only accepts one communicator. If I give it WORLD_COMM, the local/global sizes won't match, since PETSc will try to multiply local_size * total_processors instead of local_size * processors_per_row (or col). I have gotten around this temporarily by giving ROW_COMM here instead. What I think happens is that a different MatShell is created on each row of processors, but when computing the matvec, they all work together.
>>>>> 
>>>>> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as the communicator), the process hangs. I believe this is due to the partitioning of the input/output vectors. The matvec itself is fine, but the inner products and other steps of CG fail. In fact, if I restrict to the case where I only have one row of processors, I am able to successfully use KSP. 
>>>>> 
>>>>> Is there a way to use KSP with this 2D partitioning setup when there are multiple rows of processors? I'd also prefer to work with one global MatShell object instead of the one-object-per-row setup I have right now.
>>>>> 
>>>>> Thanks for your help,
>>>>> Sreeram
>>>> 
>> 
