[petsc-users] MatCreateSubMatricesMPI strange behavior

Matthew Knepley knepley at gmail.com
Tue Aug 26 08:46:11 CDT 2025


On Tue, Aug 26, 2025 at 9:42 AM Pierre Jolivet <pierre at joliv.et> wrote:

> It’s indeed very suspicious (to me) that we are using rmap to change a
> column index.
> Switching to cmap gets your code running, but I’ll need to see if this
> triggers regressions.
>

That looks right to me. I am sure this has only been tested for GASM, which
would be symmetric.
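
To make the mismatch concrete, here is a rough sketch using the public API
rather than the internals the patch touches (the exact ranges are my
assumption for the 8x4 case in Pierre's run below):

#include <petscmat.h>

/* Sketch only: print the local row and column ownership ranges of a Mat.
   For a square layout the two ranges coincide, so a column shift taken from
   the row layout happens to work (e.g. for GASM); for a rectangular
   submatrix such as the 8x4 one below they differ, presumably [0,4) vs
   [0,2) on rank 0, which is how "Column too large: col 4 max 3" can arise. */
static PetscErrorCode PrintOwnershipRanges(Mat C)
{
  PetscInt rstart, rend, cstart, cend;

  PetscFunctionBeginUser;
  PetscCall(MatGetOwnershipRange(C, &rstart, &rend));       /* rows owned locally */
  PetscCall(MatGetOwnershipRangeColumn(C, &cstart, &cend)); /* columns of the diagonal block */
  PetscCall(PetscPrintf(PETSC_COMM_SELF, "rows [%" PetscInt_FMT ", %" PetscInt_FMT "), cols [%" PetscInt_FMT ", %" PetscInt_FMT ")\n", rstart, rend, cstart, cend));
  PetscFunctionReturn(PETSC_SUCCESS);
}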

  Thanks,

     Matt


> Thanks for the report,
> Pierre
>
> diff --git a/src/mat/impls/aij/mpi/mpiov.c b/src/mat/impls/aij/mpi/mpiov.c
> index d1037d7d817..051981ebe9a 100644
> --- a/src/mat/impls/aij/mpi/mpiov.c
> +++ b/src/mat/impls/aij/mpi/mpiov.c
> @@ -2948,3 +2948,3 @@ PetscErrorCode MatSetSeqMats_MPIAIJ(Mat C, IS rowemb, IS dcolemb, IS ocolemb, Ma
>
> -    PetscCall(PetscLayoutGetRange(C->rmap, &rstart, &rend));
> +    PetscCall(PetscLayoutGetRange(C->cmap, &rstart, &rend));
>      shift      = rend - rstart;
>
> $ cat proc_0_output.txt
> rstart 0 rend 4
> Mat Object: 3 MPI processes
>   type: mpiaij
>   row 0:   (0, 101.)    (3, 104.)    (6, 107.)    (9, 110.)
>   row 1:   (2, 203.)    (5, 206.)    (8, 209.)    (11, 212.)
>   row 2:   (1, 302.)    (4, 305.)    (7, 308.)    (10, 311.)
>   row 3:   (0, 401.)    (3, 404.)    (6, 407.)    (9, 410.)
>   row 4:   (2, 503.)    (5, 506.)    (8, 509.)    (11, 512.)
>   row 5:   (1, 602.)    (4, 605.)    (7, 608.)    (10, 611.)
>   row 6:   (0, 701.)    (3, 704.)    (6, 707.)    (9, 710.)
>   row 7:   (2, 803.)    (5, 806.)    (8, 809.)    (11, 812.)
>   row 8:   (1, 902.)    (4, 905.)    (7, 908.)    (10, 911.)
>   row 9:   (0, 1001.)    (3, 1004.)    (6, 1007.)    (9, 1010.)
>   row 10:   (2, 1103.)    (5, 1106.)    (8, 1109.)    (11, 1112.)
>   row 11:   (1, 1202.)    (4, 1205.)    (7, 1208.)    (10, 1211.)
> idxr proc
> IS Object: 2 MPI processes
>   type: general
> [0] Number of indices in set 4
> [0] 0 0
> [0] 1 1
> [0] 2 2
> [0] 3 3
> [1] Number of indices in set 4
> [1] 0 4
> [1] 1 5
> [1] 2 6
> [1] 3 7
> idxc proc
> IS Object: 2 MPI processes
>   type: general
> [0] Number of indices in set 2
> [0] 0 0
> [0] 1 1
> [1] Number of indices in set 2
> [1] 0 6
> [1] 1 7
> Mat Object: 2 MPI processes
>   type: mpiaij
>   row 0:   (0, 101.)    (2, 107.)
>   row 1:
>   row 2:   (1, 302.)    (3, 308.)
>   row 3:   (0, 401.)    (2, 407.)
>   row 4:
>   row 5:   (1, 602.)    (3, 608.)
>   row 6:   (0, 701.)    (2, 707.)
>   row 7:
> rstart 0 rend 4
> local row 0: ( 0 , 1.010000e+02) ( 2 , 1.070000e+02)
> local row 1:
> local row 2: ( 1 , 3.020000e+02) ( 3 , 3.080000e+02)
> local row 3: ( 0 , 4.010000e+02) ( 2 , 4.070000e+02)
>
> On 26 Aug 2025, at 3:18 PM, Pierre Jolivet <pierre at joliv.et> wrote:
>
>
> On 26 Aug 2025, at 12:50 PM, Alexis SALZMAN <alexis.salzman at ec-nantes.fr>
> wrote:
>
> Mark, you were right and I was wrong about the dense matrix. Adding
> explicit zeros to the distributed matrix used to extract the sub-matrices
> (making it dense) in my test does not change the behaviour: there is still
> an error.
>
> I am finding it increasingly difficult to understand the logic of the row
> and column 'IS' creation. Some of my many tests did achieve the desired
> result, a rectangular sub-matrix (so both rectangular and square
> sub-matrices appear to be possible), but many others ended with the same
> kind of error.
>
> This may be a PETSc bug in MatSetSeqMats_MPIAIJ().
> -> 2965        PetscCall(MatSetValues(aij->B, 1, &row, 1, &col, &v,
> INSERT_VALUES));
> col has a value of 4, which doesn’t make sense since the output Mat has 4
> columns (thus, as the error message says, the value should be lower than
> or equal to 3).
>
> Thanks,
> Pierre
>
> From what I observed, the test only works if, on proc 0 (same rank in both
> the communicator and the sub-communicator), the column selection
> contribution (size_c in the test) takes a specific value relative to the
> row selection contribution (size_r in the test):
>
>    - if size_r==2, it works only when size_c<=2;
>    - if size_r>=3 and size_r<=5, size_c==size_r is the only working case.
>
> This occurs "regardless" of what is requested on proc 1 and in selr/selc
> (they can't be dummy settings, though). In any case, this is certainly not
> an exhaustive analysis.
>
> Many thanks to anyone who can explain to me the logic behind the
> construction of row and column 'IS'.
>
> Regards
>
> A.S.
>
>
> Le 25/08/2025 à 20:00, Alexis SALZMAN a écrit :
>
> Thanks Mark for your attention.
>
> The uncleaned error message, compared to my post in July, is as follows:
>
> [0]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: Column too large: col 4 max 3
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown
> [0]PETSC ERROR: subnb with 3 MPI process(es) and PETSC_ARCH  on
> pc-str97.ec-nantes.fr by salzman Mon Aug 25 19:11:37 2025
> [0]PETSC ERROR: Configure options: PETSC_ARCH=real_fc41_Release_gcc_i4
> PETSC_DIR=/home/salzman/devel/ExternalLib/build/PETSC/petsc --doCleanup=1
> --with-scalar-type=real --known-level1-dcach
> e-linesize=64 --with-cc=gcc --CFLAGS="-fPIC " --CC_LINKER_FLAGS=-fopenmp
> --with-cxx=g++ --with-cxx-dialect=c++20 --CXXFLAGS="-fPIC "
> --CXX_LINKER_FLAGS=-fopenmp --with-fc=gfortran --FFLAGS=
> "-fPIC " --FC_LINKER_FLAGS=-fopenmp --with-debugging=0
> --with-fortran-bindings=0 --with-fortran-kernels=1 --with-mpi-compilers=0
> --with-mpi-include=/usr/include/openmpi-x86_64 --with-mpi-li
> b="[/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi_mpifh.so]"
> --with-blas-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/i
> ntel/oneapi/mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
> --with-lapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/intel/oneapi
> /mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
> --with-mumps=1 --with-mumps-include=/home/salzman/local/i4_gcc/include
> --with-mumps-lib="[/home/salzma
> n/local/i4_gcc/lib/libdmumps.so,/home/salzman/local/i4_gcc/lib/libmumps_common.so,/home/salzman/local/i4_gcc/lib/libpord.so]"
> --with-scalapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_
> scalapack_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_blacs_openmpi_lp64.so]"
> --with-mkl_pardiso=1
> --with-mkl_pardiso-include=/opt/intel/oneapi/mkl/latest/include
> --with-mkl_pardiso-lib
> ="[/opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so]"
> --with-hdf5=1 --with-hdf5-include=/usr/include/openmpi-x86_64
> --with-hdf5-lib="[/usr/lib64/openmpi/lib/libhdf5.so]" --with
> -pastix=0 --download-pastix=no --with-hwloc=1
> --with-hwloc-dir=/home/salzman/local/i4_gcc --download-hwloc=no
> --with-ptscotch-include=/home/salzman/local/i4_gcc/include
> --with-ptscotch-lib=
>
> "[/home/salzman/local/i4_gcc/lib/libptscotch.a,/home/salzman/local/i4_gcc/lib/libptscotcherr.a,/home/salzman/local/i4_gcc/lib/libptscotcherrexit.a,/home/salzman/local/i4_gcc/lib/libscotch.a
> ,/home/salzman/local/i4_gcc/lib/libscotcherr.a,/home/salzman/local/i4_gcc/lib/libscotcherrexit.a]"
> --with-hypre=1 --download-hypre=yes --with-suitesparse=1
> --with-suitesparse-include=/home/
> salzman/local/i4_gcc/include
> --with-suitesparse-lib="[/home/salzman/local/i4_gcc/lib/libsuitesparseconfig.so,/home/salzman/local/i4_gcc/lib/libumfpack.so,/home/salzman/local/i4_gcc/lib/libk
>
> lu.so,/home/salzman/local/i4_gcc/lib/libcholmod.so,/home/salzman/local/i4_gcc/lib/libspqr.so,/home/salzman/local/i4_gcc/lib/libcolamd.so,/home/salzman/local/i4_gcc/lib/libccolamd.so,/home/s
> alzman/local/i4_gcc/lib/libcamd.so,/home/salzman/local/i4_gcc/lib/libamd.so,/home/salzman/local/i4_gcc/lib/libmetis.so]"
> --download-suitesparse=no --with-python-exec=python3.12 --have-numpy
> =1 ---with-petsc4py=1 ---with-petsc4py-test-np=4 ---with-mpi4py=1
> --prefix=/home/salzman/local/i4_gcc/real_arithmetic COPTFLAGS="-O3 -g "
> CXXOPTFLAGS="-O3 -g " FOPTFLAGS="-O3 -g "
> [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/seq/aij.c:426
> [0]PETSC ERROR: #2 MatSetValues() at
> /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:1543
> [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:2965
> [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3163
> [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3196
> [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
> /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:7293
> [0]PETSC ERROR: #7 main() at subnb.c:181
> [0]PETSC ERROR: No PETSc Option Table entries
> [0]PETSC ERROR: ----------------End of Error Message -------send entire
> error message to petsc-maint at mcs.anl.gov----------
> --------------------------------------------------------------------------
>
> This message comes from executing the attached test (I simplified the test
> by removing the block size from the matrix used for extraction, compared to
> the July test). In proc_xx_output.txt, you will find the output from the
> code execution with the -ok option (i.e. irow/idxr and icol/idxc are the
> same, i.e. a square sub-block for colour 0 distributed across the first two
> processes).
>
> As expected, in this case we obtain the 0,3,6,9 sub-block terms, which are
> distributed across processes 0 and 1 (two rows per proc).
>
> When asking for a rectangular sub-block (i.e. with no option), it crashes
> on process 0 with "Column too large: col 4 max 3", even though I only ask
> for 4 rows and 2 columns on this process.
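>
> Concretely, what I ask for on the colour-0 sub-communicator boils down to
> the following sketch (illustrative names, not the exact code of the
> attached test; error handling trimmed):
>
> #include <petscmat.h>
>
> /* Sketch only: rank 0 of the 2-rank colour-0 subcomm asks for global rows
>    {0,1,2,3} and global columns {0,1} of the 12x12 MPIAIJ matrix A that
>    lives on PETSC_COMM_WORLD; rank 1 asks for rows {4,5,6,7} and columns
>    {6,7}.  subrank is the rank inside subcomm. */
> static PetscErrorCode ExtractRectangular(Mat A, MPI_Comm subcomm, PetscMPIInt subrank, Mat **submat)
> {
>   const PetscInt rows[2][4] = {{0, 1, 2, 3}, {4, 5, 6, 7}};
>   const PetscInt cols[2][2] = {{0, 1}, {6, 7}};
>   IS             irow, icol;
>
>   PetscFunctionBeginUser;
>   PetscCall(ISCreateGeneral(subcomm, 4, rows[subrank], PETSC_COPY_VALUES, &irow));
>   PetscCall(ISCreateGeneral(subcomm, 2, cols[subrank], PETSC_COPY_VALUES, &icol));
>   /* one submatrix per colour; the ISs live on subcomm, A on PETSC_COMM_WORLD */
>   PetscCall(MatCreateSubMatricesMPI(A, 1, &irow, &icol, MAT_INITIAL_MATRIX, submat));
>   PetscCall(ISDestroy(&irow));
>   PetscCall(ISDestroy(&icol));
>   PetscFunctionReturn(PETSC_SUCCESS);
> }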
>
> Otherwise, I mentioned the dense aspect of the matrix in ex183.c because,
> in that case, all terms are non-zero no matter what selection is requested.
> If there is an issue with the way the selection is coded in the user
> program, I think it would be masked by the full graph representation.
> However, this may not be the case; I should test it.
>
> I'll take a look at ex23.c.
>
> Thanks,
>
> A.S.
>
>
>
> Le 25/08/2025 à 17:55, Mark Adams a écrit :
>
> Ah, OK, never say never.
>
> MatCreateSubMatrices seems to support creating a new matrix with the
> communicator of the IS.
> It just needs to read from the input matrix and does not use it for
> communication, so it can do that.
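>
> Something along these lines, for the communicator part (a sketch with
> illustrative names, matching the three-rank split described in your test,
> not your actual code; it would go in main() after PetscInitialize()):
>
> /* ranks 0 and 1 of a 3-rank PETSC_COMM_WORLD form colour 0, rank 2 forms
>    colour 1; the ISs passed to MatCreateSubMatricesMPI() are then created
>    on subcomm, while the input Mat stays on PETSC_COMM_WORLD */
> MPI_Comm    subcomm;
> PetscMPIInt rank;
>
> PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
> PetscCallMPI(MPI_Comm_split(PETSC_COMM_WORLD, rank == 2 ? 1 : 0, rank, &subcomm));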
>
> As far as rectangular matrices, there is no reason not to support that
> (the row IS and column IS can be distinct).
> Can you send the whole error message?
> There may not be a test that does this, but src/mat/tests/ex23.c looks
> like it may be a rectangular matrix output.
>
> And it should not matter if the input matrix is a 100% full sparse matrix.
> It is still MatAIJ.
> The semantics and API are the same for sparse or dense matrices.
>
> Thanks,
> Mark
>
> On Mon, Aug 25, 2025 at 7:31 AM Alexis SALZMAN <
> alexis.salzman at ec-nantes.fr> wrote:
>
>> Hi,
>>
>> Thanks for your answer, Mark. Perhaps MatCreateSubMatricesMPI is the only
>> PETSc function that acts on a sub-communicator — I'm not sure — but it's
>> clear that there's no ambiguity on that point. The first line of the
>> documentation for that function states that it 'may live on subcomms'. This
>> is confirmed by the 'src/mat/tests/ex183.c' test case. I used this test
>> case to understand the function, which helped me with my code and the
>> example I provided in my initial post. Unfortunately, in this example, the
>> matrix from which the sub-matrices are extracted is dense, even though it
>> uses a sparse structure. This does not clarify how to define sub-matrices
>> when extracting from a sparse distributed matrix. Since my initial post, I
>> have discovered that having more columns than rows can also result in the
>> same error message.
>>
>> So, my questions boil down to:
>>
>> Can MatCreateSubMatricesMPI extract rectangular matrices from a square
>> distributed sparse matrix?
>>
>> If not, the fact that only square matrices can be extracted in this
>> context should perhaps be mentioned in the documentation.
>>
>> If so, I would be very grateful for any assistance in defining an IS pair
>> in this context.
>>
>> Regards
>>
>> A.S.
>> Le 27/07/2025 à 00:15, Mark Adams a écrit :
>>
>> First, you can not mix communicators in PETSc calls in general (ever?),
>> but this error looks like you might be asking for a row from the matrix
>> that does not exist.
>> You should start with a PETSc example code. Test it and modify it to suit
>> your needs.
>>
>> Good luck,
>> Mark
>>
>> On Fri, Jul 25, 2025 at 9:31 AM Alexis SALZMAN <
>> alexis.salzman at ec-nantes.fr> wrote:
>>
>>> Hi,
>>>
>>> As I am relatively new to PETSc, I may have misunderstood how to use the
>>> MatCreateSubMatricesMPI function. The attached code is tuned for three
>>> processes and extracts, from an MPIAIJ matrix, one matrix for each colour
>>> of a subcommunicator created with the MPI_Comm_split function. The
>>> following error message appears when the code is run in its default
>>> configuration (i.e. when a rectangular matrix with more rows than columns
>>> is extracted for colour 0):
>>>
>>> [0]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [0]PETSC ERROR: Argument out of range
>>> [0]PETSC ERROR: Column too large: col 4 max 3
>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown
>>>
>>> ... petsc git hash 2a89477b25f compiled on a dell i9 computer with Gcc
>>> 14.3, mkl 2025.2, .....
>>> [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
>>> ...petsc/src/mat/impls/aij/seq/aij.c:426
>>> [0]PETSC ERROR: #2 MatSetValues() at
>>> ...petsc/src/mat/interface/matrix.c:1543
>>> [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:2965
>>> [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:3163
>>> [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:3196
>>> [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
>>> .../petsc/src/mat/interface/matrix.c:7293
>>> [0]PETSC ERROR: #7 main() at sub.c:169
>>>
>>> When the '-ok' option is selected, the code extracts a square matrix for
>>> colour 0 and runs smoothly. Selecting the '-trans' option swaps the row
>>> and column selection indices and produces the transposed submatrix
>>> without problems. For colour 1, which uses only one process and is
>>> therefore sequential, extraction works regardless of the shape.
>>>
>>> Is this dependency on the shape expected? Have I missed an important
>>> tuning step somewhere?
>>>
>>> Thank you in advance for any clarification.
>>>
>>> Regards
>>>
>>> A.S.
>>>
>>> P.S.: I'm sorry, but as I'm leaving my office this evening for the
>>> following weeks, I won't be very responsive during that period.
>>>
>>>
>>>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

