[petsc-users] MatCreateSubMatricesMPI strange behavior

Matthew Knepley knepley at gmail.com
Wed Aug 27 07:04:50 CDT 2025


On Wed, Aug 27, 2025 at 5:15 AM Alexis SALZMAN <alexis.salzman at ec-nantes.fr>
wrote:

> Hello Pierre,
>
> After applying your patch to my local version of PETSc, all of the cases
> that previously caused the provided test to fail are now running smoothly.
> In a more complex context (with more processes and colors in my
> application), no errors are found and the sub-matrices look OK.
>
> Thank you very much for your time. This debugged function will greatly
> simplify the development of my application.
>
Thanks for taking the time to report the error in a clear fashion. We
can't make the code better without this kind of cooperative effort.

   Matt

> Best regards
>
> A.S.
> On 26/08/2025 at 15:46, Matthew Knepley wrote:
>
> On Tue, Aug 26, 2025 at 9:42 AM Pierre Jolivet <pierre at joliv.et> wrote:
>
>> It’s indeed very suspicious (to me) that we are using rmap to change a
>> column index.
>> Switching to cmap gets your code running, but I’ll need to see if this
>> triggers regressions.
>>
>
> That looks right to me. I am sure this has only been tested for GASM,
> which would be symmetric.
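>
> To see why square inputs mask the bug: for a square matrix the row and
> column layouts have identical ownership ranges, so taking the shift from
> the wrong layout goes unnoticed. A minimal sketch, mirroring the internal
> calls in Pierre's patch below (C is the Mat whose layouts are queried):
>
>   PetscInt rstart, rend, cstart, cend;
>   PetscCall(PetscLayoutGetRange(C->rmap, &rstart, &rend)); /* local row range    */
>   PetscCall(PetscLayoutGetRange(C->cmap, &cstart, &cend)); /* local column range */
>   /* for a square C the two ranges coincide, so either layout yields the
>      same shift; for a rectangular C only cend - cstart (the local column
>      count) correctly shifts the off-diagonal column indices */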
>
>   Thanks,
>
>      Matt
>
>
>> Thanks for the report,
>> Pierre
>>
>> diff --git a/src/mat/impls/aij/mpi/mpiov.c b/src/mat/impls/aij/mpi/mpiov.c
>> index d1037d7d817..051981ebe9a 100644
>> --- a/src/mat/impls/aij/mpi/mpiov.c
>> +++ b/src/mat/impls/aij/mpi/mpiov.c
>> @@ -2948,3 +2948,3 @@ PetscErrorCode MatSetSeqMats_MPIAIJ(Mat C, IS rowemb, IS dcolemb, IS ocolemb, Ma
>>
>> -    PetscCall(PetscLayoutGetRange(C->rmap, &rstart, &rend));
>> +    PetscCall(PetscLayoutGetRange(C->cmap, &rstart, &rend));
>>      shift      = rend - rstart;
>>
>> $ cat proc_0_output.txt
>> rstart 0 rend 4
>> Mat Object: 3 MPI processes
>>   type: mpiaij
>>   row 0:   (0, 101.)    (3, 104.)    (6, 107.)    (9, 110.)
>>   row 1:   (2, 203.)    (5, 206.)    (8, 209.)    (11, 212.)
>>   row 2:   (1, 302.)    (4, 305.)    (7, 308.)    (10, 311.)
>>   row 3:   (0, 401.)    (3, 404.)    (6, 407.)    (9, 410.)
>>   row 4:   (2, 503.)    (5, 506.)    (8, 509.)    (11, 512.)
>>   row 5:   (1, 602.)    (4, 605.)    (7, 608.)    (10, 611.)
>>   row 6:   (0, 701.)    (3, 704.)    (6, 707.)    (9, 710.)
>>   row 7:   (2, 803.)    (5, 806.)    (8, 809.)    (11, 812.)
>>   row 8:   (1, 902.)    (4, 905.)    (7, 908.)    (10, 911.)
>>   row 9:   (0, 1001.)    (3, 1004.)    (6, 1007.)    (9, 1010.)
>>   row 10:   (2, 1103.)    (5, 1106.)    (8, 1109.)    (11, 1112.)
>>   row 11:   (1, 1202.)    (4, 1205.)    (7, 1208.)    (10, 1211.)
>> idxr proc
>> IS Object: 2 MPI processes
>>   type: general
>> [0] Number of indices in set 4
>> [0] 0 0
>> [0] 1 1
>> [0] 2 2
>> [0] 3 3
>> [1] Number of indices in set 4
>> [1] 0 4
>> [1] 1 5
>> [1] 2 6
>> [1] 3 7
>> idxc proc
>> IS Object: 2 MPI processes
>>   type: general
>> [0] Number of indices in set 2
>> [0] 0 0
>> [0] 1 1
>> [1] Number of indices in set 2
>> [1] 0 6
>> [1] 1 7
>> Mat Object: 2 MPI processes
>>   type: mpiaij
>>   row 0:   (0, 101.)    (2, 107.)
>>   row 1:
>>   row 2:   (1, 302.)    (3, 308.)
>>   row 3:   (0, 401.)    (2, 407.)
>>   row 4:
>>   row 5:   (1, 602.)    (3, 608.)
>>   row 6:   (0, 701.)    (2, 707.)
>>   row 7:
>> rstart 0 rend 4
>> local row 0: ( 0 , 1.010000e+02) ( 2 , 1.070000e+02)
>> local row 1:
>> local row 2: ( 1 , 3.020000e+02) ( 3 , 3.080000e+02)
>> local row 3: ( 0 , 4.010000e+02) ( 2 , 4.070000e+02)
>>
>> On 26 Aug 2025, at 3:18 PM, Pierre Jolivet <pierre at joliv.et> wrote:
>>
>>
>> On 26 Aug 2025, at 12:50 PM, Alexis SALZMAN <alexis.salzman at ec-nantes.fr>
>> wrote:
>>
>> Mark, you were right and I was wrong about the dense matrix. Adding
>> explicit zeros to the distributed matrix used to extract the sub-matrices
>> (making it dense) in my test does not change the behaviour: there is still
>> an error.
>>
>> I am finding it increasingly difficult to understand the logic of the row
>> and column 'IS' creation. Some of my many tests achieved the desired
>> result, a rectangular sub-matrix (so rectangular as well as square
>> sub-matrices appear to be possible); however, many others resulted in the
>> same kind of error.
>>
>> This may be a PETSc bug in MatSetSeqMats_MPIAIJ().
>> -> 2965        PetscCall(MatSetValues(aij->B, 1, &row, 1, &col, &v, INSERT_VALUES));
>> col has a value of 4, which doesn't make sense since the output Mat has 4
>> columns (thus, as the error message suggests, the value should be lower
>> than or equal to 3).
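>>
>> For reference, a minimal hypothetical sketch (not taken from the attached
>> test) that trips the same guard; any column index passed to MatSetValues()
>> must be strictly below the matrix's column count:
>>
>>   Mat         A;
>>   PetscInt    row = 0, col = 4; /* invalid: a 4-column matrix has columns 0..3 */
>>   PetscScalar v   = 1.0;
>>   PetscCall(MatCreateSeqAIJ(PETSC_COMM_SELF, 4, 4, 1, NULL, &A));
>>   /* fails with "Column too large: col 4 max 3" */
>>   PetscCall(MatSetValues(A, 1, &row, 1, &col, &v, INSERT_VALUES));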
>>
>> Thanks,
>> Pierre
>>
>> From what I observed, the test only works if the column selection
>> contribution (size_c in the test) has a specific value related to the row
>> selection contribution (size_r in the test) for proc 0 (rank for both
>> communicator and sub-communicator):
>>
>>    - if size_r == 2, it works only when size_c <= 2;
>>    - if 3 <= size_r <= 5, size_c == size_r is the only working case.
>>
>> This occurs "regardless" of what is requested in proc 1 and in selr/selc
>> (they can't be dummy settings, though). In any case, this is certainly not
>> an exhaustive analysis.
>>
>> Many thanks to anyone who can explain to me the logic behind the
>> construction of row and column 'IS'.
>>
>> Regards
>>
>> A.S.
>>
>>
>> On 25/08/2025 at 20:00, Alexis SALZMAN wrote:
>>
>> Thanks Mark for your attention.
>>
>> The full, uncleaned error message (unlike the trimmed one in my July post) is as follows:
>>
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Argument out of range
>> [0]PETSC ERROR: Column too large: col 4 max 3
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown
>> [0]PETSC ERROR: subnb with 3 MPI process(es) and PETSC_ARCH  on
>> pc-str97.ec-nantes.fr by salzman Mon Aug 25 19:11:37 2025
>> [0]PETSC ERROR: Configure options: PETSC_ARCH=real_fc41_Release_gcc_i4
>> PETSC_DIR=/home/salzman/devel/ExternalLib/build/PETSC/petsc --doCleanup=1
>> --with-scalar-type=real --known-level1-dcache-linesize=64 --with-cc=gcc
>> --CFLAGS="-fPIC " --CC_LINKER_FLAGS=-fopenmp --with-cxx=g++
>> --with-cxx-dialect=c++20 --CXXFLAGS="-fPIC " --CXX_LINKER_FLAGS=-fopenmp
>> --with-fc=gfortran --FFLAGS="-fPIC " --FC_LINKER_FLAGS=-fopenmp
>> --with-debugging=0 --with-fortran-bindings=0 --with-fortran-kernels=1
>> --with-mpi-compilers=0 --with-mpi-include=/usr/include/openmpi-x86_64
>> --with-mpi-lib="[/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi.so,/usr/lib64/openmpi/lib/libmpi_mpifh.so]"
>> --with-blas-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
>> --with-lapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_intel_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_gnu_thread.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_core.so]"
>> --with-mumps=1 --with-mumps-include=/home/salzman/local/i4_gcc/include
>> --with-mumps-lib="[/home/salzman/local/i4_gcc/lib/libdmumps.so,/home/salzman/local/i4_gcc/lib/libmumps_common.so,/home/salzman/local/i4_gcc/lib/libpord.so]"
>> --with-scalapack-lib="[/opt/intel/oneapi/mkl/latest/lib/libmkl_scalapack_lp64.so,/opt/intel/oneapi/mkl/latest/lib/libmkl_blacs_openmpi_lp64.so]"
>> --with-mkl_pardiso=1
>> --with-mkl_pardiso-include=/opt/intel/oneapi/mkl/latest/include
>> --with-mkl_pardiso-lib="[/opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so]"
>> --with-hdf5=1 --with-hdf5-include=/usr/include/openmpi-x86_64
>> --with-hdf5-lib="[/usr/lib64/openmpi/lib/libhdf5.so]" --with-pastix=0
>> --download-pastix=no --with-hwloc=1
>> --with-hwloc-dir=/home/salzman/local/i4_gcc --download-hwloc=no
>> --with-ptscotch-include=/home/salzman/local/i4_gcc/include
>> --with-ptscotch-lib="[/home/salzman/local/i4_gcc/lib/libptscotch.a,/home/salzman/local/i4_gcc/lib/libptscotcherr.a,/home/salzman/local/i4_gcc/lib/libptscotcherrexit.a,/home/salzman/local/i4_gcc/lib/libscotch.a,/home/salzman/local/i4_gcc/lib/libscotcherr.a,/home/salzman/local/i4_gcc/lib/libscotcherrexit.a]"
>> --with-hypre=1 --download-hypre=yes --with-suitesparse=1
>> --with-suitesparse-include=/home/salzman/local/i4_gcc/include
>> --with-suitesparse-lib="[/home/salzman/local/i4_gcc/lib/libsuitesparseconfig.so,/home/salzman/local/i4_gcc/lib/libumfpack.so,/home/salzman/local/i4_gcc/lib/libklu.so,/home/salzman/local/i4_gcc/lib/libcholmod.so,/home/salzman/local/i4_gcc/lib/libspqr.so,/home/salzman/local/i4_gcc/lib/libcolamd.so,/home/salzman/local/i4_gcc/lib/libccolamd.so,/home/salzman/local/i4_gcc/lib/libcamd.so,/home/salzman/local/i4_gcc/lib/libamd.so,/home/salzman/local/i4_gcc/lib/libmetis.so]"
>> --download-suitesparse=no --with-python-exec=python3.12 --have-numpy=1
>> ---with-petsc4py=1 ---with-petsc4py-test-np=4 ---with-mpi4py=1
>> --prefix=/home/salzman/local/i4_gcc/real_arithmetic COPTFLAGS="-O3 -g "
>> CXXOPTFLAGS="-O3 -g " FOPTFLAGS="-O3 -g "
>> [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
>> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/seq/aij.c:426
>> [0]PETSC ERROR: #2 MatSetValues() at
>> /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:1543
>> [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
>> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:2965
>> [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
>> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3163
>> [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
>> /home/salzman/devel/PETSc/petsc/src/mat/impls/aij/mpi/mpiov.c:3196
>> [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
>> /home/salzman/devel/PETSc/petsc/src/mat/interface/matrix.c:7293
>> [0]PETSC ERROR: #7 main() at subnb.c:181
>> [0]PETSC ERROR: No PETSc Option Table entries
>> [0]PETSC ERROR: ----------------End of Error Message -------send entire
>> error message to petsc-maint at mcs.anl.gov----------
>> --------------------------------------------------------------------------
>>
>> This message comes from executing the attached test (I simplified the
>> test by removing the block size from the matrix used for extraction,
>> compared to the July test). In proc_xx_output.txt, you will find the output
>> from the code execution with the -ok option (i.e. irow/idxr and icol/idxc
>> are the same, i.e. a square sub-block for colour 0 distributed across the
>> first two processes).
>>
>> As expected, in this case we obtain the 0,3,6,9 sub-block terms, which
>> are distributed across processes 0 and 1 (two rows per proc).
>>
>> When asking for a rectangular sub-block (i.e. with no option), it crashes
>> with "column too large" on process 0: col 4 max 3. Yet I asked for 4 rows
>> and 2 columns on this process?
>>
>> Otherwise, I mentioned the dense aspect of the matrix in ex183.c because,
>> in that case, no matter what selection is requested, all terms are
>> non-null. If there were an issue with the way the selection is coded in
>> the user program, I suspect it would be masked by the full graph
>> representation. However, this may not be the case; I should test it.
>>
>> I'll take a look at ex23.c.
>>
>> Thanks,
>>
>> A.S.
>>
>>
>>
>> On 25/08/2025 at 17:55, Mark Adams wrote:
>>
>> Ah, OK, never say never.
>>
>> MatCreateSubMatrices seems to support creating a new matrix with the
>> communicator of the IS.
>> It just needs to read from the input matrix and does not use it for
>> communication, so it can do that.
>>
>> As far as rectangular matrices go, there is no reason not to support that
>> (the row IS and column IS can be distinct); a sketch follows below.
>> Can you send the whole error message?
>> There may not be a test that does this, but src/mat/tests/ex23.c looks
>> like it may be a rectangular matrix output.
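>>
>> For instance, a sketch of a rectangular extraction (names, sizes, and the
>> sub-communicator subcomm with local rank subrank are assumptions, not
>> ex23.c): each rank contributes its part of the global row and column
>> selections, and the IS communicator determines where the submatrix lives.
>>
>>   IS  irow, icol;
>>   Mat *submats;
>>   /* 4 consecutive global rows but only 2 even-numbered global columns
>>      per rank: the overall selection is rectangular */
>>   PetscCall(ISCreateStride(subcomm, 4, 4 * subrank, 1, &irow));
>>   PetscCall(ISCreateStride(subcomm, 2, 4 * subrank, 2, &icol));
>>   PetscCall(MatCreateSubMatricesMPI(A, 1, &irow, &icol, MAT_INITIAL_MATRIX, &submats));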
>>
>> And it should not matter if the input matrix is a 100% full sparse
>> matrix; it is still MatAIJ. The semantics and the API are the same for
>> sparse or dense matrices.
>>
>> Thanks,
>> Mark
>>
>> On Mon, Aug 25, 2025 at 7:31 AM Alexis SALZMAN <
>> alexis.salzman at ec-nantes.fr> wrote:
>>
>>> Hi,
>>>
>>> Thanks for your answer, Mark. Perhaps MatCreateSubMatricesMPI is the
>>> only PETSc function that acts on a sub-communicator (I'm not sure), but
>>> it's clear that there's no ambiguity on that point. The first line of the
>>> documentation for that function states that it 'may live on subcomms'. This
>>> is confirmed by the 'src/mat/tests/ex183.c' test case. I used this test
>>> case to understand the function, which helped me with my code and the
>>> example I provided in my initial post. Unfortunately, in this example, the
>>> matrix from which the sub-matrices are extracted is dense, even though it
>>> uses a sparse structure. This does not clarify how to define sub-matrices
>>> when extracting from a sparse distributed matrix. Since my initial post, I
>>> have discovered that having more columns than rows can also result in the
>>> same error message.
>>>
>>> So, my questions boil down to:
>>>
>>> Can MatCreateSubMatricesMPI extract rectangular matrices from a square
>>> distributed sparse matrix?
>>>
>>> If not, the fact that only square matrices can be extracted in this
>>> context should perhaps be mentioned in the documentation.
>>>
>>> If so, I would be very grateful for any assistance in defining an IS
>>> pair in this context.
>>>
>>> Regards
>>>
>>> A.S.
>>> On 27/07/2025 at 00:15, Mark Adams wrote:
>>>
>>> First, you cannot mix communicators in PETSc calls in general (ever?),
>>> but this error looks like you might be asking for a row that does not
>>> exist in the matrix.
>>> You should start with a PETSc example code. Test it and modify it to
>>> suit your needs.
>>>
>>> Good luck,
>>> Mark
>>>
>>> On Fri, Jul 25, 2025 at 9:31 AM Alexis SALZMAN <
>>> alexis.salzman at ec-nantes.fr> wrote:
>>>
>>>> Hi,
>>>>
>>>> As I am relatively new to PETSc, I may have misunderstood how to use
>>>> the MatCreateSubMatricesMPI function. The attached code is tuned for
>>>> three processes: it splits the communicator into colours with
>>>> MPI_Comm_split and extracts one matrix per colour from an MPIAIJ matrix
>>>> (a minimal sketch of this setup appears after the error trace below).
>>>> The following error message appears when the code is set to its default
>>>> configuration (i.e. when a rectangular matrix is extracted with more
>>>> rows than columns for colour 0):
>>>>
>>>> [0]PETSC ERROR: --------------------- Error Message
>>>> --------------------------------------------------------------
>>>> [0]PETSC ERROR: Argument out of range
>>>> [0]PETSC ERROR: Column too large: col 4 max 3
>>>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>>> [0]PETSC ERROR: Petsc Release Version 3.22.2, unknown
>>>>
>>>> ... petsc git hash 2a89477b25f, compiled on a Dell i9 computer with GCC
>>>> 14.3, MKL 2025.2, .....
>>>> [0]PETSC ERROR: #1 MatSetValues_SeqAIJ() at
>>>> ...petsc/src/mat/impls/aij/seq/aij.c:426
>>>> [0]PETSC ERROR: #2 MatSetValues() at
>>>> ...petsc/src/mat/interface/matrix.c:1543
>>>> [0]PETSC ERROR: #3 MatSetSeqMats_MPIAIJ() at
>>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:2965
>>>> [0]PETSC ERROR: #4 MatCreateSubMatricesMPI_MPIXAIJ() at
>>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:3163
>>>> [0]PETSC ERROR: #5 MatCreateSubMatricesMPI_MPIAIJ() at
>>>> .../petsc/src/mat/impls/aij/mpi/mpiov.c:3196
>>>> [0]PETSC ERROR: #6 MatCreateSubMatricesMPI() at
>>>> .../petsc/src/mat/interface/matrix.c:7293
>>>> [0]PETSC ERROR: #7 main() at sub.c:169
>>>>
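>>>> As mentioned above, here is a minimal sketch of the setup (assuming 3
>>>> ranks and an already-assembled MPIAIJ matrix A on PETSC_COMM_WORLD;
>>>> names and index values are illustrative, not the attached code's):
>>>>
>>>>   MPI_Comm    subcomm;
>>>>   PetscMPIInt rank, color;
>>>>   IS          irow, icol;
>>>>   Mat        *submats;
>>>>   PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
>>>>   color = rank < 2 ? 0 : 1; /* ranks 0-1 form colour 0, rank 2 colour 1 */
>>>>   PetscCallMPI(MPI_Comm_split(PETSC_COMM_WORLD, color, rank, &subcomm));
>>>>   /* each rank asks for 4 global rows but only 2 global columns, so the
>>>>      requested sub-matrix is rectangular (more rows than columns) */
>>>>   const PetscInt rows[] = {4 * rank, 4 * rank + 1, 4 * rank + 2, 4 * rank + 3};
>>>>   const PetscInt cols[] = {4 * rank, 4 * rank + 1};
>>>>   PetscCall(ISCreateGeneral(subcomm, 4, rows, PETSC_COPY_VALUES, &irow));
>>>>   PetscCall(ISCreateGeneral(subcomm, 2, cols, PETSC_COPY_VALUES, &icol));
>>>>   PetscCall(MatCreateSubMatricesMPI(A, 1, &irow, &icol, MAT_INITIAL_MATRIX, &submats));
>>>>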
>>>> When the '-ok' option is selected, the code extracts a square matrix
>>>> for colour 0, which runs smoothly in this case. Selecting the '-trans'
>>>> option swaps the row and column selection indices and smoothly produces
>>>> a transposed submatrix. For colour 1, which uses only one process and is
>>>> therefore sequential, rectangular extraction is OK regardless of the
>>>> shape.
>>>>
>>>> Is this dependency on the shape expected? Have I missed an important
>>>> tuning step somewhere?
>>>>
>>>> Thank you in advance for any clarification.
>>>>
>>>> Regards
>>>>
>>>> A.S.
>>>>
>>>> P.S.: I'm sorry, but as I'm leaving my office this evening for the next
>>>> few weeks, I won't be very responsive during that period.
>>>>
>>>>
>>>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/