[petsc-users] 'Inserting a new nonzero' issue on a reassembled matrix in parallel

Thibaut Appel t.appel17 at imperial.ac.uk
Tue Oct 22 12:37:37 CDT 2019


Hi both,

Please find attached a tiny example (in Fortran, sorry Matthew) that - I 
think - reproduces the problem we mentioned.

Let me know.

Thibaut


On 22/10/2019 17:48, Matthew Knepley wrote:
> On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users 
> <petsc-users at mcs.anl.gov <mailto:petsc-users at mcs.anl.gov>> wrote:
>
>     Hi Hong,
>
>     Thank you for having a look, I copied/pasted your code snippet
>     into ex28.c and the error indeed appears if you change that
>     col[0]. That's because you did not allow a new non-zero location
>     in the matrix with the option MAT_NEW_NONZERO_LOCATION_ERR.
>
>     I spent the day debugging the code and already checked my calls to
>     MatSetValues:
>
>     For all MatSetValues calls corresponding to the row/col location
>     in the error messages in the subsequent assembly, the numerical
>     value associated with that row/col was exactly (0.0,0.0) (complex
>     arithmetic) so it shouldn't be inserted w.r.t. the option
>     MAT_IGNORE_ZERO_ENTRIES. It seems MatSetValues still did it anyway.
>
> Okay, lets solve this problem first. You say that
>
>   - You called MatSetOption(A,  MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
>   - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val had 
> a complex 0 in it
>   - PETSc tried to insert the complex 0
>
> This should be really easy to test in a tiny example. Do you mind 
> making it? If its broken, I will fix it.
>
>   Thanks,
>
>     Matt
>
>     I was able to solve the problem by adding
>     MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my
>     first assembly. However I don't know why it fixed it as the manual
>     seems to say this is just for efficiency purposes
>
>     It "should be specified after the first matrix has been fully
>     assembled. This option ensures that certain data structures and
>     communication information will be reused (instead of regenerated
>     during successive steps, thereby increasing efficiency"
>
>     So I'm still puzzled by why I got that error in the first place.
>     Unless "regenerated" implies resetting some attributes of the
>     preallocated non-zero structure / first assembly?
>
>
>     Thibaut
>
>
>     On 22/10/2019 17:06, Zhang, Hong wrote:
>>     Thibaut:
>>     Check your code on MatSetValues(), likely you set a value "to a
>>     new nonzero at global row/column (200, 160) into matrix" L.
>>     I tested petsc/src/mat/examples/tests/ex28.c by adding
>>     @@ -95,6 +95,26 @@ int main(int argc,char **args)
>>        /* Compute numeric factors using same F, then solve */
>>        for (k = 0; k < num_numfac; k++) {
>>          /* Get numeric factor of A[k] */
>>     +    if (k==0) {
>>     +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
>>     +      for (i=rstart; i<rend; i++) {
>>     +        col[0] = i-1; col[1] = i; col[2] = i+1;
>>     +        if (i == 0) {
>>     +          ierr =
>>     MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
>>     +        } else if (i == N-1) {
>>     +          ierr =
>>     MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>     +        } else {
>>     +          ierr =
>>     MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>     +        }
>>     +      }
>>     +      if (!rank) {
>>     +      i = N - 1; col[0] = N - 1;
>>     +        ierr =
>>     MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
>>     +      }
>>     +      ierr =
>>     MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>     +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>     +    }
>>     +
>>
>>     It works in both sequential and parallel. If I set col[0] = 0,
>>     then I get the same error as yours.
>>     Hong
>>
>>         Dear PETSc developers,
>>
>>         I'm extending a validated matrix preallocation/assembly part
>>         of my code to solve multiple linear systems with MUMPS at
>>         each iteration of a main loop, following the example
>>         src/mat/examples/tests/ex28.c that Hong Zhang added a few
>>         weeks ago. The difference is that I'm using just 1 matrix to
>>         solve different systems.
>>
>>         I'm trying to investigate a nasty bug arising when I try to
>>         assemble "for a second time" that MPIAIJ matrix. The issue
>>         arises only in parallel, serial works fine.
>>
>>         Creating 1 MPIAIJ matrix, preallocating it "perfectly" with
>>         the case where I have the fewest zero entries in the non-zero
>>         structure, before getting its symbolic factorization.
>>
>>         Further in the main loop, I'm solely changing its entries
>>         *retaining the non-zero structure*.
>>
>>         Here is the simplified Fortran code I'm using:
>>
>>         ! Fill (M,N) case to ensure all non-zero entries are preallocated
>>         CALL set_equations(M,N)
>>
>>         CALL alloc_matrix(L)
>>           ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
>>           ! --> Sets MAT_IGNORE_ZERO_ENTRIES,
>>         MAT_NEW_NONZERO_ALLOCATION_ERR, MAT_NO_OFF_PROC_ENTRIES to true
>>
>>         CALL assemble_matrix(L)
>>           ! --> Calls MatSetValues with ADD_VALUES
>>           ! --> Call MatAssemblyBegin/MatAssemblyEnd
>>
>>         ! Tell PETSc that new non-zero insertions in matrix are forbidden
>>         CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
>>
>>         CALL set_mumps_parameters()
>>
>>         ! Get symbolic LU factorization using MUMPS
>>         CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
>>         CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
>>         CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
>>
>>         CALL initialize_right_hand_sides()
>>
>>         ! Zero matrix entries
>>         CALL MatZeroEntries(L,ierr)
>>
>>         ! Main loop
>>         DO itr=1, maxitr
>>
>>           DO m = 1, M
>>             DO n = 1, N
>>
>>             CALL set_equations(m,n)
>>             CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1, CRASH
>>         IN MatSetValues call
>>
>>             ! Solving the linear system associated with (m,n)
>>             CALL MatLUFactorNumeric(F,L,info,ierr)
>>             CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
>>
>>             ! Process v_rhs's from v_sol's for next iteration
>>
>>             CALL MatZeroEntries(L,ierr)
>>
>>             END DO
>>           END DO
>>
>>         END DO
>>
>>
>>         Testing on a small case, the error I get is
>>
>>         [1]PETSC ERROR: --------------------- Error Message
>>         --------------------------------------------------------------
>>         [1]PETSC ERROR: Argument out of range
>>         [1]PETSC ERROR: Inserting a new nonzero at global row/column
>>         (200, 160) into matrix
>>         [1]PETSC ERROR: See
>>         https://www.mcs.anl.gov/petsc/documentation/faq.html
>>         <https://www.mcs.anl.gov/petsc/documentation/faq.html> for
>>         trouble shooting.
>>         [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
>>         [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug
>>         --with-scalar-type=complex --with-precision=double
>>         --with-debugging=1 --with-valgrind=1 --with-debugger=gdb
>>         --with-fortran-kernels=1 --download-mpich
>>         --download-fblaslapack --download-scalapack --download-metis
>>         --download-parmetis --download-ptscotch --download-mumps
>>         --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g"
>>         --FOPTFLAGS="-O0 -g -fbacktrace"
>>         [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in
>>         /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>         [1]PETSC ERROR: #2 MatSetValues() line 1375 in
>>         /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
>>         [1]PETSC ERROR: #3 User provided function() line 0 in User file
>>         application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
>>
>>
>>         which I don't understand. That element was not in the
>>         non-zero structure and wasn't preallocated. I printed the
>>         value to be inserted at this location (200,160) and it is
>>         exactly (0.0000000000000000,0.0000000000000000) so this entry
>>         should not be inserted due to MAT_IGNORE_ZERO_ENTRIES,
>>         however it seems it is. I'm using ADD_VALUES in MatSetValues
>>         but it is the only call where (200,160) is inserted.
>>
>>
>>             - I zero the matrix entries with MatZeroEntries which
>>         retains the non-zero structure (checked when I print the
>>         matrix) but tried to comment the corresponding calls.
>>
>>             - I tried to set MAT_NEW_NONZERO_LOCATION_ERR AND
>>         MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE without effect.
>>
>>
>>         Perhaps there's something fundamentally wrong in my approach,
>>         in any case would you have any suggestions to identify the
>>         exact problem?
>>
>>         Using PETSc 3.12.0. Thanks for your support,
>>
>>
>>         Thibaut
>>
>
>
> -- 
> What most experimenters take for granted before they begin their 
> experiments is infinitely more interesting than any results to which 
> their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 
> <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191022/4d1284cd/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.F90
Type: text/x-fortran
Size: 2692 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191022/4d1284cd/attachment.bin>


More information about the petsc-users mailing list