[petsc-users] 'Inserting a new nonzero' issue on a reassembled matrix in parallel

Matthew Knepley knepley at gmail.com
Wed Oct 23 20:31:19 CDT 2019


On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel <t.appel17 at imperial.ac.uk>
wrote:

> Hi both,
>
> Please find attached a tiny example (in Fortran, sorry Matthew) that - I
> think - reproduces the problem we mentioned.
>
> Let me know.
>
Okay, I converted it to C so I could understand it, and it runs fine for me:

master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make main

/PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings
-Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
-Qunused-arguments -fvisibility=hidden -g3   -I/PETSc3/petsc/petsc-dev/include
-I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include
-I/PETSc3/petsc/include -I/opt/X11/include    `pwd`/main.c

/PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress
-Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs
-Wl,-search_paths_first -Wl,-no_compact_unwind  -Wall -Wwrite-strings
-Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
-Qunused-arguments -fvisibility=hidden -g3  -o main main.o
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib
-L/opt/X11/lib
-Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
-L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
-Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3
-llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco
-lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90
-lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl

master *:~/Downloads/tmp$ ./main

After first assembly:

Mat Object: 1 MPI processes

  type: seqaij

row 0: (0, 1. + 1. i)

row 1: (1, 1. + 1. i)

row 2: (2, 1. + 1. i)

row 3: (3, 1. + 1. i)

row 4: (4, 1. + 1. i)

row 5: (5, 1. + 1. i)

row 6: (6, 1. + 1. i)

row 7: (7, 1. + 1. i)

row 8: (8, 1. + 1. i)

row 9: (9, 1. + 1. i)

After second assembly:

Mat Object: 1 MPI processes

  type: seqaij

row 0: (0, 1. + 1. i)

row 1: (1, 1. + 1. i)

row 2: (2, 1. + 1. i)

row 3: (3, 1. + 1. i)

row 4: (4, 1. + 1. i)

row 5: (5, 1. + 1. i)

row 6: (6, 1. + 1. i)

row 7: (7, 1. + 1. i)

row 8: (8, 1. + 1. i)

row 9: (9, 1. + 1. i)
row 0 col: 9 val: 0. + 0. i
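
The test is essentially the following (a rough sketch of the idea, not the exact file; the variable names and the purely diagonal preallocation here are just for illustration, and a complex-scalar build is assumed): preallocate an AIJ matrix with a diagonal-only nonzero structure, set MAT_IGNORE_ZERO_ENTRIES, assemble it, then try to ADD_VALUES an exact zero at an unpreallocated location and reassemble.

static char help[] = "Test MAT_IGNORE_ZERO_ENTRIES with ADD_VALUES of an exact zero.\n";

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       i, n = 10, row = 0, col = 9, rstart, rend;
  PetscScalar    one = 1.0 + 1.0*PETSC_i, zero = 0.0;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, help);if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
  /* Preallocate a strictly diagonal nonzero structure: no room for (0,9) */
  ierr = MatSeqAIJSetPreallocation(A, 1, NULL);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 1, NULL, 0, NULL);CHKERRQ(ierr);
  /* Ask PETSc to drop exact zeros passed to MatSetValues() */
  ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);

  /* First assembly: fill the diagonal */
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValues(A, 1, &i, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "After first assembly:\n");CHKERRQ(ierr);
  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  /* Second assembly: ADD_VALUES an exact zero at an unpreallocated location */
  if (rstart == 0) {
    /* Report the value we are trying to add (an exact complex zero) */
    ierr = PetscPrintf(PETSC_COMM_SELF, "row %D col: %D val: %g + %g i\n", row, col, (double)PetscRealPart(zero), (double)PetscImaginaryPart(zero));CHKERRQ(ierr);
    ierr = MatSetValues(A, 1, &row, 1, &col, &zero, ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "After second assembly:\n");CHKERRQ(ierr);
  ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);

  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}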

I attach my code. So then I ran your Fortran code:

/PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0
-Wno-unused-dummy-argument -Wno-unused-variable -g
-I/PETSc3/petsc/petsc-dev/include
-I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include
-I/PETSc3/petsc/include -I/opt/X11/include      -o main2.o main2.F90

/PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress
-Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs
-Wl,-search_paths_first -Wl,-no_compact_unwind  -Wall -ffree-line-length-0
-Wno-unused-dummy-argument -Wno-unused-variable -g   -o main2 main2.o
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
-Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib -Wl,-rpath,/opt/X11/lib
-L/opt/X11/lib
-Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
-L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
-Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3
-llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lchaco
-lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++ -ldl -lmpichf90
-lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran -lgcc_s.10.5 -lstdc++ -ldl

master *:~/Downloads/tmp$ ./main2

 After first assembly:

Mat Object: 1 MPI processes

  type: seqaij

row 0: (0, 1.)

row 1: (1, 1.)

row 2: (2, 1.)

row 3: (3, 1.)

row 4: (4, 1.)

row 5: (5, 1.)

row 6: (6, 1.)

row 7: (7, 1.)

row 8: (8, 1.)

row 9: (9, 1.)

 After second assembly:

Mat Object: 1 MPI processes

  type: seqaij

row 0: (0, 1.)

row 1: (1, 1.)

row 2: (2, 1.)

row 3: (3, 1.)

row 4: (4, 1.)

row 5: (5, 1.)

row 6: (6, 1.)

row 7: (7, 1.)

row 8: (8, 1.)

row 9: (9, 1.)

 row:  0 col:  9 val:  0.000000000000000000E+00  0.000000000000000000E+00

I am not seeing an error. Am I not running it correctly?
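
Note that both runs above are on a single MPI process; since the report says the failure only shows up in parallel, reproducing it would presumably need a parallel launch along these lines (assuming an MPICH-style launcher):

mpiexec -n 2 ./main2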

  Thanks,

     Matt

> Thibaut
>
>
> On 22/10/2019 17:48, Matthew Knepley wrote:
>
> On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users <
> petsc-users at mcs.anl.gov> wrote:
>
>> Hi Hong,
>>
>> Thank you for having a look. I copied/pasted your code snippet into
>> ex28.c, and the error indeed appears if you change that col[0]. That's
>> because the option MAT_NEW_NONZERO_LOCATION_ERR does not allow a new
>> non-zero location in the matrix.
>>
>> I spent the day debugging the code and already checked my calls to
>> MatSetValues:
>>
>> For all MatSetValues calls corresponding to the row/col location reported
>> in the error messages of the subsequent assembly, the numerical value
>> associated with that row/col was exactly (0.0,0.0) (complex arithmetic),
>> so it should not have been inserted, given the option
>> MAT_IGNORE_ZERO_ENTRIES. It seems MatSetValues still inserted it anyway.
>>
> Okay, let's solve this problem first. You say that:
>
>   - You called MatSetOption(A,  MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
>   - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val had a
> complex 0 in it
>   - PETSc tried to insert the complex 0
>
> This should be really easy to test in a tiny example. Do you mind making
> it? If it's broken, I will fix it.
>
>   Thanks,
>
>     Matt
>
>> I was able to solve the problem by adding
>> MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first
>> assembly. However, I don't know why it fixed it, as the manual seems to
>> say this option is just for efficiency purposes:
>>
>> It "should be specified after the first matrix has been fully assembled.
>> This option ensures that certain data structures and communication
>> information will be reused (instead of regenerated during successive
>> steps), thereby increasing efficiency."
>>
>> So I'm still puzzled as to why I got that error in the first place,
>> unless "regenerated" implies resetting some attributes of the
>> preallocated non-zero structure from the first assembly?
>>
>>
>> Thibaut
>>
>>
>> On 22/10/2019 17:06, Zhang, Hong wrote:
>>
>> Thibaut:
>> Check your code's calls to MatSetValues(); likely you set a value at "a
>> new nonzero at global row/column (200, 160)" in matrix L.
>> I tested petsc/src/mat/examples/tests/ex28.c by adding:
>> @@ -95,6 +95,26 @@ int main(int argc,char **args)
>>    /* Compute numeric factors using same F, then solve */
>>    for (k = 0; k < num_numfac; k++) {
>>      /* Get numeric factor of A[k] */
>> +    if (k==0) {
>> +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
>> +      for (i=rstart; i<rend; i++) {
>> +        col[0] = i-1; col[1] = i; col[2] = i+1;
>> +        if (i == 0) {
>> +          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
>> +        } else if (i == N-1) {
>> +          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
>> +        } else {
>> +          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
>> +        }
>> +      }
>> +      if (!rank) {
>> +        i = N - 1; col[0] = N - 1;
>> +        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
>> +      }
>> +      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>> +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>> +    }
>> +
>>
>> It works in both sequential and parallel. If I set col[0] = 0, then I get
>> the same error as yours.
>> Hong
>>
>>> Dear PETSc developers,
>>>
>>> I'm extending a validated matrix preallocation/assembly part of my code
>>> to solve multiple linear systems with MUMPS at each iteration of a main
>>> loop, following the example src/mat/examples/tests/ex28.c that Hong Zhang
>>> added a few weeks ago. The difference is that I'm using just 1 matrix to
>>> solve different systems.
>>>
>>> I'm trying to investigate a nasty bug that arises when I try to assemble
>>> that MPIAIJ matrix "for a second time". The issue appears only in
>>> parallel; the serial case works fine.
>>>
>>> I create one MPIAIJ matrix and preallocate it "perfectly" using the case
>>> where I have the fewest zero entries in the non-zero structure, before
>>> getting its symbolic factorization.
>>>
>>> Further on, in the main loop, I only change its entries, *retaining the
>>> non-zero structure*.
>>>
>>> Here is the simplified Fortran code I'm using:
>>>
>>> ! Fill (M,N) case to ensure all non-zero entries are preallocated
>>> CALL set_equations(M,N)
>>>
>>> CALL alloc_matrix(L)
>>>   ! --> Calls MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
>>>   ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR,
>>>   !     MAT_NO_OFF_PROC_ENTRIES to PETSC_TRUE
>>>
>>> CALL assemble_matrix(L)
>>>   ! --> Calls MatSetValues with ADD_VALUES
>>>   ! --> Calls MatAssemblyBegin/MatAssemblyEnd
>>>
>>> ! Tell PETSc that new non-zero insertions in matrix are forbidden
>>> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
>>>
>>> CALL set_mumps_parameters()
>>>
>>> ! Get symbolic LU factorization using MUMPS
>>> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
>>> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
>>> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
>>>
>>> CALL initialize_right_hand_sides()
>>>
>>> ! Zero matrix entries
>>> CALL MatZeroEntries(L,ierr)
>>>
>>> ! Main loop
>>> DO itr=1, maxitr
>>>
>>>   DO m = 1, M
>>>     DO n = 1, N
>>>
>>>     CALL set_equations(m,n)
>>>     CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1: crash in
>>>                             ! the MatSetValues call
>>>
>>>     ! Solving the linear system associated with (m,n)
>>>     CALL MatLUFactorNumeric(F,L,info,ierr)
>>>     CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
>>>
>>>     ! Process v_rhs's from v_sol's for next iteration
>>>
>>>     CALL MatZeroEntries(L,ierr)
>>>
>>>     END DO
>>>   END DO
>>>
>>> END DO
>>>
>>>
>>> Testing on a small case, the error I get is:
>>> [1]PETSC ERROR: --------------------- Error Message
>>> --------------------------------------------------------------
>>> [1]PETSC ERROR: Argument out of range
>>> [1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160)
>>> into matrix
>>> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
>>> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug
>>> --with-scalar-type=complex --with-precision=double --with-debugging=1
>>> --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1
>>> --download-mpich --download-fblaslapack --download-scalapack
>>> --download-metis --download-parmetis --download-ptscotch --download-mumps
>>> --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g"
>>> --FOPTFLAGS="-O0 -g -fbacktrace"
>>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in
>>> /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
>>> [1]PETSC ERROR: #2 MatSetValues() line 1375 in
>>> /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
>>> [1]PETSC ERROR: #3 User provided function() line 0 in User file
>>> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
>>>
>>>
>>>
>>> I don't understand this: that element was not in the non-zero structure
>>> and wasn't preallocated. I printed the value to be inserted at this
>>> location (200,160) and it is exactly
>>> (0.0000000000000000,0.0000000000000000), so this entry should not be
>>> inserted due to MAT_IGNORE_ZERO_ENTRIES; however, it seems it is. I'm
>>> using ADD_VALUES in MatSetValues, but that is the only call where
>>> (200,160) is inserted.
>>>
>>>
>>>     - I zero the matrix entries with MatZeroEntries, which retains the
>>> non-zero structure (checked when I print the matrix), but I also tried
>>> commenting out the corresponding calls.
>>>
>>>     - I tried setting MAT_NEW_NONZERO_LOCATION_ERR and
>>> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, with no effect.
>>>
>>>
>>> Perhaps there's something fundamentally wrong in my approach; in any
>>> case, would you have any suggestions to identify the exact problem?
>>>
>>> Using PETSc 3.12.0. Thanks for your support,
>>>
>>>
>>> Thibaut
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: application/octet-stream
Size: 2786 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191023/6eaad5c0/attachment-0001.obj>

