[petsc-users] 'Inserting a new nonzero' issue on a reassembled matrix in parallel
Thibaut Appel
t.appel17 at imperial.ac.uk
Fri Oct 25 08:56:14 CDT 2019
Hi Barry,
I was referring to:
row 0: (0, 1.)
row 1: (1, 1.)
row 2: (2, 1.)
row 3: (3, 1.)
row 4: (4, 1.)
row 5: (5, 1.)
row 6: (6, 1.)
row 7: (7, 1.)
row 8: (8, 1.)
row 9: (9, 1.)
row: 0 col: 9 val: 0.000000000000000000E+00 0.000000000000000000E+00
The last line. But I was probably mistaken - if the entry had actually been
inserted, it would have appeared as

row 0: (0, 1.), (9, 0.)

on the first line instead?
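
A quick way to check whether column 9 really entered the stored non-zero
structure of row 0 would be something like this minimal C sketch (A stands
for the assembled matrix from the example):

   PetscInt           ncols, j;
   const PetscInt    *cols;
   const PetscScalar *vals;

   /* Inspect the stored column indices of row 0 after the second assembly */
   ierr = MatGetRow(A, 0, &ncols, &cols, &vals);CHKERRQ(ierr);
   for (j = 0; j < ncols; j++) {
     if (cols[j] == 9) {ierr = PetscPrintf(PETSC_COMM_SELF, "row 0 stores column 9\n");CHKERRQ(ierr);}
   }
   ierr = MatRestoreRow(A, 0, &ncols, &cols, &vals);CHKERRQ(ierr);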
Thibaut
On 25/10/2019 14:41, Smith, Barry F. wrote:
>
>
> > On Oct 24, 2019, at 5:09 AM, Thibaut Appel
> <t.appel17 at imperial.ac.uk> wrote:
> >
> > Hi Matthew,
> >
> > Thanks for having a look; your example runs just like mine in Fortran.
> >
> > In serial, the value (0.0,0.0) was inserted whereas it shouldn't have been.
>
> I'm sorry, I don't see this for the serial case:
>
> $ petscmpiexec -n 1 ./ex241f
> Mat Object: 1 MPI processes
> type: seqaij
> row 0: (0, 2.)
> row 1: (1, 2.)
> row 2: (2, 2.)
> row 3: (3, 2.)
> row 4: (4, 2.)
> row 5: (5, 2.)
> row 6: (6, 2.)
> row 7: (7, 2.)
> row 8: (8, 2.)
> row 9: (9, 2.)
>
> >
> Where is the "(0.0,0.0) was inserted whereas it shouldn't have been"?
>
>
> Barry
>
>
> > In parallel, you'll see that an error "Inserting a new nonzero at
> global row/column" is triggered.
> >
> > In both cases, MatSetValues tries to insert a zero value even though
> MAT_IGNORE_ZERO_ENTRIES was set. That's what Barry is looking into, if
> I'm not mistaken.
> >
> >
> >
> > Thibaut
> >
> >
> >
> > On 24/10/2019 02:31, Matthew Knepley wrote:
> >> On Tue, Oct 22, 2019 at 1:37 PM Thibaut Appel
> <t.appel17 at imperial.ac.uk> wrote:
> >> Hi both,
> >>
> >> Please find attached a tiny example (in Fortran, sorry Matthew)
> that - I think - reproduces the problem we mentioned.
> >>
> >> Let me know.
> >>
> >> Okay, I converted it to C so I could understand it, and it runs fine for me:
> >>
> >> master *:~/Downloads/tmp$ PETSC_ARCH=arch-master-complex-debug make
> main
> >> /PETSc3/petsc/bin/mpicc -o main.o -c -Wall -Wwrite-strings
> -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
> -Qunused-arguments -fvisibility=hidden -g3
> -I/PETSc3/petsc/petsc-dev/include
> -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include
> -I/PETSc3/petsc/include -I/opt/X11/include `pwd`/main.c
> >> /PETSc3/petsc/bin/mpicc -Wl,-multiply_defined,suppress
> -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs
> -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall -Wwrite-strings
> -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector
> -Qunused-arguments -fvisibility=hidden -g3 -o main main.o
> -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib
> -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib
> -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
> -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
> -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3
> -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5
> -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++
> -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran
> -lgcc_s.10.5 -lstdc++ -ldl
> >> master *:~/Downloads/tmp$ ./main
> >> After first assembly:
> >> Mat Object: 1 MPI processes
> >> type: seqaij
> >> row 0: (0, 1. + 1. i)
> >> row 1: (1, 1. + 1. i)
> >> row 2: (2, 1. + 1. i)
> >> row 3: (3, 1. + 1. i)
> >> row 4: (4, 1. + 1. i)
> >> row 5: (5, 1. + 1. i)
> >> row 6: (6, 1. + 1. i)
> >> row 7: (7, 1. + 1. i)
> >> row 8: (8, 1. + 1. i)
> >> row 9: (9, 1. + 1. i)
> >> After second assembly:
> >> Mat Object: 1 MPI processes
> >> type: seqaij
> >> row 0: (0, 1. + 1. i)
> >> row 1: (1, 1. + 1. i)
> >> row 2: (2, 1. + 1. i)
> >> row 3: (3, 1. + 1. i)
> >> row 4: (4, 1. + 1. i)
> >> row 5: (5, 1. + 1. i)
> >> row 6: (6, 1. + 1. i)
> >> row 7: (7, 1. + 1. i)
> >> row 8: (8, 1. + 1. i)
> >> row 9: (9, 1. + 1. i)
> >> row 0 col: 9 val: 0. + 0. i
> >>
> >> I attach my code. So then I ran your Fortran:
> >>
> >> /PETSc3/petsc/bin/mpif90 -c -Wall -ffree-line-length-0
> -Wno-unused-dummy-argument -Wno-unused-variable -g
> -I/PETSc3/petsc/petsc-dev/include
> -I/PETSc3/petsc/petsc-dev/arch-master-complex-debug/include
> -I/PETSc3/petsc/include -I/opt/X11/include -o main2.o main2.F90
> >> /PETSc3/petsc/bin/mpif90 -Wl,-multiply_defined,suppress
> -Wl,-multiply_defined -Wl,suppress -Wl,-commons,use_dylibs
> -Wl,-search_paths_first -Wl,-no_compact_unwind -Wall
> -ffree-line-length-0 -Wno-unused-dummy-argument -Wno-unused-variable
> -g -o main2 main2.o
> -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -Wl,-rpath,/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -L/PETSc3/petsc/petsc-dev/arch-master-complex-debug/lib
> -Wl,-rpath,/PETSc3/petsc/lib -L/PETSc3/petsc/lib
> -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib
> -Wl,-rpath,/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
> -L/usr/local/lib/gcc/x86_64-apple-darwin10.4.0/4.6.0
> -Wl,-rpath,/usr/local/lib -L/usr/local/lib -lpetsc -lfftw3_mpi -lfftw3
> -llapack -lblas -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5
> -lchaco -lparmetis -lmetis -ltriangle -lz -lX11 -lctetgen -lstdc++
> -ldl -lmpichf90 -lpmpich -lmpich -lopa -lmpl -lpthread -lgfortran
> -lgcc_s.10.5 -lstdc++ -ldl
> >> master *:~/Downloads/tmp$ ./main2
> >> After first assembly:
> >> Mat Object: 1 MPI processes
> >> type: seqaij
> >> row 0: (0, 1.)
> >> row 1: (1, 1.)
> >> row 2: (2, 1.)
> >> row 3: (3, 1.)
> >> row 4: (4, 1.)
> >> row 5: (5, 1.)
> >> row 6: (6, 1.)
> >> row 7: (7, 1.)
> >> row 8: (8, 1.)
> >> row 9: (9, 1.)
> >> After second assembly:
> >> Mat Object: 1 MPI processes
> >> type: seqaij
> >> row 0: (0, 1.)
> >> row 1: (1, 1.)
> >> row 2: (2, 1.)
> >> row 3: (3, 1.)
> >> row 4: (4, 1.)
> >> row 5: (5, 1.)
> >> row 6: (6, 1.)
> >> row 7: (7, 1.)
> >> row 8: (8, 1.)
> >> row 9: (9, 1.)
> >> row: 0 col: 9 val: 0.000000000000000000E+00 0.000000000000000000E+00
> >>
> >> I am not seeing an error. Am I not running it correctly?
> >>
> >> Thanks,
> >>
> >> Matt
> >> Thibaut
> >>
> >>
> >>
> >> On 22/10/2019 17:48, Matthew Knepley wrote:
> >>> On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users
> <petsc-users at mcs.anl.gov> wrote:
> >>> Hi Hong,
> >>>
> >>> Thank you for having a look. I copied/pasted your code snippet
> into ex28.c and the error indeed appears if you change that col[0].
> That's because new non-zero locations are forbidden once the option
> MAT_NEW_NONZERO_LOCATION_ERR is set.
> >>>
> >>> I spent the day debugging the code and already checked my calls to
> MatSetValues:
> >>>
> >>> For all MatSetValues calls corresponding to the row/col location
> reported in the error messages of the subsequent assembly, the numerical
> value associated with that row/col was exactly (0.0,0.0) (complex
> arithmetic), so it should not have been inserted, given the option
> MAT_IGNORE_ZERO_ENTRIES. It seems MatSetValues inserted it anyway.
> >>>
> >>> Okay, let's solve this problem first. You say that
> >>>
> >>> - You called MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
> >>> - You called MatSetValues(A, ,,,, ADD_VALUES, ..., val) and val
> had a complex 0 in it
> >>> - PETSc tried to insert the complex 0
> >>>
> >>> This should be really easy to test in a tiny example. Do you mind
> making it? If it's broken, I will fix it.
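> >>>
> >>> A minimal sketch of such a test (just my reading of the three points
> >>> above, not the attached main.c; the preallocation numbers and names are
> >>> placeholders) might look like this:
> >>>
> >>>   #include <petscmat.h>
> >>>
> >>>   int main(int argc, char **argv)
> >>>   {
> >>>     Mat            A;
> >>>     PetscInt       i, rstart, rend, n = 10, col;
> >>>     PetscScalar    one = 1.0, zero = 0.0;
> >>>     PetscErrorCode ierr;
> >>>
> >>>     ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
> >>>     /* Diagonal-only preallocation: one nonzero per row */
> >>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A);CHKERRQ(ierr);
> >>>     ierr = MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
> >>>     ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
> >>>
> >>>     /* First assembly: diagonal entries only, matching the preallocation */
> >>>     for (i = rstart; i < rend; i++) {ierr = MatSetValues(A, 1, &i, 1, &i, &one, ADD_VALUES);CHKERRQ(ierr);}
> >>>     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>
> >>>     /* Second assembly: add an exact zero at a location that was never
> >>>        preallocated; with MAT_IGNORE_ZERO_ENTRIES it should be dropped,
> >>>        otherwise it should trigger the "new nonzero" error */
> >>>     if (rstart == 0) {
> >>>       i = 0; col = n - 1;
> >>>       ierr = MatSetValues(A, 1, &i, 1, &col, &zero, ADD_VALUES);CHKERRQ(ierr);
> >>>     }
> >>>     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>
> >>>     ierr = MatView(A, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
> >>>     ierr = MatDestroy(&A);CHKERRQ(ierr);
> >>>     ierr = PetscFinalize();
> >>>     return ierr;
> >>>   }
> >>>
> >>> Comparing the MatView output for 1 and 2 ranks should show whether the
> >>> zero at (0, n-1) was actually stored.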
> >>>
> >>> Thanks,
> >>>
> >>> Matt
> >>> I was able to solve the problem by adding
> MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first
> assembly. However, I don't know why it fixed things, as the manual seems
> to say this option is just for efficiency purposes.
> >>>
> >>> It "should be specified after the first matrix has been fully
> assembled. This option ensures that certain data structures and
> communication information will be reused (instead of regenerated
> during successive steps), thereby increasing efficiency."
> >>>
> >>> So I'm still puzzled as to why I got that error in the first place.
> Unless "regenerated" implies resetting some attributes of the
> preallocated non-zero structure from the first assembly?
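> >>>
> >>> For my own bookkeeping, here is a minimal sketch (C syntax for brevity;
> >>> this is my reading of the man pages, so please correct me if I'm wrong)
> >>> of where these options now sit, after the first, structure-defining
> >>> assembly of L:
> >>>
> >>>   /* Drop exact numerical zeros passed to MatSetValues */
> >>>   ierr = MatSetOption(L, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE);CHKERRQ(ierr);
> >>>
> >>>   /* Error out if an insertion would create a new non-zero location */
> >>>   ierr = MatSetOption(L, MAT_NEW_NONZERO_LOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);
> >>>
> >>>   /* With PETSC_FALSE, silently ignore insertions at new non-zero
> >>>      locations instead of allocating them; this is the call that made
> >>>      the error go away in my case */
> >>>   ierr = MatSetOption(L, MAT_NEW_NONZERO_LOCATIONS, PETSC_FALSE);CHKERRQ(ierr);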
> >>>
> >>>
> >>>
> >>> Thibaut
> >>>
> >>>
> >>>
> >>> On 22/10/2019 17:06, Zhang, Hong wrote:
> >>>> Thibaut:
> >>>> Check your calls to MatSetValues(); likely you set a value "to a
> new nonzero at global row/column (200, 160)" in the matrix L.
> >>>> I tested petsc/src/mat/examples/tests/ex28.c by adding
> >>>> @@ -95,6 +95,26 @@ int main(int argc,char **args)
> >>>>    /* Compute numeric factors using same F, then solve */
> >>>>    for (k = 0; k < num_numfac; k++) {
> >>>>      /* Get numeric factor of A[k] */
> >>>> +    if (k==0) {
> >>>> +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
> >>>> +      for (i=rstart; i<rend; i++) {
> >>>> +        col[0] = i-1; col[1] = i; col[2] = i+1;
> >>>> +        if (i == 0) {
> >>>> +          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
> >>>> +        } else if (i == N-1) {
> >>>> +          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
> >>>> +        } else {
> >>>> +          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
> >>>> +        }
> >>>> +      }
> >>>> +      if (!rank) {
> >>>> +        i = N - 1; col[0] = N - 1;
> >>>> +        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
> >>>> +      }
> >>>> +      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>> +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> >>>> +    }
> >>>> +
> >>>>
> >>>> It works in both sequential and parallel runs. If I set col[0] = 0,
> then I get the same error as yours.
> >>>> Hong
> >>>> Dear PETSc developers,
> >>>>
> >>>> I'm extending a validated matrix preallocation/assembly part of
> my code to solve multiple linear systems with MUMPS at each iteration
> of a main loop, following the example src/mat/examples/tests/ex28.c
> that Hong Zhang added a few weeks ago. The difference is that I'm
> using just 1 matrix to solve different systems.
> >>>>
> >>>> I'm trying to investigate a nasty bug that arises when I try to
> assemble that MPIAIJ matrix "for a second time". The issue arises only
> in parallel; serial works fine.
> >>>>
> >>>> I create one MPIAIJ matrix and preallocate it "perfectly" using the
> case that has the fewest zero entries in the non-zero structure,
> before getting its symbolic factorization.
> >>>>
> >>>> Further on, in the main loop, I only change its entries,
> retaining the non-zero structure.
> >>>>
> >>>> Here is the simplified Fortran code I'm using:
> >>>>
> >>>> ! Fill (M,N) case to ensure all non-zero entries are preallocated
> >>>> CALL set_equations(M,N)
> >>>>
> >>>> CALL alloc_matrix(L)
> >>>> ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
> >>>> ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR,
> >>>> !     MAT_NO_OFF_PROC_ENTRIES to true
> >>>>
> >>>> CALL assemble_matrix(L)
> >>>> ! --> Calls MatSetValues with ADD_VALUES
> >>>> ! --> Call MatAssemblyBegin/MatAssemblyEnd
> >>>>
> >>>> ! Tell PETSc that new non-zero insertions in matrix are forbidden
> >>>> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
> >>>>
> >>>> CALL set_mumps_parameters()
> >>>>
> >>>> ! Get symbolic LU factorization using MUMPS
> >>>> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
> >>>> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
> >>>> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
> >>>>
> >>>> CALL initialize_right_hand_sides()
> >>>>
> >>>> ! Zero matrix entries
> >>>> CALL MatZeroEntries(L,ierr)
> >>>>
> >>>> ! Main loop
> >>>> DO itr = 1, maxitr
> >>>>
> >>>>   DO m = 1, M
> >>>>     DO n = 1, N
> >>>>
> >>>>       CALL set_equations(m,n)
> >>>>       CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1: crash in MatSetValues call
> >>>>
> >>>>       ! Solve the linear system associated with (m,n)
> >>>>       CALL MatLUFactorNumeric(F,L,info,ierr)
> >>>>       CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
> >>>>
> >>>>       ! Process v_rhs's from v_sol's for the next iteration
> >>>>
> >>>>       CALL MatZeroEntries(L,ierr)
> >>>>
> >>>>     END DO
> >>>>   END DO
> >>>>
> >>>> END DO
> >>>>
> >>>>
> >>>>
> >>>> Testing on a small case, the error I get is
> >>>>
> >>>> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> >>>> [1]PETSC ERROR: Argument out of range
> >>>> [1]PETSC ERROR: Inserting a new nonzero at global row/column
> (200, 160) into matrix
> >>>> [1]PETSC ERROR: See
> https://www.mcs.anl.gov/petsc/documentation/faq.html
> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble
> shooting.
> >>>> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
> >>>> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug
> --with-scalar-type=complex --with-precision=double --with-debugging=1
> --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1
> --download-mpich --download-fblaslapack --download-scalapack
> --download-metis --download-parmetis --download-ptscotch
> --download-mumps --download-slepc --COPTFLAGS="-O0 -g"
> --CXXOPTFLAGS="-O0 -g" --FOPTFLAGS="-O0 -g -fbacktrace"
> >>>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in
> /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
> >>>> [1]PETSC ERROR: #2 MatSetValues() line 1375 in
> /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
> >>>> [1]PETSC ERROR: #3 User provided function() line 0 in User file
> >>>> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
> >>>>
> >>>>
> >>>> which I don't understand. That element was not in the non-zero
> structure and wasn't preallocated. I printed the value to be inserted
> at this location (200,160) and it is exactly
> (0.0000000000000000,0.0000000000000000), so this entry should not be
> inserted, due to MAT_IGNORE_ZERO_ENTRIES; however, it seems it is. I'm
> using ADD_VALUES in MatSetValues, but this is the only call where
> (200,160) is touched.
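> >>>>
> >>>> As a defensive workaround (not a fix), I could filter exact zeros by
> >>>> hand before calling MatSetValues, so the result no longer depends on
> >>>> MAT_IGNORE_ZERO_ENTRIES being honoured. A sketch in C syntax, where
> >>>> row, ncols, cols and vals stand in for my actual assembly data:
> >>>>
> >>>>   PetscInt    j, nkeep = 0;
> >>>>   PetscInt    keepcols[64];   /* 64: hypothetical bound on entries per row */
> >>>>   PetscScalar keepvals[64];
> >>>>
> >>>>   /* Keep only the strictly non-zero entries of this row */
> >>>>   for (j = 0; j < ncols; j++) {
> >>>>     if (vals[j] != 0.0) {keepcols[nkeep] = cols[j]; keepvals[nkeep] = vals[j]; nkeep++;}
> >>>>   }
> >>>>   if (nkeep) {ierr = MatSetValues(L, 1, &row, nkeep, keepcols, keepvals, ADD_VALUES);CHKERRQ(ierr);}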
> >>>>
> >>>>
> >>>>
> >>>> - I zero the matrix entries with MatZeroEntries, which retains
> the non-zero structure (checked by printing the matrix), but I also
> tried commenting out the corresponding calls.
> >>>>
> >>>> - I tried setting MAT_NEW_NONZERO_LOCATION_ERR and
> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, without effect.
> >>>>
> >>>>
> >>>>
> >>>> Perhaps there's something fundamentally wrong in my approach; in
> any case, would you have any suggestions to identify the exact problem?
> >>>>
> >>>> Using PETSc 3.12.0. Thanks for your support,
> >>>>
> >>>>
> >>>>
> >>>> Thibaut
> >>>>
> >>>
> >>>
> >>> --
> >>> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> >>> -- Norbert Wiener
> >>>
> >>> https://www.cse.buffalo.edu/~knepley/
> >>
> >>
> >> --
> >> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> >> -- Norbert Wiener
> >>
> >> https://www.cse.buffalo.edu/~knepley/
>