[petsc-users] 'Inserting a new nonzero' issue on a reassembled matrix in parallel
Thibaut Appel
t.appel17 at imperial.ac.uk
Tue Oct 22 11:40:47 CDT 2019
Hi Hong,
Thank you for having a look, I copied/pasted your code snippet into
ex28.c and the error indeed appears if you change that col[0]. That's
because you did not allow a new non-zero location in the matrix with the
option MAT_NEW_NONZERO_LOCATION_ERR.
I spent the day debugging the code and already checked my calls to
MatSetValues:
For all MatSetValues calls corresponding to the row/col location in the
error messages in the subsequent assembly, the numerical value
associated with that row/col was exactly (0.0,0.0) (complex arithmetic)
so it shouldn't be inserted w.r.t. the option MAT_IGNORE_ZERO_ENTRIES.
It seems MatSetValues still did it anyway.
I was able to solve the problem by adding
MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first
assembly. However I don't know why it fixed it as the manual seems to
say this is just for efficiency purposes
It "should be specified after the first matrix has been fully assembled.
This option ensures that certain data structures and communication
information will be reused (instead of regenerated during successive
steps, thereby increasing efficiency"
So I'm still puzzled by why I got that error in the first place. Unless
"regenerated" implies resetting some attributes of the preallocated
non-zero structure / first assembly?
Thibaut
On 22/10/2019 17:06, Zhang, Hong wrote:
> Thibaut:
> Check your code on MatSetValues(), likely you set a value "to a new
> nonzero at global row/column (200, 160) into matrix" L.
> I tested petsc/src/mat/examples/tests/ex28.c by adding
> @@ -95,6 +95,26 @@ int main(int argc,char **args)
> /* Compute numeric factors using same F, then solve */
> for (k = 0; k < num_numfac; k++) {
> /* Get numeric factor of A[k] */
> + if (k==0) {
> + ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
> + for (i=rstart; i<rend; i++) {
> + col[0] = i-1; col[1] = i; col[2] = i+1;
> + if (i == 0) {
> + ierr =
> MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
> + } else if (i == N-1) {
> + ierr =
> MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
> + } else {
> + ierr =
> MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
> + }
> + }
> + if (!rank) {
> + i = N - 1; col[0] = N - 1;
> + ierr =
> MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
> + }
> + ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> + ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> + }
> +
>
> It works in both sequential and parallel. If I set col[0] = 0, then I
> get the same error as yours.
> Hong
>
> Dear PETSc developers,
>
> I'm extending a validated matrix preallocation/assembly part of my
> code to solve multiple linear systems with MUMPS at each iteration
> of a main loop, following the example
> src/mat/examples/tests/ex28.c that Hong Zhang added a few weeks
> ago. The difference is that I'm using just 1 matrix to solve
> different systems.
>
> I'm trying to investigate a nasty bug arising when I try to
> assemble "for a second time" that MPIAIJ matrix. The issue arises
> only in parallel, serial works fine.
>
> Creating 1 MPIAIJ matrix, preallocating it "perfectly" with the
> case where I have the fewest zero entries in the non-zero
> structure, before getting its symbolic factorization.
>
> Further in the main loop, I'm solely changing its entries
> *retaining the non-zero structure*.
>
> Here is the simplified Fortran code I'm using:
>
> ! Fill (M,N) case to ensure all non-zero entries are preallocated
> CALL set_equations(M,N)
>
> CALL alloc_matrix(L)
> ! --> Call MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
> ! --> Sets MAT_IGNORE_ZERO_ENTRIES,
> MAT_NEW_NONZERO_ALLOCATION_ERR, MAT_NO_OFF_PROC_ENTRIES to true
>
> CALL assemble_matrix(L)
> ! --> Calls MatSetValues with ADD_VALUES
> ! --> Call MatAssemblyBegin/MatAssemblyEnd
>
> ! Tell PETSc that new non-zero insertions in matrix are forbidden
> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
>
> CALL set_mumps_parameters()
>
> ! Get symbolic LU factorization using MUMPS
> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
>
> CALL initialize_right_hand_sides()
>
> ! Zero matrix entries
> CALL MatZeroEntries(L,ierr)
>
> ! Main loop
> DO itr=1, maxitr
>
> DO m = 1, M
> DO n = 1, N
>
> CALL set_equations(m,n)
> CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1, CRASH IN
> MatSetValues call
>
> ! Solving the linear system associated with (m,n)
> CALL MatLUFactorNumeric(F,L,info,ierr)
> CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
>
> ! Process v_rhs's from v_sol's for next iteration
>
> CALL MatZeroEntries(L,ierr)
>
> END DO
> END DO
>
> END DO
>
>
> Testing on a small case, the error I get is
>
> [1]PETSC ERROR: --------------------- Error Message
> --------------------------------------------------------------
> [1]PETSC ERROR: Argument out of range
> [1]PETSC ERROR: Inserting a new nonzero at global row/column (200,
> 160) into matrix
> [1]PETSC ERROR: See
> https://www.mcs.anl.gov/petsc/documentation/faq.html
> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble
> shooting.
> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug
> --with-scalar-type=complex --with-precision=double
> --with-debugging=1 --with-valgrind=1 --with-debugger=gdb
> --with-fortran-kernels=1 --download-mpich --download-fblaslapack
> --download-scalapack --download-metis --download-parmetis
> --download-ptscotch --download-mumps --download-slepc
> --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g" --FOPTFLAGS="-O0 -g
> -fbacktrace"
> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in
> /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
> [1]PETSC ERROR: #2 MatSetValues() line 1375 in
> /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
> [1]PETSC ERROR: #3 User provided function() line 0 in User file
> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
>
>
> which I don't understand. That element was not in the non-zero
> structure and wasn't preallocated. I printed the value to be
> inserted at this location (200,160) and it is exactly
> (0.0000000000000000,0.0000000000000000) so this entry should not
> be inserted due to MAT_IGNORE_ZERO_ENTRIES, however it seems it
> is. I'm using ADD_VALUES in MatSetValues but it is the only call
> where (200,160) is inserted.
>
>
> - I zero the matrix entries with MatZeroEntries which retains
> the non-zero structure (checked when I print the matrix) but tried
> to comment the corresponding calls.
>
> - I tried to set MAT_NEW_NONZERO_LOCATION_ERR AND
> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE without effect.
>
>
> Perhaps there's something fundamentally wrong in my approach, in
> any case would you have any suggestions to identify the exact
> problem?
>
> Using PETSc 3.12.0. Thanks for your support,
>
>
> Thibaut
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20191022/36c0778d/attachment-0001.html>
More information about the petsc-users
mailing list