[petsc-users] 'Inserting a new nonzero' issue on a reassembled matrix in parallel

Matthew Knepley knepley at gmail.com
Tue Oct 22 11:48:44 CDT 2019


On Tue, Oct 22, 2019 at 12:43 PM Thibaut Appel via petsc-users <petsc-users at mcs.anl.gov> wrote:

> Hi Hong,
>
> Thank you for having a look. I copied/pasted your code snippet into ex28.c,
> and the error indeed appears if you change that col[0]: the option
> MAT_NEW_NONZERO_LOCATION_ERR forbids inserting at a new non-zero location
> in the matrix.
>
> I spent the day debugging the code and already checked my calls to
> MatSetValues:
>
> For all MatSetValues calls corresponding to the row/col location in the
> error messages in the subsequent assembly, the numerical value associated
> with that row/col was exactly (0.0,0.0) (complex arithmetic), so it should
> not have been inserted, given the option MAT_IGNORE_ZERO_ENTRIES. It seems
> MatSetValues still inserted it anyway.
>
Okay, let's solve this problem first. You say that:

  - You called MatSetOption(A, MAT_IGNORE_ZERO_ENTRIES, PETSC_TRUE)
  - You called MatSetValues(A, ..., val, ADD_VALUES) and val had a complex 0 in it
  - PETSc tried to insert the complex 0

This should be really easy to test in a tiny example. Do you mind making
it? If it's broken, I will fix it.
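
Something along the lines of this untested sketch is what I have in mind
(the sizes, indices, and preallocation are arbitrary; only the
option/insertion sequence matters):

#include <petscmat.h>

int main(int argc,char **argv)
{
  Mat            A;
  PetscScalar    zero = 0.0;      /* exact complex zero */
  PetscInt       row = 0,col = 2; /* off-diagonal, outside the preallocated pattern */
  PetscMPIInt    rank;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  /* 4x4 AIJ matrix with only the diagonal preallocated */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4,4,1,NULL,0,NULL,&A);CHKERRQ(ierr);
  ierr = MatSetOption(A,MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE);CHKERRQ(ierr);
  /* Add an exact zero off the pattern: with MAT_IGNORE_ZERO_ENTRIES set,
     no nonzero should appear at (0,2) in the assembled matrix */
  if (!rank) {
    ierr = MatSetValues(A,1,&row,1,&col,&zero,ADD_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatView(A,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}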

  Thanks,

    Matt

> I was able to solve the problem by adding
> MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE) after my first
> assembly. However, I don't know why it fixed it, as the manual seems to
> say this option is just for efficiency purposes:
>
> It "should be specified after the first matrix has been fully assembled.
> This option ensures that certain data structures and communication
> information will be reused (instead of regenerated during successive
> steps, thereby increasing efficiency)."
>
> So I'm still puzzled as to why I got that error in the first place. Unless
> "regenerated" implies resetting some attributes of the preallocated
> non-zero structure / first assembly?
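
As an aside, if I read it right, your workaround freezes the nonzero
pattern after the first assembly, so later insertions at new locations are
dropped rather than allocated. A minimal C sketch of that call order
(untested; L is your matrix):

  /* The first assembly establishes the nonzero pattern */
  ierr = MatAssemblyBegin(L,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(L,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  /* With MAT_NEW_NONZERO_LOCATIONS set to PETSC_FALSE, a subsequent add or
     insert that would create a new nonzero location is silently ignored,
     which would also hide the zero insertions you describe */
  ierr = MatSetOption(L,MAT_NEW_NONZERO_LOCATIONS,PETSC_FALSE);CHKERRQ(ierr);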
>
>
> Thibaut
>
>
> On 22/10/2019 17:06, Zhang, Hong wrote:
>
> Thibaut:
> Check your calls to MatSetValues(); likely one of them sets a value at "a
> new nonzero at global row/column (200, 160)" in matrix L.
> I tested petsc/src/mat/examples/tests/ex28.c by adding
> @@ -95,6 +95,26 @@ int main(int argc,char **args)
>    /* Compute numeric factors using same F, then solve */
>    for (k = 0; k < num_numfac; k++) {
>      /* Get numeric factor of A[k] */
> +    if (k==0) {
> +      ierr = MatZeroEntries(A[0]);CHKERRQ(ierr);
> +      for (i=rstart; i<rend; i++) {
> +        col[0] = i-1; col[1] = i; col[2] = i+1;
> +        if (i == 0) {
> +          ierr = MatSetValues(A[k],1,&i,2,col+1,value+1,INSERT_VALUES);CHKERRQ(ierr);
> +        } else if (i == N-1) {
> +          ierr = MatSetValues(A[k],1,&i,2,col,value,INSERT_VALUES);CHKERRQ(ierr);
> +        } else {
> +          ierr = MatSetValues(A[k],1,&i,3,col,value,INSERT_VALUES);CHKERRQ(ierr);
> +        }
> +      }
> +      if (!rank) {
> +        i = N - 1; col[0] = N - 1;
> +        ierr = MatSetValues(A[k],1,&i,1,col,value,INSERT_VALUES);CHKERRQ(ierr);
> +      }
> +      ierr = MatAssemblyBegin(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> +      ierr = MatAssemblyEnd(A[k],MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
> +    }
> +
>
> It works in both sequential and parallel. If I set col[0] = 0, then I get
> the same error as yours.
> Hong
>
>> Dear PETSc developers,
>>
>> I'm extending a validated matrix preallocation/assembly part of my code
>> to solve multiple linear systems with MUMPS at each iteration of a main
>> loop, following the example src/mat/examples/tests/ex28.c that Hong Zhang
>> added a few weeks ago. The difference is that I'm using just 1 matrix to
>> solve different systems.
>>
>> I'm trying to investigate a nasty bug that arises when I try to assemble
>> that MPIAIJ matrix "for a second time". The issue appears only in
>> parallel; the serial case works fine.
>>
>> I create one MPIAIJ matrix and preallocate it "perfectly", using the case
>> where I have the fewest zero entries in the non-zero structure, before
>> getting its symbolic factorization.
>>
>> Further on, in the main loop, I solely change its entries, *retaining the
>> non-zero structure*.
>>
>> Here is the simplified Fortran code I'm using:
>>
>> ! Fill (M,N) case to ensure all non-zero entries are preallocated
>> CALL set_equations(M,N)
>>
>> CALL alloc_matrix(L)
>>   ! --> Calls MatSeqAIJSetPreallocation/MatMPIAIJSetPreallocation
>>   ! --> Sets MAT_IGNORE_ZERO_ENTRIES, MAT_NEW_NONZERO_ALLOCATION_ERR, MAT_NO_OFF_PROC_ENTRIES to true
>>
>> CALL assemble_matrix(L)
>>   ! --> Calls MatSetValues with ADD_VALUES
>>   ! --> Calls MatAssemblyBegin/MatAssemblyEnd
>>
>> ! Tell PETSc that new non-zero insertions in matrix are forbidden
>> CALL MatSetOption(L,MAT_NEW_NONZERO_LOCATION_ERR,PETSC_TRUE,ierr)
>>
>> CALL set_mumps_parameters()
>>
>> ! Get symbolic LU factorization using MUMPS
>> CALL MatGetFactor(L,MATSOLVERMUMPS,MAT_FACTOR_LU,F,ierr)
>> CALL MatGetOrdering(L,MATORDERINGNATURAL,rperm,cperm,ierr)
>> CALL MatLUFactorSymbolic(F,L,rperm,cperm,info,ierr)
>>
>> CALL initialize_right_hand_sides()
>>
>> ! Zero matrix entries
>> CALL MatZeroEntries(L,ierr)
>>
>> ! Main loop
>> DO itr=1, maxitr
>>
>>   DO m = 1, M
>>     DO n = 1, N
>>
>>     CALL set_equations(m,n)
>>     CALL assemble_matrix(L) ! ERROR HERE when m=1, n=1, CRASH IN MatSetValues call
>>
>>     ! Solving the linear system associated with (m,n)
>>     CALL MatLUFactorNumeric(F,L,info,ierr)
>>     CALL MatSolve(F,v_rhs(m,n),v_sol(m,n),ierr)
>>
>>     ! Process v_rhs's from v_sol's for next iteration
>>
>>     CALL MatZeroEntries(L,ierr)
>>
>>     END DO
>>   END DO
>>
>> END DO
>>
>>
>> Testing on a small case, the error I get is:
>> [1]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [1]PETSC ERROR: Argument out of range
>> [1]PETSC ERROR: Inserting a new nonzero at global row/column (200, 160)
>> into matrix
>> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.12.0, unknown
>> [1]PETSC ERROR: Configure options --PETSC_ARCH=cplx_gcc_debug
>> --with-scalar-type=complex --with-precision=double --with-debugging=1
>> --with-valgrind=1 --with-debugger=gdb --with-fortran-kernels=1
>> --download-mpich --download-fblaslapack --download-scalapack
>> --download-metis --download-parmetis --download-ptscotch --download-mumps
>> --download-slepc --COPTFLAGS="-O0 -g" --CXXOPTFLAGS="-O0 -g"
>> --FOPTFLAGS="-O0 -g -fbacktrace"
>> [1]PETSC ERROR: #1 MatSetValues_MPIAIJ() line 634 in
>> /home/thibaut/Packages/petsc/src/mat/impls/aij/mpi/mpiaij.c
>> [1]PETSC ERROR: #2 MatSetValues() line 1375 in
>> /home/thibaut/Packages/petsc/src/mat/interface/matrix.c
>> [1]PETSC ERROR: #3 User provided function() line 0 in User file
>> application called MPI_Abort(MPI_COMM_SELF, 63) - process 0
>>
>>
>>
>> which I don't understand: that element was not in the non-zero structure
>> and wasn't preallocated. I printed the value to be inserted at this
>> location (200,160) and it is exactly
>> (0.0000000000000000,0.0000000000000000), so this entry should not be
>> inserted, due to MAT_IGNORE_ZERO_ENTRIES; however, it seems it is. I'm
>> using ADD_VALUES in MatSetValues, and that is the only call where
>> (200,160) is inserted.
>>
>>
>>     - I zero the matrix entries with MatZeroEntries, which retains the
>> non-zero structure (checked when I print the matrix), but I also tried
>> commenting out the corresponding calls.
>>
>>     - I tried setting MAT_NEW_NONZERO_LOCATION_ERR and
>> MAT_NEW_NONZERO_ALLOCATION_ERR to PETSC_FALSE, without effect.
>>
>>
>> Perhaps there's something fundamentally wrong in my approach; in any
>> case, would you have any suggestions to identify the exact problem?
>>
>> Using PETSc 3.12.0. Thanks for your support,
>>
>>
>> Thibaut
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/