[petsc-dev] [petsc-users] Column #j is wrong in parallel from message "Inserting a new nonzero (i, j) into matrix"
Barry Smith
bsmith at mcs.anl.gov
Wed Mar 25 18:25:04 CDT 2015
This is an issue with any use of "nested" matrices (of which MPIBAIJ is a special, two-level case). One way we could handle this (I think universally) would be to introduce a unique PETSC_ERR_MAT_ENTRY_NONZERO error code. Then, whenever an outer MatSetValues() calls an inner MatSetValues(), it checks the returned error code and, if it is this one, prints an error message with the row and column indices at its own level. So, for example,
Inside MatSetValuesBlocked_MPIBAIJ()
ierr = MatSetValuesBlocked_SeqBAIJ(baij->B,1,&row,1,&col,barray,addv);
if (ierr == PETSC_ERR_MAT_ENTRY_NONZERO) (*PetscErrorPrintf)("%s Matrix row and column block indices (%d %d)\n",__FUNC__,im[i],in[j]);
CHKERRQ(ierr);
The resulting error message would look something like
[0]PETSC ERROR: MatSetValuesBlocked_SeqBAIJ() line 564 in /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/baij/mpi/mpibaij.c Inserting a new nonzero (135, 9) into matrix
... a stack trace line or two here
[0] PETSC ERROR: MatSetValuesBlocked_MPIBAIJ() line ... Matrix row and column block indices (135,537)
In three level nesting there would be three sets of indices.
We could use a macro to handle the error (to minimize the ugliness of the code).
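To make the macro idea concrete, here is a minimal, PETSc-free sketch. Everything in it is hypothetical: ERR_MAT_ENTRY_NONZERO stands in for the proposed PETSC_ERR_MAT_ENTRY_NONZERO, CHK_NONZERO is an invented name for the macro, and inner_set()/outer_set() are toy stand-ins for the Seq and MPI MatSetValues routines. The global-to-local column mapping is reduced to a simple offset purely for illustration; it is not how PETSc actually maps off-diagonal columns.

```c
#include <stdio.h>

/* Hypothetical error code, standing in for the proposed
   PETSC_ERR_MAT_ENTRY_NONZERO (which does not exist in PETSc 3.5.3). */
#define ERR_MAT_ENTRY_NONZERO 1001

/* Hypothetical macro: if the inner call reported a new nonzero, print the
   indices known at this nesting level, then propagate any error upward. */
#define CHK_NONZERO(ierr, row, col)                                        \
  do {                                                                     \
    if ((ierr) == ERR_MAT_ENTRY_NONZERO)                                   \
      fprintf(stderr, "%s() Matrix row and column indices (%d, %d)\n",     \
              __func__, (row), (col));                                     \
    if (ierr) return (ierr);                                               \
  } while (0)

/* Toy inner level: only local column allowed_lcol is preallocated;
   any other local column counts as a new nonzero. */
static int inner_set(int lrow, int lcol, int allowed_lcol)
{
  if (lcol != allowed_lcol) {
    fprintf(stderr, "%s() Inserting a new nonzero (%d, %d) into matrix\n",
            __func__, lrow, lcol);
    return ERR_MAT_ENTRY_NONZERO;
  }
  return 0;
}

/* Toy outer level: converts the global column to a local one (here just
   by subtracting an offset) and adds its own indices on failure. */
static int outer_set(int grow, int gcol, int col_offset)
{
  int ierr = inner_set(grow, gcol - col_offset, 0);
  CHK_NONZERO(ierr, grow, gcol);
  return 0;
}
```

With these toy definitions, outer_set(135, 537, 528) makes the inner level report local indices (135, 9), after which the outer level adds a line with the indices it was called with, (135, 537). A third nesting level would add a third line in the same way.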
Is this worth pursuing? The resulting messages are still a bit cryptic and ugly, but at least they give the user correct information, as opposed to the current situation, where for nested matrices the reported numbers are meaningless and misleading.
The only other alternative I see is passing the global indices all the way down in the calls (as extra arguments), which seems pretty bad.
Any other thoughts?
Barry
> [0]PETSC ERROR: MatSetValuesBlocked_MPIBAIJ() line 564 in /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/baij/mpi/mpibaij.c Inserting a new nonzero (135, 9) into matrix
> On Mar 25, 2015, at 1:03 PM, Eric Chamberland <Eric.Chamberland at giref.ulaval.ca> wrote:
>
> Hi,
>
> While trying to find where in the world I insert the (135,9) entry in my matrix, I discovered that the column # shown is wrong in parallel!
>
> I am using PETSc 3.5.3.
>
> The full error message is:
>
> [0]PETSC ERROR: MatSetValues_MPIAIJ() line 564 in /home/mefpp_ericc/petsc-3.5.3/src/mat/impls/aij/mpi/mpiaij.c Inserting a new nonzero (135, 9) into matrix
>
> This line of code is a call to a #defined macro:
>
> MatSetValues_SeqAIJ_B_Private(row,col,value,addv);
>
> where the "col" parameter is not equal to "in[j]"!!!
>
> in gdb, printing "in[j]" gave me:
>
> print in[j]
> $6 = 537
>
> while "col" is:
>
> print col
> $7 = 9
>
> So, I expected a message telling me that (135,537), and not (135,9), is a new matrix entry!!!
>
> Would it be a lot of work to fix this so that the col # displayed is correct?
>
> Thanks!
>
> Eric