[petsc-dev] Bug in MatShift_MPIAIJ ?

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Wed Oct 21 08:04:13 CDT 2015


Thanks Barry! :)

another question: while trying to understand this, I used 
"-on_error_attach_debugger ddd", which worked, BUT at the line where it 
was breaking everything seemed ok to me.  I mean, the value of the "op" 
variable tested at line petsc-3.6.2/src/mat/interface/matrix.c:5264:

5256 PetscErrorCode  MatSetOption(Mat mat,MatOption op,PetscBool flg)
5257 {
5258   PetscErrorCode ierr;
5259
5260   PetscFunctionBegin;
5261   PetscValidHeaderSpecific(mat,MAT_CLASSID,1);
5262   PetscValidType(mat,1);
5263   if (op > 0) {
5264     PetscValidLogicalCollectiveEnum(mat,op,2);
...

was the same on all (2) processes, but printing the values of variables 
"op", "b1" and "b2" used in the macro PetscValidLogicalCollectiveEnum 
gave me:

=====
process rank 1:
=====
(gdb) print b2[0]
$1 = 5
(gdb) print b2[1]
$2 = 17

and for b1:

(gdb) print b1[1]
$3 = 17
(gdb) print b1[0]
$4 = -17

and:
(gdb) print (int)op
$7 = 17

=====
process rank 0:
=====
(gdb) print b2[0]
$1 = 5
(gdb) print b2[1]
$2 = 17
(gdb) print b1[0]
$3 = -17
(gdb) print b1[1]
$4 = 17
(gdb) print (int)(op)
$5 = 17

So the local values of "op" and "b1" are all correct, but there is an 
invalid value coming out of the "MPI_Allreduce" ????

I am not quite a PETSc expert, but I would have expected that the 
debugger started at that point would have given me a chance to 
understand what is happening...  is there something that could be done 
with that verification to help users like me debug more easily?

Thanks anyway!

Eric

On 20/10/15 10:47 PM, Barry Smith wrote:
>
>    Eric,
>
>     Thanks for the test case. I have determined the problem; it is a nasty bug caused by overly convoluted code.
>
> The MatSeqAIJSetPreallocation() is there because if the matrix had been assembled but had no values in it, MatShift_Basic() took forever, since a new malloc needed to be done for each local row.
>
> The problem is that MatSeqAIJSetPreallocation() changed the value of the aij->nonew flag of that sequential object, BUT MatAssemblyEnd_MPIAIJ() assumed that the value of this flag was identical on all processes. In your case, since aij->nz = 0 on your matrix with no local rows, the value of nonew was changed on one process but not on the others, triggering disaster in MatAssemblyEnd_MPIAIJ().
>
> This is now fixed in the maint, master and next branches and will be in the next patch release. I have also attached the patch to this email.
>
>    Barry
>



