[petsc-dev] Bug in MatShift_MPIAIJ ?
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Wed Oct 21 08:04:13 CDT 2015
Thanks Barry! :)
Another question: while trying to understand this, I used
"-on_error_attach_debugger ddd", which worked, BUT at the line where it
was breaking everything seemed OK to me. I mean, the value of the "op"
variable tested at line petsc-3.6.2/src/mat/interface/matrix.c:5264:
5256 PetscErrorCode MatSetOption(Mat mat,MatOption op,PetscBool flg)
5257 {
5258 PetscErrorCode ierr;
5259
5260 PetscFunctionBegin;
5261 PetscValidHeaderSpecific(mat,MAT_CLASSID,1);
5262 PetscValidType(mat,1);
5263 if (op > 0) {
5264 PetscValidLogicalCollectiveEnum(mat,op,2);
...
was the same on all (2) processes, but printing the values of the
variables "op", "b1" and "b2" used in the macro
PetscValidLogicalCollectiveEnum gave me:
=====
process rank 1:
=====
(gdb) print b2[0]
$1 = 5
(gdb) print b2[1]
$2 = 17
and for b1:
(gdb) print b1[1]
$3 = 17
(gdb) print b1[0]
$4 = -17
and:
(gdb) print (int)op
$7 = 17
=====
process rank 0:
=====
(gdb) print b2[0]
$1 = 5
(gdb) print b2[1]
$2 = 17
(gdb) print b1[0]
$3 = -17
(gdb) print b1[1]
$4 = 17
(gdb) print (int)(op)
$5 = 17
So the local values of "op" and "b1" are all correct, but there is an
invalid value coming out of the "MPI_Allreduce" ????
I am not quite a PETSc expert, but I would have expected that the
debugger started at that point would have given me a chance to understand
what is happening... is there something that could be done with that
verification to help users like me debug more easily?
Thanks anyway!
Eric
On 20/10/15 10:47 PM, Barry Smith wrote:
>
> Eric,
>
> Thanks for the test case. I have determined the problem; it is a nasty bug caused by overly convoluted code.
>
> The MatSeqAIJSetPreallocation() is there because, if the matrix had been assembled but had no values in it, MatShift_Basic() took forever, since
> a new malloc needed to be done for each local row. The problem is that MatSeqAIJSetPreallocation() changed the value of the aij->nonew flag of that sequential object, BUT MatAssemblyEnd_MPIA() assumed that the value of this flag was identical on all processes. In your case, since aij->nz = 0 on your matrix with no local rows, the value of nonew was changed on one process but not on the others, triggering disaster in MatAssemblyEnd_MPIA().
>
> This is now fixed in the maint, master and next branches and will be in the next patch release. I have also attached the patch to this email.
>
> Barry
>