[petsc-dev] Infinite loop in A*B
Zhang, Hong
hzhang at mcs.anl.gov
Mon Mar 1 17:52:05 CST 2021
Pierre,
I pushed a fix in branch hzhang/fix-matmatmult_aij_dense/release. https://gitlab.com/petsc/petsc/-/merge_requests/3667
Merge request !3667: bugfix for MatMatMultSymbolic_MPIAIJ_MPIDense() when Bbn1 = 0.
Reported-by: Pierre Jolivet <pierre at joliv.et>
Bb (column block size) cannot be zero; it leads to an infinite loop in MatMatMultNumeric_MPIAIJ_MPIDense() with n=0.
Give it a try. Let me know if the bug is not fixed.
Your code is very helpful in debugging.
Hong
________________________________
From: Pierre Jolivet <pierre at joliv.et>
Sent: Monday, March 1, 2021 12:51 AM
To: Zhang, Hong <hzhang at mcs.anl.gov>
Cc: For users of the development version of PETSc <petsc-dev at mcs.anl.gov>
Subject: Re: [petsc-dev] Infinite loop in A*B
On 1 Mar 2021, at 6:29 AM, Zhang, Hong <hzhang at mcs.anl.gov> wrote:
Pierre,
This is a bug in MatMatMultSymbolic_MPIAIJ_MPIDense(), in the optimization of the column block size of B. If you run your code with
'-matmatmult_Bbn 1', the infinite loop should not occur.
Thanks Hong, I can confirm this option makes the more complex use case run smoothly as well.
I'll try to figure out a fix tomorrow.
Great.
Thanks,
Pierre
Hong
________________________________
From: Zhang, Hong <hzhang at mcs.anl.gov>
Sent: Sunday, February 28, 2021 11:05 PM
To: Pierre Jolivet <pierre at joliv.et>; For users of the development version of PETSc <petsc-dev at mcs.anl.gov>; Zhang, Hong <hzhang at mcs.anl.gov>
Subject: Re: [petsc-dev] Infinite loop in A*B
The infinite loop in MatMatMultNumeric_MPIAIJ_MPIDense()
for (i=0; i<BN; i+=n) {
}
is caused by n=contents->workB->cmap->n=0 (line 590 in mpimatmatmult.c).
Hong
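A minimal sketch, in plain C with no PETSc calls, of why a zero step hangs a loop of the form shown above and how clamping the column block size avoids it. The helper names clamp_block_size and count_column_blocks are hypothetical, chosen for illustration only; this is not the code from the actual fix.

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc routine): clamp a requested column
   block size into [1, total_cols] so the loop step can never be zero. */
int clamp_block_size(int requested, int total_cols)
{
  if (requested < 1) return 1;
  if (total_cols > 0 && requested > total_cols) return total_cols;
  return requested;
}

/* Counts the column blocks visited by a loop of the form
     for (i = 0; i < BN; i += n) { ... }
   With n == 0 the unguarded loop never advances i, so it never ends.
   Here the step is clamped before the loop is entered. */
int count_column_blocks(int BN, int requested_n)
{
  int n = clamp_block_size(requested_n, BN);
  int blocks = 0;

  assert(n > 0); /* invariant that keeps the loop finite */
  for (int i = 0; i < BN; i += n) blocks++;
  return blocks;
}
```

With BN = 9 and a requested step of 0, the clamp falls back to one column per block, which mirrors the effect of the '-matmatmult_Bbn 1' workaround mentioned elsewhere in this thread.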
________________________________
From: petsc-dev <petsc-dev-bounces at mcs.anl.gov> on behalf of Zhang, Hong via petsc-dev <petsc-dev at mcs.anl.gov>
Sent: Sunday, February 28, 2021 10:33 PM
To: Pierre Jolivet <pierre at joliv.et>; For users of the development version of PETSc <petsc-dev at mcs.anl.gov>
Subject: Re: [petsc-dev] Infinite loop in A*B
I can reproduce the hang with
mpiexec -n 2 ./matmatmult
It appears to be stuck in an infinite loop that keeps calling MatDensePlaceArray(), with the following backtrace:
#0 MatDensePlaceArray (mat=0xda5c50, array=0xd15e60)
at /home/hongsu/soft/petsc/src/mat/impls/dense/mpi/mpidense.c:2047
#1 0x00007fa0d13bf4f7 in MatDenseGetSubMatrix_SeqDense (A=0xcfb2b0, cbegin=0,
cend=0, v=0xd90370)
at /home/hongsu/soft/petsc/src/mat/impls/dense/seq/dense.c:2997
#2 0x00007fa0d13c574e in MatDenseGetSubMatrix (A=0xcfb2b0, cbegin=0, cend=0,
v=0xd90370) at /home/hongsu/soft/petsc/src/mat/impls/dense/seq/dense.c:3371
#3 0x00007fa0d13db5ce in MatDenseGetSubMatrix_MPIDense (A=0xca5250, cbegin=0,
cend=0, v=0x7ffe87d41de0)
at /home/hongsu/soft/petsc/src/mat/impls/dense/mpi/mpidense.c:1835
#4 0x00007fa0d13c574e in MatDenseGetSubMatrix (A=0xca5250, cbegin=0, cend=0,
v=0x7ffe87d41de0)
at /home/hongsu/soft/petsc/src/mat/impls/dense/seq/dense.c:3371
#5 0x00007fa0d179c2fa in MatMatMultNumeric_MPIAIJ_MPIDense (A=0xc55490,
B=0xca5250, C=0xd282b0)
at /home/hongsu/soft/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c:593
#6 0x00007fa0d1181331 in MatProductNumeric_AB (mat=0xd282b0)
at /home/hongsu/soft/petsc/src/mat/interface/matproduct.c:567
#7 0x00007fa0d1182c14 in MatProductNumeric (mat=0xd282b0)
at /home/hongsu/soft/petsc/src/mat/interface/matproduct.c:679
#8 0x00007fa0d115ef69 in MatProduct_Private (A=0xc55490, B=0xca5250,
scall=MAT_INITIAL_MATRIX, fill=-2, ptype=MATPRODUCT_AB, C=0x7ffe87d42018)
at /home/hongsu/soft/petsc/src/mat/interface/matrix.c:9405
#9 0x00007fa0d115f274 in MatMatMult (A=0xc55490, B=0xca5250, scall=MAT_INITIAL_MATRIX, fill=-2,
C=0x7ffe87d42018) at /home/hongsu/soft/petsc/src/mat/interface/matrix.c:9445
#10 0x000000000040130a in main (argc=2, argv=0x7ffe87d42108) at ex1.c:20
I'll try to figure out what is going on. If anyone has a clue, please help. The above stack comes from the 'release' branch.
Hong
________________________________
From: petsc-dev <petsc-dev-bounces at mcs.anl.gov> on behalf of Pierre Jolivet <pierre at joliv.et>
Sent: Sunday, February 28, 2021 4:17 PM
To: For users of the development version of PETSc <petsc-dev at mcs.anl.gov>
Subject: [petsc-dev] Infinite loop in A*B
Hello,
The following MWE loops indefinitely for MPI_Comm_size in {2; 3}.
Nothing fancy, just MatAIJ and MatDense.
The problem is either in MatMPIDenseScatter() or MatMatMultSymbolic_MPIAIJ_MPIDense(), I believe, so if someone familiar with those routines can figure out a hot fix, I’m all ears.
I could of course switch to a MatMult(), but the same infinite loop happens in another more complex code with
A = rows=8, cols=35212
B = rows=35212, cols=9
So I’ll need a fix eventually.
Thanks,
Pierre