<div dir="ltr"><div><div>Figured out:<br><br></div>The reason is that in MatCreate_HYPRE(Mat B), we call MPI_Comm_dup instead of PetscCommDuplicate. The PetscCommDuplicate is better, and it does not actually create a communicator if the communicator is already known to PETSc. <br><br></div><div>Furthermore, I do not think we should a comm in <br><br><i>typedef struct {<br> HYPRE_IJMatrix ij;<br> HYPRE_IJVector x;<br> HYPRE_IJVector b;<br> MPI_Comm comm;<br>} Mat_HYPRE;</i><br></div><div><br></div><div>It is an inner data of Mat, and it should already the same comm as the Mat. I do not understand why the internal data has its own comm.<br><br></div><div>The following patch fixed the issue (just deleted this extra comm).<br><br>diff --git a/src/mat/impls/hypre/mhypre.c b/src/mat/impls/hypre/mhypre.c<br>index dc19892..d8cfe3d 100644<br>--- a/src/mat/impls/hypre/mhypre.c<br>+++ b/src/mat/impls/hypre/mhypre.c<br>@@ -74,7 +74,7 @@ static PetscErrorCode MatHYPRE_CreateFromMat(Mat A, Mat_HYPRE *hA)<br> rend = A->rmap->rend;<br> cstart = A->cmap->rstart;<br> cend = A->cmap->rend;<br>- PetscStackCallStandard(HYPRE_IJMatrixCreate,(hA->comm,rstart,rend-1,cstart,cend-1,&hA->ij));<br>+ PetscStackCallStandard(HYPRE_IJMatrixCreate,(PetscObjectComm((PetscObject)A),rstart,rend-1,cstart,cend-1,&hA->ij));<br> PetscStackCallStandard(HYPRE_IJMatrixSetObjectType,(hA->ij,HYPRE_PARCSR));<br> {<br> PetscBool same;<br>@@ -434,7 +434,7 @@ PetscErrorCode MatDestroy_HYPRE(Mat A)<br> if (hA->x) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->x));<br> if (hA->b) PetscStackCallStandard(HYPRE_IJVectorDestroy,(hA->b));<br> if (hA->ij) PetscStackCallStandard(HYPRE_IJMatrixDestroy,(hA->ij));<br>- if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}<br>+ /*if (hA->comm) { ierr = MPI_Comm_free(&hA->comm);CHKERRQ(ierr);}*/<br> ierr = PetscObjectComposeFunction((PetscObject)A,"MatConvert_hypre_aij_C",NULL);CHKERRQ(ierr);<br> ierr = PetscFree(A->data);CHKERRQ(ierr);<br> PetscFunctionReturn(0);<br>@@ -500,7 +500,8 @@ PETSC_EXTERN PetscErrorCode MatCreate_HYPRE(Mat B)<br> B->ops->destroy = MatDestroy_HYPRE;<br> B->ops->assemblyend = MatAssemblyEnd_HYPRE;<br> <br>- ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr);<br>+ /*ierr = MPI_Comm_dup(PetscObjectComm((PetscObject)B),&hB->comm);CHKERRQ(ierr); */<br>+ /*ierr = PetscCommDuplicate(PetscObjectComm((PetscObject)B),&hB->comm,NULL);CHKERRQ(ierr);*/<br> ierr = PetscObjectChangeTypeName((PetscObject)B,MATHYPRE);CHKERRQ(ierr);<br> ierr = PetscObjectComposeFunction((PetscObject)B,"MatConvert_hypre_aij_C",MatConvert_HYPRE_AIJ);CHKERRQ(ierr);<br> PetscFunctionReturn(0);<br>diff --git a/src/mat/impls/hypre/mhypre.h b/src/mat/impls/hypre/mhypre.h<br>index 3d9ddd2..1189020 100644<br>--- a/src/mat/impls/hypre/mhypre.h<br>+++ b/src/mat/impls/hypre/mhypre.h<br>@@ -10,7 +10,7 @@ typedef struct {<br> HYPRE_IJMatrix ij;<br> HYPRE_IJVector x;<br> HYPRE_IJVector b;<br>- MPI_Comm comm;<br>+ /*MPI_Comm comm;*/<br> } Mat_HYPRE;<br> <br><br></div><div><br></div><div>Fande,<br></div><div><br></div><div><br><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Apr 3, 2018 at 10:35 AM, Satish Balay <span dir="ltr"><<a href="mailto:balay@mcs.anl.gov" target="_blank">balay@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Tue, 3 Apr 2018, Satish Balay wrote:<br>
<br>
> On Tue, 3 Apr 2018, Derek Gaston wrote:<br>
><br>
> > One thing I want to be clear of here: is that we're not trying to solve<br>
> > this particular problem (where we're creating 1000 instances of Hypre to<br>
> > precondition each variable independently)... this particular problem is<br>
> > just a test (that we've had in our test suite for a long time) to stress<br>
> > test some of this capability.<br>
> ><br>
> > We really do have needs for thousands (tens of thousands) of simultaneous<br>
> > solves (each with their own Hypre instances). That's not what this<br>
> > particular problem is doing - but it is representative of a class of our<br>
> > problems we need to solve.<br>
> ><br>
> > Which does bring up a point: I have been able to do solves before with<br>
> > ~50,000 separate PETSc solves without issue. Is it because I was working<br>
> > with MVAPICH on a cluster? Does it just have a higher limit?<br>
><br>
> Don't know - but that's easy to find out with a simple test code...
><br>
> >>>>>><br>
> $ cat comm_dup_test.c<br>
> #include <mpi.h><br>
> #include <stdio.h><br>
><br>
> int main(int argc, char** argv) {
>   MPI_Comm newcomm;
>   int i, err;
>   MPI_Init(NULL, NULL);
>   for (i=0; i<100000; i++) {
>     err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
>     if (err) {
>       printf("%5d - fail\n",i);fflush(stdout);
>       break;
>     } else {
>       printf("%5d - success\n",i);fflush(stdout);
>     }
>   }
>   MPI_Finalize();
> }
> <<<<<<<<br>
><br>
> OpenMPI fails after '65531' and MPICH after '2044'. MVAPICH is derived
> from MPICH - but it's possible they have a different limit than MPICH.
<br>
BTW: the above is with openmpi-2.1.2 and mpich-3.3b1
<br>
mvapich2-1.9.5 - and I get an error after '2044' comm dups

Satish<br>
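
P.S. Here is the minimal sketch mentioned above (mine, not from PETSc's test suite; the file name is made up). Repeated PetscCommDuplicate calls on the same communicator return PETSc's cached inner communicator and only bump a reference count, so only the first call does a real MPI_Comm_dup and no further context ids are consumed:

>>>>>>
$ cat petsc_comm_cache_test.c
#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  MPI_Comm       c1, c2;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* Only the first call does a real MPI_Comm_dup; the second returns
     the cached inner communicator and increments its reference count. */
  ierr = PetscCommDuplicate(PETSC_COMM_WORLD, &c1, NULL);CHKERRQ(ierr);
  ierr = PetscCommDuplicate(PETSC_COMM_WORLD, &c2, NULL);CHKERRQ(ierr);
  if (c1 == c2) { ierr = PetscPrintf(PETSC_COMM_WORLD, "same inner comm reused\n");CHKERRQ(ierr); }
  ierr = PetscCommDestroy(&c1);CHKERRQ(ierr); /* decrements the refcount */
  ierr = PetscCommDestroy(&c2);CHKERRQ(ierr); /* frees only when it reaches 0 */
  ierr = PetscFinalize();
  return ierr;
}
<<<<<<

Contrast this with the MPI_Comm_dup loop in Satish's test above, which burns one context id per iteration.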
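On the limits Satish measured ('2044' on MPICH-derived builds, '65531' on OpenMPI): those are MPI context-id limits, and the mechanism that lets PetscCommDuplicate sidestep them is MPI attribute caching. The sketch below is my illustration of the idea, not the actual PETSc code; the function and file names are made up:

>>>>>>
$ cat cached_dup.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int dup_keyval = MPI_KEYVAL_INVALID;

/* Return a cached duplicate of comm; only the first call really dups. */
static int get_cached_dup(MPI_Comm comm, MPI_Comm *dup)
{
  MPI_Comm *cached;
  int       flag;

  if (dup_keyval == MPI_KEYVAL_INVALID) {
    MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, MPI_COMM_NULL_DELETE_FN, &dup_keyval, NULL);
  }
  MPI_Comm_get_attr(comm, dup_keyval, &cached, &flag);
  if (!flag) { /* first request: do the one real dup and stash it on comm */
    cached = (MPI_Comm*)malloc(sizeof(MPI_Comm));
    MPI_Comm_dup(comm, cached);
    MPI_Comm_set_attr(comm, dup_keyval, cached);
  }
  *dup = *cached; /* every later request reuses the stashed duplicate */
  return 0;
}

int main(int argc, char **argv)
{
  MPI_Comm c;
  int      i;

  MPI_Init(&argc, &argv);
  /* 100000 requests, but only one MPI_Comm_dup actually happens, so the
     context-id limit is never approached. (A real implementation would
     also free the stashed comm in a delete callback.) */
  for (i = 0; i < 100000; i++) get_cached_dup(MPI_COMM_WORLD, &c);
  printf("done - no context-id exhaustion\n");
  MPI_Finalize();
  return 0;
}
<<<<<<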
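Finally, a sketch of how the fix could be checked end to end, assuming a PETSc build with hypre enabled (the file name and matrix count are made up). The key is to keep many MATHYPRE matrices alive simultaneously: before the patch, every MatSetType(...,MATHYPRE) did its own MPI_Comm_dup, so MPICH-based builds would die near 2044 live matrices; with the patch the loop should run to completion:

>>>>>>
$ cat mathypre_loop_test.c
#include <petscmat.h>

#define NMAT 5000 /* comfortably above the '2044' MPICH-derived limit */

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscInt       i;
  Mat            A[NMAT];

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  for (i = 0; i < NMAT; i++) {
    ierr = MatCreate(PETSC_COMM_WORLD, &A[i]);CHKERRQ(ierr);
    ierr = MatSetSizes(A[i], PETSC_DECIDE, PETSC_DECIDE, 10, 10);CHKERRQ(ierr);
    ierr = MatSetType(A[i], MATHYPRE);CHKERRQ(ierr); /* MatCreate_HYPRE runs here */
  }
  for (i = 0; i < NMAT; i++) { ierr = MatDestroy(&A[i]);CHKERRQ(ierr); }
  ierr = PetscPrintf(PETSC_COMM_WORLD, "%d simultaneous MATHYPRE mats - success\n", NMAT);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}
<<<<<<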