[petsc-users] A bad commit affects MOOSE

Satish Balay balay at mcs.anl.gov
Tue Apr 3 11:30:54 CDT 2018


On Tue, 3 Apr 2018, Derek Gaston wrote:

> One thing I want to be clear about here is that we're not trying to solve
> this particular problem (where we're creating 1000 instances of Hypre to
> precondition each variable independently)... this particular problem is
> just a test (that we've had in our test suite for a long time) to stress
> test some of this capability.
> 
> We really do have needs for thousands (tens of thousands) of simultaneous
> solves (each with their own Hypre instances).  That's not what this
> particular problem is doing - but it is representative of a class of our
> problems we need to solve.
> 
> Which does bring up a point: I have been able to do solves before with
> ~50,000 separate PETSc solves without issue.  Is it because I was working
> with MVAPICH on a cluster?  Does it just have a higher limit?

Don't know - but that's easy to find out with a simple test code:

>>>>>>
$ cat comm_dup_test.c
#include <mpi.h>
#include <stdio.h>

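int main(int argc, char** argv) {
    MPI_Comm newcomm;
    int i, err;
    MPI_Init(&argc, &argv);
    /* Return error codes instead of aborting, so the check below is reached. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
    /* Duplicate without ever freeing, to find the limit on live communicators. */
    for (i=0; i<100000; i++) {
        err = MPI_Comm_dup(MPI_COMM_WORLD, &newcomm);
        if (err != MPI_SUCCESS) {
            printf("%5d - fail\n",i);fflush(stdout);
            break;
        } else {
            printf("%5d - success\n",i);fflush(stdout);
        }
    }
    MPI_Finalize();
    return 0;
}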
<<<<<<<

OpenMPI fails after '65531' and MPICH after '2044'. MVAPICH is derived
from MPICH - but it's possible they have a different limit than MPICH.

Satish

