[petsc-users] PETSc initialization error

Junchao Zhang junchao.zhang at gmail.com
Sat Jun 20 18:06:09 CDT 2020


Hi, Sam,
 Sorry I am late.  Please try branch jczhang/fix-mpiuni, based off petsc
maint, or directly apply the attached patch. Under your PETSC_DIR,
 git apply patch.txt

 Thanks.
--Junchao Zhang


On Sat, Jun 20, 2020 at 1:40 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Sam,
>    There are more problems. I am working on a fix.  Please wait an hour.
>  Thanks.
>
> --Junchao Zhang
>
>
> On Sat, Jun 20, 2020 at 1:12 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>
>> Junchao,
>>    I debugged: MPI_Finalize is not called for serial.
>>
>> Barry,
>>    I tried your patch and it seems better but eventually got following
>> error:
>>
>> [0]PETSC ERROR: #1 PetscCommDuplicate() line 160 in
>> ../../../petsc/src/sys/objects/tagm.c
>> [0]PETSC ERROR: #2 PetscHeaderCreate_Private() line 64 in
>> ../../../petsc/src/sys/objects/inherit.c
>> [0]PETSC ERROR: #3 MatCreate() line 91 in
>> ../../../petsc/src/mat/utils/gcreate.c
>> [0]PETSC ERROR: #4 MatCreateShell() line 787 in
>> ../../../petsc/src/mat/impls/shell/shell.c
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Null argument, when expecting valid pointer
>> [0]PETSC ERROR: Null Object: Parameter # 1
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.11.3, Jun, 26, 2019
>> [0]PETSC ERROR: #6 MatShellSetOperation() line 1052 in
>> ../../../petsc/src/mat/impls/shell/shell.c
>>
>> On Sat, Jun 20, 2020 at 10:24 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>>    Junchao,
>>>
>>>      This is a good bug fix. It solves the problem when PETSc initialize
>>> is called many times.
>>>
>>>      There is another fix you can do to limit PETSc mpiuni running out
>>> of attributes inside a single PETSc run:
>>>
>>>
>>> int MPI_Comm_create_keyval(MPI_Copy_function
>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>> {
>>>
>>>  if (num_attr >= MAX_ATTR){
>>>    for (i=0; i<num_attr; i++) {
>>>      if (!attr_keyval[i].extra_state) {
>>>         /* reuse this slot */
>>>         attr_keyval[i].extra_state = extra_state;
>>>        attr_keyval[i.]del         = delete_fn;
>>>        *keyval = i;
>>>         return MPI_SUCCESS;
>>>      }
>>>   }
>>>   return MPIUni_Abort(MPI_COMM_WORLD,1);
>>> }
>>>  return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>   attr_keyval[num_attr].del         = delete_fn;
>>>   *keyval                           = num_attr++;
>>>   return MPI_SUCCESS;
>>> }
>>>
>>>   This will work if the user creates tons of attributes but is
>>> constantly deleting some as they new ones. So long as the number
>>> outstanding at one time is < MAX_ATTR)
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>>
>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>> I don't understand what your session means. Let's try this patch
>>>
>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>> index d559a513..c058265d 100644
>>> --- a/src/sys/mpiuni/mpi.c
>>> +++ b/src/sys/mpiuni/mpi.c
>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>    MPI_Comm_free(&comm);
>>>    comm = MPI_COMM_SELF;
>>>    MPI_Comm_free(&comm);
>>> +  num_attr = 1; /* reset the counter */
>>>    MPI_was_finalized = 1;
>>>    return MPI_SUCCESS;
>>>  }
>>>
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>
>>>> Typo: I mean “Assuming initializer is only needed once for entire
>>>> session”
>>>>
>>>> On Saturday, June 20, 2020, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>>> Assuming finalizer is only needed once for entire session(?), I can
>>>>> put initializer into the static block to call it once but where do I call
>>>>> finalizer?
>>>>>
>>>>>
>>>>> On Saturday, June 20, 2020, Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The counter num_attr should be recycled. But first try to call PETSc
>>>>>> initialize/Finalize only once to see it fixes the error.
>>>>>> --Junchao Zhang
>>>>>>
>>>>>>
>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <sam.guo at cd-adapco.com>
>>>>>> wrote:
>>>>>>
>>>>>>> To clarify, I call PETSc initialize and PETSc finalize everytime I
>>>>>>> call SLEPc:
>>>>>>>
>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>
>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>
>>>>>>>   //calling slepc
>>>>>>>
>>>>>>>   SlepcFinalize();
>>>>>>>
>>>>>>>    PetscFinalize();
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dear PETSc team,
>>>>>>>>    When I called SLEPc multiple time, I eventually got following
>>>>>>>> error:
>>>>>>>>
>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>
>>>>>>>>   I debugged: it is because of following in
>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>
>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>
>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>
>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>
>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the same logic.
>>>>>>>>
>>>>>>>> Is this a bug or I didn't use it correctly?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200620/82efd71b/attachment.html>
-------------- next part --------------
diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
index d559a513..29299b21 100644
--- a/src/sys/mpiuni/mpi.c
+++ b/src/sys/mpiuni/mpi.c
@@ -23,8 +23,6 @@ void *MPIUNI_TMP = NULL;
 #define MAX_ATTR 256
 #define MAX_COMM 128
 
-static int MaxComm = 2;
-
 typedef struct {
   void *attribute_val;
   int  active;
@@ -37,10 +35,15 @@ typedef struct {
 
 static MPI_Attr_keyval attr_keyval[MAX_ATTR];
 static MPI_Attr        attr[MAX_COMM][MAX_ATTR];
-static int             comm_active[MAX_COMM];
-static int             num_attr = 1,mpi_tag_ub = 100000000;
+static int             comm_active[MAX_COMM];    /* Boolean array indicating which comms are in use */
+static int             keyval_active[MAX_ATTR];  /* Boolean array indicating which keyvals are in use */
+static int             mpi_tag_ub = 100000000;
+static int             num_attr = 1; /* Maximal number of keyvals/attributes ever created, including the predefined MPI_TAG_UB attribute. */
+static int             MaxComm  = 2;  /* Maximal number of communicators ever created, including comm_self(1), comm_world(2), but not comm_null(0) */
 static void*           MPIUNIF_mpi_in_place = 0;
 
+#define CommIdx(comm)  ((comm)-1)  /* the communicator's interal index used in attr[idx][] and comm_active[idx]. comm_null does not occupy slots in attr[][] */
+
 #if defined(__cplusplus)
 extern "C" {
 #endif
@@ -131,20 +134,31 @@ int MPI_Type_get_contents(MPI_Datatype datatype,int max_integers,int max_address
 */
 static int Keyval_setup(void)
 {
-  attr[MPI_COMM_WORLD-1][0].active        = 1;
-  attr[MPI_COMM_WORLD-1][0].attribute_val = &mpi_tag_ub;
-  attr[MPI_COMM_SELF-1][0].active         = 1;
-  attr[MPI_COMM_SELF-1][0].attribute_val  = &mpi_tag_ub;
+  attr[CommIdx(MPI_COMM_WORLD)][0].active        = 1;
+  attr[CommIdx(MPI_COMM_WORLD)][0].attribute_val = &mpi_tag_ub;
+  attr[CommIdx(MPI_COMM_SELF )][0].active        = 1;
+  attr[CommIdx(MPI_COMM_SELF )][0].attribute_val = &mpi_tag_ub;
+  keyval_active[0] = 1;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
 {
+  int i,keyid;
+  for (i=1; i<num_attr; i++) { /* the first attribute is always in use */
+    if (!keyval_active[i]) {
+      keyid = i;
+      goto found;
+    }
+  }
   if (num_attr >= MAX_ATTR) return MPIUni_Abort(MPI_COMM_WORLD,1);
+  keyid = num_attr++;
 
-  attr_keyval[num_attr].extra_state = extra_state;
-  attr_keyval[num_attr].del         = delete_fn;
-  *keyval                           = num_attr++;
+found:
+  attr_keyval[keyid].extra_state = extra_state;
+  attr_keyval[keyid].del         = delete_fn;
+  *keyval                        = keyid;
+  keyval_active[keyid]           = 1;
   return MPI_SUCCESS;
 }
 
@@ -152,26 +166,28 @@ int MPI_Comm_free_keyval(int *keyval)
 {
   attr_keyval[*keyval].extra_state = 0;
   attr_keyval[*keyval].del         = 0;
-
+  keyval_active[*keyval]           = 0;
   *keyval = 0;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_set_attr(MPI_Comm comm,int keyval,void *attribute_val)
 {
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
-  attr[comm-1][keyval].active        = 1;
-  attr[comm-1][keyval].attribute_val = attribute_val;
+  int idx = CommIdx(comm);
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+  attr[idx][keyval].active        = 1;
+  attr[idx][keyval].attribute_val = attribute_val;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_delete_attr(MPI_Comm comm,int keyval)
 {
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
-  if (attr[comm-1][keyval].active && attr_keyval[keyval].del) {
-    void *save_attribute_val = attr[comm-1][keyval].attribute_val;
-    attr[comm-1][keyval].active        = 0;
-    attr[comm-1][keyval].attribute_val = 0;
+  int idx = CommIdx(comm);
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+  if (attr[idx][keyval].active && attr_keyval[keyval].del) {
+    void *save_attribute_val        = attr[idx][keyval].attribute_val;
+    attr[idx][keyval].active        = 0;
+    attr[idx][keyval].attribute_val = 0;
     (*(attr_keyval[keyval].del))(comm,keyval,save_attribute_val,attr_keyval[keyval].extra_state);
   }
   return MPI_SUCCESS;
@@ -179,72 +195,74 @@ int MPI_Comm_delete_attr(MPI_Comm comm,int keyval)
 
 int MPI_Comm_get_attr(MPI_Comm comm,int keyval,void *attribute_val,int *flag)
 {
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+  int idx = CommIdx(comm);
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
   if (!keyval) Keyval_setup();
-  *flag                  = attr[comm-1][keyval].active;
-  *(void**)attribute_val = attr[comm-1][keyval].attribute_val;
+  *flag                  = attr[idx][keyval].active;
+  *(void**)attribute_val = attr[idx][keyval].attribute_val;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_create(MPI_Comm comm,MPI_Group group,MPI_Comm *newcomm)
 {
   int j;
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
-  for (j=3; j<MaxComm; j++) {
-    if (!comm_active[j-1]) {
-      comm_active[j-1] = 1;
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+  for (j=3; j<=MaxComm; j++) {
+    if (!comm_active[CommIdx(j)]) {
+      comm_active[CommIdx(j)] = 1;
       *newcomm = j;
       return MPI_SUCCESS;
     }
   }
-  if (MaxComm > MAX_COMM) return MPI_FAILURE;
-  *newcomm =  MaxComm++;
-  comm_active[*newcomm-1] = 1;
+  if (MaxComm >= MAX_COMM) return MPI_FAILURE;
+  *newcomm = ++MaxComm;
+  comm_active[CommIdx(*newcomm)] = 1;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
 {
   int j;
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
-  for (j=3; j<MaxComm; j++) {
-    if (!comm_active[j-1]) {
-      comm_active[j-1] = 1;
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+  for (j=3; j<=MaxComm; j++) {
+    if (!comm_active[CommIdx(j)]) {
+      comm_active[CommIdx(j)] = 1;
       *out = j;
       return MPI_SUCCESS;
     }
   }
-  if (MaxComm > MAX_COMM) return MPI_FAILURE;
-  *out = MaxComm++;
-  comm_active[*out-1] = 1;
+  if (MaxComm >= MAX_COMM) return MPI_FAILURE;
+  *out = ++MaxComm;
+  comm_active[CommIdx(*out)] = 1;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_free(MPI_Comm *comm)
 {
   int i;
+  int idx = CommIdx(*comm);
 
-  if (*comm-1 < 0 || *comm-1 > MaxComm) return MPI_FAILURE;
+  if (*comm < 1 || *comm > MaxComm) return MPI_FAILURE;
   for (i=0; i<num_attr; i++) {
-    if (attr[*comm-1][i].active && attr_keyval[i].del) (*attr_keyval[i].del)(*comm,i,attr[*comm-1][i].attribute_val,attr_keyval[i].extra_state);
-    attr[*comm-1][i].active        = 0;
-    attr[*comm-1][i].attribute_val = 0;
+    if (attr[idx][i].active && attr_keyval[i].del) (*attr_keyval[i].del)(*comm,i,attr[idx][i].attribute_val,attr_keyval[i].extra_state);
+    attr[idx][i].active        = 0;
+    attr[idx][i].attribute_val = 0;
   }
-  if (*comm >= 3) comm_active[*comm-1] = 0;
+  if (*comm >= 3) comm_active[idx] = 0;
   *comm = 0;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_size(MPI_Comm comm, int *size)
 {
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
   *size=1;
   return MPI_SUCCESS;
 }
 
 int MPI_Comm_rank(MPI_Comm comm, int *rank)
 {
-  if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+  if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
   *rank=0;
   return MPI_SUCCESS;
 }
@@ -269,7 +287,7 @@ static int MPI_was_finalized   = 0;
 int MPI_Init(int *argc, char ***argv)
 {
   if (MPI_was_initialized) return MPI_FAILURE;
-  if (MPI_was_finalized) return MPI_FAILURE;
+  if (MPI_was_finalized) return MPI_FAILURE; /* MPI standard: once MPI_FINALIZE returns, no MPI routine (not even MPI_INIT) may be called, except ... */
   MPI_was_initialized = 1;
   return MPI_SUCCESS;
 }
@@ -283,6 +301,17 @@ int MPI_Finalize(void)
   MPI_Comm_free(&comm);
   comm = MPI_COMM_SELF;
   MPI_Comm_free(&comm);
+#if defined(PETSC_USE_DEBUG)
+  {
+    int i;
+    for (i=3; i<=MaxComm; i++) {
+      if (comm_active[CommIdx(i)]) printf("Warning: MPI communicator %d is not freed before MPI_Finalize()\n", i);
+    }
+  }
+#endif
+  /* reset counters */
+  MaxComm  = 2;
+  num_attr = 1;
   MPI_was_finalized = 1;
   return MPI_SUCCESS;
 }
diff --git a/src/sys/tests/ex53.c b/src/sys/tests/ex53.c
new file mode 100644
index 00000000..4dd598ff
--- /dev/null
+++ b/src/sys/tests/ex53.c
@@ -0,0 +1,55 @@
+static char help[] = "Test resource recyling and MPI_Comm and keyval creation in mpi or mpiuni\n";
+
+#include <petscsys.h>
+
+#define  CHKMPIERR(ierr)  do {if (ierr) MPI_Abort(MPI_COMM_WORLD, ierr);} while(0)
+
+int main(int argc,char **argv)
+{
+  PetscErrorCode ierr;
+  PetscInt       i;
+  PetscMPIInt    key1,key2,attr1=100,attr2=200,*attr,flag;
+  MPI_Comm       newcomm;
+
+  ierr = MPI_Init(&argc,&argv);CHKMPIERR(ierr);
+
+  /* Repeated keyval create/free should not blow up MPI */
+  for (i=0; i<500; i++) {
+    ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key1,NULL);CHKMPIERR(ierr);
+    ierr = MPI_Comm_free_keyval(&key1);CHKMPIERR(ierr);
+  }
+
+  /* The following keyval/attr code exposes a bug in old mpiuni code, where it had wrong newcomm returned in MPI_Comm_dup. */
+  ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key1,NULL);CHKMPIERR(ierr);
+  ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key2,NULL);CHKMPIERR(ierr);
+  ierr = MPI_Comm_dup(MPI_COMM_WORLD,&newcomm);CHKMPIERR(ierr);
+  if (MPI_COMM_WORLD == newcomm) printf("Error: wrong newcomm returned by MPI_Comm_dup()\n");
+
+  ierr = MPI_Comm_set_attr(MPI_COMM_WORLD,key1,&attr1);CHKMPIERR(ierr);
+  ierr = MPI_Comm_set_attr(newcomm,key2,&attr2);CHKMPIERR(ierr);
+  ierr = MPI_Comm_get_attr(newcomm,key1,&attr,&flag);CHKMPIERR(ierr);
+  if (flag) printf("Error: newcomm should not have attribute for keyval %d\n", (int)key1);
+  ierr = MPI_Comm_get_attr(MPI_COMM_WORLD,key1,&attr,&flag);CHKMPIERR(ierr);
+  if (*attr != attr1) printf("Error: expected attribute %d, but got %d\n", (int)attr1, (int)*attr);
+  ierr = MPI_Comm_get_attr(newcomm,key2,&attr,&flag);CHKMPIERR(ierr);
+  if (*attr != attr2) printf("Error: expected attribute %d, but got %d\n", (int)attr2, (int)*attr);
+
+  ierr = MPI_Comm_delete_attr(MPI_COMM_WORLD,key1);CHKMPIERR(ierr);
+  ierr = MPI_Comm_delete_attr(newcomm,key2);CHKMPIERR(ierr);
+  ierr = MPI_Comm_free_keyval(&key1);CHKMPIERR(ierr);
+  ierr = MPI_Comm_free_keyval(&key2);CHKMPIERR(ierr);
+  ierr = MPI_Comm_free(&newcomm);CHKMPIERR(ierr);
+
+  /* Init/Finalize PETSc multiple times when MPI is initialized */
+  for (i=0; i<500; i++) {
+    ierr = PetscInitialize(&argc,&argv,(char*)0,help);if (ierr) return ierr;
+    ierr = PetscFinalize();if (ierr) return ierr;
+  }
+
+  ierr = MPI_Finalize();
+  return ierr;
+}
+
+/*TEST
+   test:
+TEST*/
diff --git a/src/sys/tests/makefile b/src/sys/tests/makefile
index 5bd93ddc..5dcdb9bf 100644
--- a/src/sys/tests/makefile
+++ b/src/sys/tests/makefile
@@ -8,7 +8,7 @@ EXAMPLESC       = ex1.c ex2.c ex3.c ex6.c ex7.c ex8.c ex9.c ex10.c ex11.c ex12.c
                   ex14.c ex16.c ex18.c ex19.c ex20.c ex21.c \
                   ex22.c ex23.c ex24.c ex25.c ex26.c ex27.c ex28.c ex29.c ex30.c ex31.c ex32.c ex35.c ex37.c \
                   ex44.cxx ex45.cxx ex46.cxx ex47.c ex49.c \
-                  ex50.c ex51.c ex52.c
+                  ex50.c ex51.c ex52.c ex53.c
 EXAMPLESF       = ex1f.F90 ex5f.F ex6f.F ex17f.F ex36f.F90 ex38f.F90 ex47f.F90 ex48f90.F90 ex49f.F90
 MANSEC          = Sys
 
diff --git a/src/sys/tests/output/ex53_1.out b/src/sys/tests/output/ex53_1.out
new file mode 100644
index 00000000..e69de29b


More information about the petsc-users mailing list