[petsc-users] PETSc initialization error
Junchao Zhang
junchao.zhang at gmail.com
Sat Jun 20 18:06:09 CDT 2020
Hi, Sam,
Sorry I am late. Please try branch jczhang/fix-mpiuni, based off petsc
maint, or directly apply the attached patch. Under your PETSC_DIR,
git apply patch.txt
Thanks.
--Junchao Zhang
On Sat, Jun 20, 2020 at 1:40 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:
> Sam,
> There are more problems. I am working on a fix. Please wait an hour.
> Thanks.
>
> --Junchao Zhang
>
>
> On Sat, Jun 20, 2020 at 1:12 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>
>> Junchao,
>> I debugged: MPI_Finalize is not called for serial.
>>
>> Barry,
>> I tried your patch and it seems better but eventually got following
>> error:
>>
>> [0]PETSC ERROR: #1 PetscCommDuplicate() line 160 in
>> ../../../petsc/src/sys/objects/tagm.c
>> [0]PETSC ERROR: #2 PetscHeaderCreate_Private() line 64 in
>> ../../../petsc/src/sys/objects/inherit.c
>> [0]PETSC ERROR: #3 MatCreate() line 91 in
>> ../../../petsc/src/mat/utils/gcreate.c
>> [0]PETSC ERROR: #4 MatCreateShell() line 787 in
>> ../../../petsc/src/mat/impls/shell/shell.c
>> [0]PETSC ERROR: --------------------- Error Message
>> --------------------------------------------------------------
>> [0]PETSC ERROR: Null argument, when expecting valid pointer
>> [0]PETSC ERROR: Null Object: Parameter # 1
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.11.3, Jun, 26, 2019
>> [0]PETSC ERROR: #6 MatShellSetOperation() line 1052 in
>> ../../../petsc/src/mat/impls/shell/shell.c
>>
>> On Sat, Jun 20, 2020 at 10:24 AM Barry Smith <bsmith at petsc.dev> wrote:
>>
>>>
>>> Junchao,
>>>
>>> This is a good bug fix. It solves the problem when PETSc initialize
>>> is called many times.
>>>
>>> There is another fix you can do to limit PETSc mpiuni running out
>>> of attributes inside a single PETSc run:
>>>
>>>
>>> int MPI_Comm_create_keyval(MPI_Copy_function
>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>> {
>>>
>>> if (num_attr >= MAX_ATTR){
>>> for (i=0; i<num_attr; i++) {
>>> if (!attr_keyval[i].extra_state) {
>>> /* reuse this slot */
>>> attr_keyval[i].extra_state = extra_state;
>>> attr_keyval[i.]del = delete_fn;
>>> *keyval = i;
>>> return MPI_SUCCESS;
>>> }
>>> }
>>> return MPIUni_Abort(MPI_COMM_WORLD,1);
>>> }
>>> return MPIUni_Abort(MPI_COMM_WORLD,1);
>>> attr_keyval[num_attr].extra_state = extra_state;
>>> attr_keyval[num_attr].del = delete_fn;
>>> *keyval = num_attr++;
>>> return MPI_SUCCESS;
>>> }
>>>
>>> This will work if the user creates tons of attributes but is
>>> constantly deleting some as they new ones. So long as the number
>>> outstanding at one time is < MAX_ATTR)
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>>
>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>> I don't understand what your session means. Let's try this patch
>>>
>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>> index d559a513..c058265d 100644
>>> --- a/src/sys/mpiuni/mpi.c
>>> +++ b/src/sys/mpiuni/mpi.c
>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>> MPI_Comm_free(&comm);
>>> comm = MPI_COMM_SELF;
>>> MPI_Comm_free(&comm);
>>> + num_attr = 1; /* reset the counter */
>>> MPI_was_finalized = 1;
>>> return MPI_SUCCESS;
>>> }
>>>
>>>
>>> --Junchao Zhang
>>>
>>>
>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>
>>>> Typo: I mean “Assuming initializer is only needed once for entire
>>>> session”
>>>>
>>>> On Saturday, June 20, 2020, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>>> Assuming finalizer is only needed once for entire session(?), I can
>>>>> put initializer into the static block to call it once but where do I call
>>>>> finalizer?
>>>>>
>>>>>
>>>>> On Saturday, June 20, 2020, Junchao Zhang <junchao.zhang at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The counter num_attr should be recycled. But first try to call PETSc
>>>>>> initialize/Finalize only once to see it fixes the error.
>>>>>> --Junchao Zhang
>>>>>>
>>>>>>
>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <sam.guo at cd-adapco.com>
>>>>>> wrote:
>>>>>>
>>>>>>> To clarify, I call PETSc initialize and PETSc finalize everytime I
>>>>>>> call SLEPc:
>>>>>>>
>>>>>>> PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>
>>>>>>> SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>
>>>>>>> //calling slepc
>>>>>>>
>>>>>>> SlepcFinalize();
>>>>>>>
>>>>>>> PetscFinalize();
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Dear PETSc team,
>>>>>>>> When I called SLEPc multiple time, I eventually got following
>>>>>>>> error:
>>>>>>>>
>>>>>>>> MPI operation not supported by PETSc's sequential MPI wrappers
>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>
>>>>>>>> I debugged: it is because of following in
>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>
>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>
>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>
>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>
>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the same logic.
>>>>>>>>
>>>>>>>> Is this a bug or I didn't use it correctly?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20200620/82efd71b/attachment.html>
-------------- next part --------------
diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
index d559a513..29299b21 100644
--- a/src/sys/mpiuni/mpi.c
+++ b/src/sys/mpiuni/mpi.c
@@ -23,8 +23,6 @@ void *MPIUNI_TMP = NULL;
#define MAX_ATTR 256
#define MAX_COMM 128
-static int MaxComm = 2;
-
typedef struct {
void *attribute_val;
int active;
@@ -37,10 +35,15 @@ typedef struct {
static MPI_Attr_keyval attr_keyval[MAX_ATTR];
static MPI_Attr attr[MAX_COMM][MAX_ATTR];
-static int comm_active[MAX_COMM];
-static int num_attr = 1,mpi_tag_ub = 100000000;
+static int comm_active[MAX_COMM]; /* Boolean array indicating which comms are in use */
+static int keyval_active[MAX_ATTR]; /* Boolean array indicating which keyvals are in use */
+static int mpi_tag_ub = 100000000;
+static int num_attr = 1; /* Maximal number of keyvals/attributes ever created, including the predefined MPI_TAG_UB attribute. */
+static int MaxComm = 2; /* Maximal number of communicators ever created, including comm_self(1), comm_world(2), but not comm_null(0) */
static void* MPIUNIF_mpi_in_place = 0;
+#define CommIdx(comm) ((comm)-1) /* the communicator's interal index used in attr[idx][] and comm_active[idx]. comm_null does not occupy slots in attr[][] */
+
#if defined(__cplusplus)
extern "C" {
#endif
@@ -131,20 +134,31 @@ int MPI_Type_get_contents(MPI_Datatype datatype,int max_integers,int max_address
*/
static int Keyval_setup(void)
{
- attr[MPI_COMM_WORLD-1][0].active = 1;
- attr[MPI_COMM_WORLD-1][0].attribute_val = &mpi_tag_ub;
- attr[MPI_COMM_SELF-1][0].active = 1;
- attr[MPI_COMM_SELF-1][0].attribute_val = &mpi_tag_ub;
+ attr[CommIdx(MPI_COMM_WORLD)][0].active = 1;
+ attr[CommIdx(MPI_COMM_WORLD)][0].attribute_val = &mpi_tag_ub;
+ attr[CommIdx(MPI_COMM_SELF )][0].active = 1;
+ attr[CommIdx(MPI_COMM_SELF )][0].attribute_val = &mpi_tag_ub;
+ keyval_active[0] = 1;
return MPI_SUCCESS;
}
int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
{
+ int i,keyid;
+ for (i=1; i<num_attr; i++) { /* the first attribute is always in use */
+ if (!keyval_active[i]) {
+ keyid = i;
+ goto found;
+ }
+ }
if (num_attr >= MAX_ATTR) return MPIUni_Abort(MPI_COMM_WORLD,1);
+ keyid = num_attr++;
- attr_keyval[num_attr].extra_state = extra_state;
- attr_keyval[num_attr].del = delete_fn;
- *keyval = num_attr++;
+found:
+ attr_keyval[keyid].extra_state = extra_state;
+ attr_keyval[keyid].del = delete_fn;
+ *keyval = keyid;
+ keyval_active[keyid] = 1;
return MPI_SUCCESS;
}
@@ -152,26 +166,28 @@ int MPI_Comm_free_keyval(int *keyval)
{
attr_keyval[*keyval].extra_state = 0;
attr_keyval[*keyval].del = 0;
-
+ keyval_active[*keyval] = 0;
*keyval = 0;
return MPI_SUCCESS;
}
int MPI_Comm_set_attr(MPI_Comm comm,int keyval,void *attribute_val)
{
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
- attr[comm-1][keyval].active = 1;
- attr[comm-1][keyval].attribute_val = attribute_val;
+ int idx = CommIdx(comm);
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+ attr[idx][keyval].active = 1;
+ attr[idx][keyval].attribute_val = attribute_val;
return MPI_SUCCESS;
}
int MPI_Comm_delete_attr(MPI_Comm comm,int keyval)
{
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
- if (attr[comm-1][keyval].active && attr_keyval[keyval].del) {
- void *save_attribute_val = attr[comm-1][keyval].attribute_val;
- attr[comm-1][keyval].active = 0;
- attr[comm-1][keyval].attribute_val = 0;
+ int idx = CommIdx(comm);
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+ if (attr[idx][keyval].active && attr_keyval[keyval].del) {
+ void *save_attribute_val = attr[idx][keyval].attribute_val;
+ attr[idx][keyval].active = 0;
+ attr[idx][keyval].attribute_val = 0;
(*(attr_keyval[keyval].del))(comm,keyval,save_attribute_val,attr_keyval[keyval].extra_state);
}
return MPI_SUCCESS;
@@ -179,72 +195,74 @@ int MPI_Comm_delete_attr(MPI_Comm comm,int keyval)
int MPI_Comm_get_attr(MPI_Comm comm,int keyval,void *attribute_val,int *flag)
{
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+ int idx = CommIdx(comm);
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
if (!keyval) Keyval_setup();
- *flag = attr[comm-1][keyval].active;
- *(void**)attribute_val = attr[comm-1][keyval].attribute_val;
+ *flag = attr[idx][keyval].active;
+ *(void**)attribute_val = attr[idx][keyval].attribute_val;
return MPI_SUCCESS;
}
int MPI_Comm_create(MPI_Comm comm,MPI_Group group,MPI_Comm *newcomm)
{
int j;
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
- for (j=3; j<MaxComm; j++) {
- if (!comm_active[j-1]) {
- comm_active[j-1] = 1;
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+ for (j=3; j<=MaxComm; j++) {
+ if (!comm_active[CommIdx(j)]) {
+ comm_active[CommIdx(j)] = 1;
*newcomm = j;
return MPI_SUCCESS;
}
}
- if (MaxComm > MAX_COMM) return MPI_FAILURE;
- *newcomm = MaxComm++;
- comm_active[*newcomm-1] = 1;
+ if (MaxComm >= MAX_COMM) return MPI_FAILURE;
+ *newcomm = ++MaxComm;
+ comm_active[CommIdx(*newcomm)] = 1;
return MPI_SUCCESS;
}
int MPI_Comm_dup(MPI_Comm comm,MPI_Comm *out)
{
int j;
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
- for (j=3; j<MaxComm; j++) {
- if (!comm_active[j-1]) {
- comm_active[j-1] = 1;
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
+ for (j=3; j<=MaxComm; j++) {
+ if (!comm_active[CommIdx(j)]) {
+ comm_active[CommIdx(j)] = 1;
*out = j;
return MPI_SUCCESS;
}
}
- if (MaxComm > MAX_COMM) return MPI_FAILURE;
- *out = MaxComm++;
- comm_active[*out-1] = 1;
+ if (MaxComm >= MAX_COMM) return MPI_FAILURE;
+ *out = ++MaxComm;
+ comm_active[CommIdx(*out)] = 1;
return MPI_SUCCESS;
}
int MPI_Comm_free(MPI_Comm *comm)
{
int i;
+ int idx = CommIdx(*comm);
- if (*comm-1 < 0 || *comm-1 > MaxComm) return MPI_FAILURE;
+ if (*comm < 1 || *comm > MaxComm) return MPI_FAILURE;
for (i=0; i<num_attr; i++) {
- if (attr[*comm-1][i].active && attr_keyval[i].del) (*attr_keyval[i].del)(*comm,i,attr[*comm-1][i].attribute_val,attr_keyval[i].extra_state);
- attr[*comm-1][i].active = 0;
- attr[*comm-1][i].attribute_val = 0;
+ if (attr[idx][i].active && attr_keyval[i].del) (*attr_keyval[i].del)(*comm,i,attr[idx][i].attribute_val,attr_keyval[i].extra_state);
+ attr[idx][i].active = 0;
+ attr[idx][i].attribute_val = 0;
}
- if (*comm >= 3) comm_active[*comm-1] = 0;
+ if (*comm >= 3) comm_active[idx] = 0;
*comm = 0;
return MPI_SUCCESS;
}
int MPI_Comm_size(MPI_Comm comm, int *size)
{
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
*size=1;
return MPI_SUCCESS;
}
int MPI_Comm_rank(MPI_Comm comm, int *rank)
{
- if (comm-1 < 0 || comm-1 > MaxComm) return MPI_FAILURE;
+ if (comm < 1 || comm > MaxComm) return MPI_FAILURE;
*rank=0;
return MPI_SUCCESS;
}
@@ -269,7 +287,7 @@ static int MPI_was_finalized = 0;
int MPI_Init(int *argc, char ***argv)
{
if (MPI_was_initialized) return MPI_FAILURE;
- if (MPI_was_finalized) return MPI_FAILURE;
+ if (MPI_was_finalized) return MPI_FAILURE; /* MPI standard: once MPI_FINALIZE returns, no MPI routine (not even MPI_INIT) may be called, except ... */
MPI_was_initialized = 1;
return MPI_SUCCESS;
}
@@ -283,6 +301,17 @@ int MPI_Finalize(void)
MPI_Comm_free(&comm);
comm = MPI_COMM_SELF;
MPI_Comm_free(&comm);
+#if defined(PETSC_USE_DEBUG)
+ {
+ int i;
+ for (i=3; i<=MaxComm; i++) {
+ if (comm_active[CommIdx(i)]) printf("Warning: MPI communicator %d is not freed before MPI_Finalize()\n", i);
+ }
+ }
+#endif
+ /* reset counters */
+ MaxComm = 2;
+ num_attr = 1;
MPI_was_finalized = 1;
return MPI_SUCCESS;
}
diff --git a/src/sys/tests/ex53.c b/src/sys/tests/ex53.c
new file mode 100644
index 00000000..4dd598ff
--- /dev/null
+++ b/src/sys/tests/ex53.c
@@ -0,0 +1,55 @@
+static char help[] = "Test resource recyling and MPI_Comm and keyval creation in mpi or mpiuni\n";
+
+#include <petscsys.h>
+
+#define CHKMPIERR(ierr) do {if (ierr) MPI_Abort(MPI_COMM_WORLD, ierr);} while(0)
+
+int main(int argc,char **argv)
+{
+ PetscErrorCode ierr;
+ PetscInt i;
+ PetscMPIInt key1,key2,attr1=100,attr2=200,*attr,flag;
+ MPI_Comm newcomm;
+
+ ierr = MPI_Init(&argc,&argv);CHKMPIERR(ierr);
+
+ /* Repeated keyval create/free should not blow up MPI */
+ for (i=0; i<500; i++) {
+ ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key1,NULL);CHKMPIERR(ierr);
+ ierr = MPI_Comm_free_keyval(&key1);CHKMPIERR(ierr);
+ }
+
+ /* The following keyval/attr code exposes a bug in old mpiuni code, where it had wrong newcomm returned in MPI_Comm_dup. */
+ ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key1,NULL);CHKMPIERR(ierr);
+ ierr = MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN,MPI_COMM_NULL_DELETE_FN,&key2,NULL);CHKMPIERR(ierr);
+ ierr = MPI_Comm_dup(MPI_COMM_WORLD,&newcomm);CHKMPIERR(ierr);
+ if (MPI_COMM_WORLD == newcomm) printf("Error: wrong newcomm returned by MPI_Comm_dup()\n");
+
+ ierr = MPI_Comm_set_attr(MPI_COMM_WORLD,key1,&attr1);CHKMPIERR(ierr);
+ ierr = MPI_Comm_set_attr(newcomm,key2,&attr2);CHKMPIERR(ierr);
+ ierr = MPI_Comm_get_attr(newcomm,key1,&attr,&flag);CHKMPIERR(ierr);
+ if (flag) printf("Error: newcomm should not have attribute for keyval %d\n", (int)key1);
+ ierr = MPI_Comm_get_attr(MPI_COMM_WORLD,key1,&attr,&flag);CHKMPIERR(ierr);
+ if (*attr != attr1) printf("Error: expected attribute %d, but got %d\n", (int)attr1, (int)*attr);
+ ierr = MPI_Comm_get_attr(newcomm,key2,&attr,&flag);CHKMPIERR(ierr);
+ if (*attr != attr2) printf("Error: expected attribute %d, but got %d\n", (int)attr2, (int)*attr);
+
+ ierr = MPI_Comm_delete_attr(MPI_COMM_WORLD,key1);CHKMPIERR(ierr);
+ ierr = MPI_Comm_delete_attr(newcomm,key2);CHKMPIERR(ierr);
+ ierr = MPI_Comm_free_keyval(&key1);CHKMPIERR(ierr);
+ ierr = MPI_Comm_free_keyval(&key2);CHKMPIERR(ierr);
+ ierr = MPI_Comm_free(&newcomm);CHKMPIERR(ierr);
+
+ /* Init/Finalize PETSc multiple times when MPI is initialized */
+ for (i=0; i<500; i++) {
+ ierr = PetscInitialize(&argc,&argv,(char*)0,help);if (ierr) return ierr;
+ ierr = PetscFinalize();if (ierr) return ierr;
+ }
+
+ ierr = MPI_Finalize();
+ return ierr;
+}
+
+/*TEST
+ test:
+TEST*/
diff --git a/src/sys/tests/makefile b/src/sys/tests/makefile
index 5bd93ddc..5dcdb9bf 100644
--- a/src/sys/tests/makefile
+++ b/src/sys/tests/makefile
@@ -8,7 +8,7 @@ EXAMPLESC = ex1.c ex2.c ex3.c ex6.c ex7.c ex8.c ex9.c ex10.c ex11.c ex12.c
ex14.c ex16.c ex18.c ex19.c ex20.c ex21.c \
ex22.c ex23.c ex24.c ex25.c ex26.c ex27.c ex28.c ex29.c ex30.c ex31.c ex32.c ex35.c ex37.c \
ex44.cxx ex45.cxx ex46.cxx ex47.c ex49.c \
- ex50.c ex51.c ex52.c
+ ex50.c ex51.c ex52.c ex53.c
EXAMPLESF = ex1f.F90 ex5f.F ex6f.F ex17f.F ex36f.F90 ex38f.F90 ex47f.F90 ex48f90.F90 ex49f.F90
MANSEC = Sys
diff --git a/src/sys/tests/output/ex53_1.out b/src/sys/tests/output/ex53_1.out
new file mode 100644
index 00000000..e69de29b
More information about the petsc-users
mailing list