[petsc-users] PETSc initialization error

Junchao Zhang junchao.zhang at gmail.com
Fri Jun 26 20:57:49 CDT 2020


Did the test included in that commit fail in your environment? You can also
change the test by adding calls to SlepcInitialize/SlepcFinalize between
PetscInitializeNoPointers and PetscFinalize, as in my previous email.

--Junchao Zhang


On Fri, Jun 26, 2020 at 5:54 PM Sam Guo <sam.guo at cd-adapco.com> wrote:

> Hi Junchao,
>    If you are talking about this commit of yours
> https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
>
> Recycle keyvals and fix bugs in MPI_Comm creation
>    I think I got it. It fixes the serial case, but the parallel one is still
> crashing.
>
> Thanks,
> Sam
>
> On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>
>> Hi Junchao,
>>    I am not ready to upgrade petsc yet (due to the lengthy technical and
>> legal approval process of our internal policy). Can you send me the diff
>> file so I can apply it to petsc 3.11.3?
>>
>> Thanks,
>> Sam
>>
>> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>> Sam,
>>>   Please discard the original patch I sent you. A better fix is already in
>>> maint/master. A test is at src/sys/tests/ex53.c.
>>>   I modified the end of that test with
>>>
>>>   for (i=0; i<500; i++) {
>>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>>     ierr = PetscFinalize();if (ierr) return ierr;
>>>   }
>>>
>>>
>>>  Then I ran it with multiple MPI ranks and it ran correctly. So try your
>>> program with petsc master first. If that does not work, see if you can
>>> come up with a test example for us.
>>>
>>>  Thanks.
>>> --Junchao Zhang
>>>
>>>
>>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>
>>>> One workaround for me is to call PetscInitialize once for my entire
>>>> program and skip PetscFinalize (since I don't have a good place to call
>>>> PetscFinalize before ending the program).
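>>>>
>>>> A minimal sketch of that guard (PetscInitialized() is the PETSc query;
>>>> argc/args stand in for whatever the surrounding code already has, so this
>>>> is an illustration rather than our exact code):
>>>>
>>>>   PetscBool initialized;
>>>>   ierr = PetscInitialized(&initialized);if (ierr) return ierr;
>>>>   if (!initialized) {  /* initialize exactly once, from any entry point */
>>>>     ierr = PetscInitializeNoPointers(argc,args,NULL,NULL);if (ierr) return ierr;
>>>>   }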
>>>>
>>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>>> I get the crash after calling Initialize/Finalize multiple times.
>>>>> Junchao fixed the bug for serial but parallel still crashes.
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>
>>>>>>
>>>>>>   Ah, so you get the crash the second time you call
>>>>>> PetscInitialize()?  That is a problem because we do intend to support that
>>>>>> capability (but you must call PetscFinalize() each time also).
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>
>>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>
>>>>>> Hi Barry,
>>>>>>    Thanks for the quick response.
>>>>>>    I will call PetscInitialize once and skip the PetscFinalize for
>>>>>> now to avoid the crash. The crash is actually in PetscInitialize, not
>>>>>> PetscFinalize.
>>>>>>
>>>>>> Thanks,
>>>>>> Sam
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Sam,
>>>>>>>
>>>>>>>   You can skip PetscFinalize() so long as you only call
>>>>>>> PetscInitialize() once. It is not desirable in general to skip the finalize,
>>>>>>> because PETSc then cannot free all its data structures and you cannot see the
>>>>>>> PETSc logging information with -log_view, but in terms of the code running
>>>>>>> correctly you do not need to call PetscFinalize().
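>>>>>>>
>>>>>>>    If you still want a single finalize at normal program exit, one
>>>>>>> possible sketch (just an illustration, assuming nothing else, such as
>>>>>>> your MPI wrapper's own finalize, has torn MPI down before exit) is to
>>>>>>> register it with atexit() right after the one-time initialize:
>>>>>>>
>>>>>>> #include <petscsys.h>
>>>>>>> #include <stdlib.h>
>>>>>>>
>>>>>>> static void FinalizePetscAtExit(void)
>>>>>>> {
>>>>>>>   (void)PetscFinalize();  /* runs once at normal program termination */
>>>>>>> }
>>>>>>>
>>>>>>> int main(int argc,char **argv)
>>>>>>> {
>>>>>>>   PetscErrorCode ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>>>>>>>   (void)atexit(FinalizePetscAtExit);
>>>>>>>   /* ... application work, including repeated SLEPc solves ... */
>>>>>>>   return 0;
>>>>>>> }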
>>>>>>>
>>>>>>>    If your code crashes in PetscFinalize() please send the full
>>>>>>> error output and we can try to help you debug it.
>>>>>>>
>>>>>>>
>>>>>>>    Barry
>>>>>>>
>>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>
>>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different MPI
>>>>>>> at runtime), and I compile petsc using our MPI wrapper.
>>>>>>> If I just call PETSc initialize once without calling finalize, it
>>>>>>> is ok. My question is: can I skip finalize?
>>>>>>> Our program calls mpi_finalize at the end anyway.
>>>>>>>
>>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Junchao,
>>>>>>>>    Attached please find the configure.log.
>>>>>>>>    I also attach the pinit.c which contains your patch (I am
>>>>>>>> currently using 3.11.3 and have applied your patch to it). Your patch
>>>>>>>> fixes the serial version; the error now is in the parallel case.
>>>>>>>>    Here is the error log:
>>>>>>>>
>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in
>>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in
>>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>> with errorcode 56.
>>>>>>>>
>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sam
>>>>>>>>
>>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <
>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sam,
>>>>>>>>>    The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>>>> From the error message, it seems you need to configure with --with-log=1.
>>>>>>>>>    Otherwise, please send your full error stack trace and
>>>>>>>>> configure.log.
>>>>>>>>>   Thanks.
>>>>>>>>> --Junchao Zhang
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>>    I now encountered the same error in parallel. I am wondering
>>>>>>>>>> if there is a need for a parallel fix as well.
>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>    Your patch works.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Sam
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <
>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>    Junchao,
>>>>>>>>>>>>>
>>>>>>>>>>>>>      This is a good bug fix. It solves the problem when PETSc
>>>>>>>>>>>>> initialization is called many times.
>>>>>>>>>>>>>
>>>>>>>>>>>>>      There is another fix you can do to keep PETSc's mpiuni from
>>>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>> {
>>>>>>>>>>>>>   int i;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR) {
>>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>> attr_keyval[i].extra_state is provided by the user (it could be NULL),
>>>>>>>>>>>> so we cannot rely on it.
>>>>>>>>>>>>
>>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>         attr_keyval[i].del         = delete_fn;
>>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>>   attr_keyval[num_attr].del         = delete_fn;
>>>>>>>>>>>>>   *keyval                           = num_attr++;
>>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>   This will work if the user creates tons of attributes but is
>>>>>>>>>>>>> constantly deleting some as they create new ones, so long as the number
>>>>>>>>>>>>> outstanding at any one time is < MAX_ATTR.
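>>>>>>>>>>>>>
>>>>>>>>>>>>>   Since extra_state may legitimately be NULL, a variant (just a
>>>>>>>>>>>>> sketch; the in_use field is hypothetical and not in mpiuni today) could
>>>>>>>>>>>>> track slot occupancy explicitly instead:
>>>>>>>>>>>>>
>>>>>>>>>>>>> typedef struct {
>>>>>>>>>>>>>   MPI_Delete_function *del;
>>>>>>>>>>>>>   void                *extra_state;
>>>>>>>>>>>>>   int                  in_use;  /* set on create, cleared on keyval free */
>>>>>>>>>>>>> } MPI_Attr_keyval;
>>>>>>>>>>>>>
>>>>>>>>>>>>> /* inside MPI_Comm_create_keyval(), scan for a freed slot */
>>>>>>>>>>>>> for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>   if (!attr_keyval[i].in_use) {  /* safe even when extra_state is NULL */
>>>>>>>>>>>>>     attr_keyval[i].in_use      = 1;
>>>>>>>>>>>>>     attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>     attr_keyval[i].del         = delete_fn;
>>>>>>>>>>>>>     *keyval = i;
>>>>>>>>>>>>>     return MPI_SUCCESS;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }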
>>>>>>>>>>>>>
>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <
>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't understand what you mean by "session". Let's try this
>>>>>>>>>>>>> patch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>>  }
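>>>>>>>>>>>>>
>>>>>>>>>>>>> (The reset lets the fixed-size keyval table be reused across
>>>>>>>>>>>>> Initialize/Finalize cycles; without it the static counter only grows
>>>>>>>>>>>>> until the num_attr >= MAX_ATTR check aborts.)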
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <
>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for
>>>>>>>>>>>>>> entire session”
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Assuming the finalizer is only needed once for the entire
>>>>>>>>>>>>>>> session(?), I can put the initializer into the static block to call it
>>>>>>>>>>>>>>> once, but where do I call the finalizer?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <
>>>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first, try to
>>>>>>>>>>>>>>>> call PETSc Initialize/Finalize only once to see if it fixes the error.
>>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <
>>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>>   // calling slepc
>>>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>>>   PetscFinalize();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <
>>>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>>    When I called SLEPc multiple times, I eventually got the
>>>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI
>>>>>>>>>>>>>>>>>> wrappers
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>>>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359
>>>>>>>>>>>>>>>>>> in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   I debugged it: it is caused by the following check in
>>>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the
>>>>>>>>>>>>>>>>>> same logic.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>
>>>>>>