[petsc-users] PETSc initialization error

Sam Guo sam.guo at cd-adapco.com
Fri Jun 26 17:53:52 CDT 2020


Hi Junchao,
   If you are talking about this commit of yours
https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb

Recycle keyvals and fix bugs in MPI_Comm creation
   I think I got it. It fixes the serial case, but the parallel one is
still crashing.

Thanks,
Sam

On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <sam.guo at cd-adapco.com> wrote:

> Hi Junchao,
>    I am not ready to upgrade PETSc yet (due to the lengthy technical and
> legal approval process of our internal policy). Can you send me the diff
> file so I can apply it to petsc 3.11.3?
>
> Thanks,
> Sam
>
> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> Sam,
>>   Please discard the original patch I sent you. A better fix is already in
>> maint/master. A test is at src/sys/tests/ex53.c.
>>   I modified that test at the end with
>>
>>   for (i=0; i<500; i++) {
>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>     ierr = PetscFinalize();if (ierr) return ierr;
>>   }
>>
>>
>>  Then I ran it with multiple MPI ranks and it ran correctly. So try your
>> program with petsc master first. If that does not work, see if you can come
>> up with a test example for us.
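>>
>>  For reference, a minimal self-contained version of such a test could look
>> like the sketch below (my reconstruction, not the exact repository test; it
>> assumes a SLEPc build and uses the internal PetscInitializeNoPointers()
>> entry point the same way ex53.c does):
>>
>> #include <slepcsys.h>
>>
>> static char help[] = "Repeatedly initializes/finalizes PETSc and SLEPc.\n";
>>
>> int main(int argc,char **argv)
>> {
>>   PetscErrorCode ierr;
>>   int            i;
>>
>>   /* Each cycle must start from a clean state; before the fix, leftover
>>      keyval state made later cycles fail. */
>>   for (i=0; i<500; i++) {
>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>     ierr = PetscFinalize();if (ierr) return ierr;
>>   }
>>   return 0;
>> }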
>>
>>  Thanks.
>> --Junchao Zhang
>>
>>
>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>
>>> One workaround for me is to call PetscInitialize once for my entire
>>> program and skip PetscFinalize (since I don't have a good place to call
>>> PetscFinalize before ending the program).
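>>>
>>> A sketch of that workaround (hypothetical helper and flag names; error
>>> checking omitted):
>>>
>>> #include <slepcsys.h>
>>>
>>> static PetscBool petsc_started = PETSC_FALSE;
>>>
>>> /* Called before every SLEPc use: initializes PETSc/SLEPc at most once
>>>    per process and never finalizes them. */
>>> void EnsurePetscInitialized(int argc,char **argv)
>>> {
>>>   if (petsc_started) return;
>>>   PetscInitializeNoPointers(argc,argv,NULL,NULL);
>>>   SlepcInitialize(&argc,&argv,NULL,NULL);
>>>   petsc_started = PETSC_TRUE;
>>> }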
>>>
>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>
>>>> I get the crash after calling Initialize/Finalize multiple times.
>>>> Junchao fixed the bug for serial but parallel still crashes.
>>>>
>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>
>>>>>
>>>>>   Ah, so you get the crash the second time you call
>>>>> PetscInitialize()?  That is a problem because we do intend to support that
>>>>> capability (but you must call PetscFinalize() each time also).
>>>>>
>>>>>   Barry
>>>>>
>>>>>
>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>
>>>>> Hi Barry,
>>>>>    Thanks for the quick response.
>>>>>    I will call PetscInitialize once and skip the PetscFinalize for now
>>>>> to avoid the crash. The crash is actually in PetscInitialize, not
>>>>> PetscFinalize.
>>>>>
>>>>> Thanks,
>>>>> Sam
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>
>>>>>>
>>>>>>   Sam,
>>>>>>
>>>>>>   You can skip PetscFinalize() so long as you only call
>>>>>> PetscInitialize() once. Skipping the finalize is not desirable in general,
>>>>>> because PETSc then cannot free all its data structures and you cannot see
>>>>>> the PETSc logging information with -log_view, but in terms of the code
>>>>>> running correctly you do not need to call PetscFinalize().
>>>>>>
>>>>>>    If your code crashes in PetscFinalize() please send the full error
>>>>>> output and we can try to help you debug it.
>>>>>>
>>>>>>
>>>>>>    Barry
>>>>>>
>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>
>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different
>>>>>> MPI at runtime). I compile PETSc using our MPI wrapper.
>>>>>> If I just call PETSc initialize once without calling finalize, it is
>>>>>> OK. My question to you is: can I skip finalize?
>>>>>> Our program calls MPI_Finalize at the end anyway.
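>>>>>>
>>>>>> The ordering we end up with looks like this (a sketch; since our
>>>>>> program initializes MPI itself, PETSc does not own the MPI lifetime,
>>>>>> and PetscFinalize would not call MPI_Finalize anyway):
>>>>>>
>>>>>> MPI_Init(&argc,&argv);                         /* our wrapper owns MPI */
>>>>>> ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
>>>>>> /* ... PETSc/SLEPc work ... */
>>>>>> /* PetscFinalize() skipped, per the workaround above */
>>>>>> MPI_Finalize();                                /* our wrapper ends MPI */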
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junchao,
>>>>>>>    Attached please find the configure.log.
>>>>>>>    I also attach pinit.c, which contains your patch (I am currently
>>>>>>> using 3.11.3 and have applied your patch to it). Your patch fixes the
>>>>>>> serial version; the error now is in the parallel case.
>>>>>>>    Here is the error log:
>>>>>>>
>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in
>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in
>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>> with errorcode 56.
>>>>>>>
>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>> You may or may not see output from other processes, depending on
>>>>>>> exactly when Open MPI kills them.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <
>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>
>>>>>>>> Sam,
>>>>>>>>    The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>>> From the error message, it seems you need to configure with --with-log=1.
>>>>>>>>    Otherwise, please send your full error stack trace and
>>>>>>>> configure.log.
>>>>>>>>   Thanks.
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>>    I now encounter the same error in parallel. I am wondering
>>>>>>>>> if a parallel fix is needed as well.
>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>
>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junchao,
>>>>>>>>>>    Your patch works.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Sam
>>>>>>>>>>
>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <
>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    Junchao,
>>>>>>>>>>>>
>>>>>>>>>>>>      This is a good bug fix. It solves the problem when
>>>>>>>>>>>> PetscInitialize() is called many times.
>>>>>>>>>>>>
>>>>>>>>>>>>      There is another fix you can do to keep PETSc's mpiuni from
>>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>> {
>>>>>>>>>>>>   int i;
>>>>>>>>>>>>
>>>>>>>>>>>>   if (num_attr >= MAX_ATTR) {
>>>>>>>>>>>>     for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>       if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>
>>>>>>>>>>> attr_keyval[i].extra_state is provided by the user (it could be
>>>>>>>>>>> NULL), so we cannot rely on it.
>>>>>>>>>>>
>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>         attr_keyval[i].del         = delete_fn;
>>>>>>>>>>>>         *keyval = i;
>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>       }
>>>>>>>>>>>>     }
>>>>>>>>>>>>     return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>   }
>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>   attr_keyval[num_attr].del         = delete_fn;
>>>>>>>>>>>>   *keyval                           = num_attr++;
>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>   This will work if the user creates tons of attributes but is
>>>>>>>>>>>> constantly deleting some as they create new ones, so long as the
>>>>>>>>>>>> number outstanding at one time is < MAX_ATTR.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <
>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I don't understand what your "session" means. Let's try this patch:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Typo: I mean “Assuming initializer is only needed once for
>>>>>>>>>>>>> entire session”
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Assuming the finalizer is only needed once for the entire
>>>>>>>>>>>>>> session(?), I can put the initializer into a static block to call it
>>>>>>>>>>>>>> once, but where do I call the finalizer?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <
>>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first, try to
>>>>>>>>>>>>>>> call PetscInitialize/PetscFinalize only once to see if that fixes the error.
>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <
>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>   //calling slepc
>>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>>   PetscFinalize();
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <
>>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>    When I call SLEPc multiple times, I eventually get the
>>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI
>>>>>>>>>>>>>>>>> wrappers
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359 in
>>>>>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>   I debugged it: it is because of the following check in
>>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in the function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
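>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> An illustrative sketch (my simplification, not the actual
>>>>>>>>>>>>>>>>> PETSc source) of why this fails: the counter lives for the
>>>>>>>>>>>>>>>>> whole process, so repeated init/finalize cycles exhaust the
>>>>>>>>>>>>>>>>> fixed-size table.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #define MAX_ATTR 128            /* fixed-size keyval table */
>>>>>>>>>>>>>>>>> static int num_attr = 1;        /* never reset at finalize */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> int create_keyval_sketch(int *keyval)
>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>   if (num_attr >= MAX_ATTR) return 1; /* the abort I hit */
>>>>>>>>>>>>>>>>>   *keyval = num_attr++;               /* only ever grows */
>>>>>>>>>>>>>>>>>   return 0;
>>>>>>>>>>>>>>>>> }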
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the
>>>>>>>>>>>>>>>>> same logic.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>
>>>>>