[petsc-users] PETSc initialization error

Sam Guo sam.guo at cd-adapco.com
Mon Jun 29 13:00:39 CDT 2020


Hi Junchao,
   I'll test ex53. In the meantime, I use the following workaround:
my program calls MPI initialize once for the entire program
PetscInitialize once for the entire program
SlepcInitialize once for the entire program (I think I can skip the
PetscInitialize above)
call SLEPc multiple times
my program calls MPI finalize before ending the program

   You can see I skip PetscFinalize/SlepcFinalize. I am uneasy about
skipping them since I am not sure what the consequences are. Can you
comment on that?

Thanks,
Sam
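
A minimal sketch of the layout described above, assuming the application
owns MPI_Init/MPI_Finalize (my_solve_with_slepc() and the loop count are
placeholders, not PETSc/SLEPc API), and showing where SlepcFinalize and
PetscFinalize could go, right before MPI_Finalize, if they are not skipped:

/* Minimal sketch, assuming the application owns MPI_Init/MPI_Finalize.
   my_solve_with_slepc() is a placeholder for the real eigensolve. */
#include <slepceps.h>

static PetscErrorCode my_solve_with_slepc(void)
{
  /* placeholder: build an EPS, solve, destroy it */
  return 0;
}

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscBool      finalized;
  int            i;

  MPI_Init(&argc,&argv);                 /* program initializes MPI once */

  /* SlepcInitialize() initializes PETSc internally if it has not been
     initialized yet, so a separate PetscInitialize() can be skipped. */
  ierr = SlepcInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

  for (i=0; i<10; i++) {                 /* call SLEPc as many times as needed */
    ierr = my_solve_with_slepc();if (ierr) return ierr;
  }

  /* Finalize SLEPc/PETSc just before the program's own MPI_Finalize();
     since the program called MPI_Init() itself, PetscFinalize() will not
     finalize MPI. */
  ierr = SlepcFinalize();if (ierr) return ierr;
  ierr = PetscFinalized(&finalized);if (ierr) return ierr;
  if (!finalized) {ierr = PetscFinalize();if (ierr) return ierr;}

  MPI_Finalize();                        /* program finalizes MPI at the end */
  return 0;
}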



On Fri, Jun 26, 2020 at 6:58 PM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> Did the test included in that commit fail in your environment? You can
> also change the test by adding calls to SlepcInitialize/SlepcFinalize
> between PetscInitializeNoPointers/PetscFinalize as in my previous email.
>
> --Junchao Zhang
>
>
> On Fri, Jun 26, 2020 at 5:54 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>
>> Hi Junchao,
>>    If you are talking about this commit of yours
>> https://gitlab.com/petsc/petsc/-/commit/f0463fa09df52ce43e7c5bf47a1c87df0c9e5cbb
>>
>> Recycle keyvals and fix bugs in MPI_Comm creation
>>    I think I got it. It fixes the serial case, but the parallel one is
>> still crashing.
>>
>> Thanks,
>> Sam
>>
>> On Fri, Jun 26, 2020 at 3:43 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>
>>> Hi Junchao,
>>>    I am not ready to upgrade petsc yet (due to the lengthy technical and
>>> legal approval process required by our internal policy). Can you send me
>>> the diff file so I can apply it to petsc 3.11.3?
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Fri, Jun 26, 2020 at 3:33 PM Junchao Zhang <junchao.zhang at gmail.com>
>>> wrote:
>>>
>>>> Sam,
>>>>   Please discard the original patch I sent you. A better fix is already
>>>> in maint/master. A test is at src/sys/tests/ex53.c
>>>>   I modified that test at the end with
>>>>
>>>>   for (i=0; i<500; i++) {
>>>>     ierr = PetscInitializeNoPointers(argc,argv,NULL,help);if (ierr) return ierr;
>>>>     ierr = SlepcInitialize(&argc,&argv,NULL,help);if (ierr) return ierr;
>>>>     ierr = SlepcFinalize();if (ierr) return ierr;
>>>>     ierr = PetscFinalize();if (ierr) return ierr;
>>>>   }
>>>>
>>>>
>>>>  then I ran it with multiple MPI ranks and it ran correctly. So try
>>>> your program with petsc master first. If that does not work, see if you
>>>> can come up with a test example for us.
>>>>
>>>>  Thanks.
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Fri, Jun 26, 2020 at 3:37 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>
>>>>> One workaround for me is to call PetscInitialize once for my entire
>>>>> program and skip PetscFinalize (since I don't have a good place to call
>>>>> PetscFinalize before ending the program).
>>>>>
>>>>> On Fri, Jun 26, 2020 at 1:33 PM Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>
>>>>>> I get the crash after calling Initialize/Finalize multiple times.
>>>>>> Junchao fixed the bug for serial but parallel still crashes.
>>>>>>
>>>>>> On Fri, Jun 26, 2020 at 1:28 PM Barry Smith <bsmith at petsc.dev> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Ah, so you get the crash the second time you call
>>>>>>> PetscInitialize()?  That is a problem because we do intend to support that
>>>>>>> capability (but you must call PetscFinalize() each time also).
>>>>>>>
>>>>>>>   Barry
>>>>>>>
>>>>>>>
>>>>>>> On Jun 26, 2020, at 3:25 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>
>>>>>>> Hi Barry,
>>>>>>>    Thanks for the quick response.
>>>>>>>    I will call PetscInitialize once and skip the PetscFinalize for
>>>>>>> now to avoid the crash. The crash is actually in PetscInitialize, not
>>>>>>> PetscFinalize.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sam
>>>>>>>
>>>>>>> On Fri, Jun 26, 2020 at 1:21 PM Barry Smith <bsmith at petsc.dev>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>   Sam,
>>>>>>>>
>>>>>>>>   You can skip PetscFinalize() so long as you only call
>>>>>>>> PetscInitialize() once. Skipping the finalize is not desirable in general,
>>>>>>>> because PETSc cannot free all its data structures and you cannot see the
>>>>>>>> PETSc logging information with -log_view, but in terms of the code running
>>>>>>>> correctly you do not need to call PetscFinalize().
>>>>>>>>
>>>>>>>>    If your code crashes in PetscFinalize() please send the full
>>>>>>>> error output and we can try to help you debug it.
>>>>>>>>
>>>>>>>>
>>>>>>>>    Barry
>>>>>>>>
>>>>>>>> On Jun 26, 2020, at 3:14 PM, Sam Guo <sam.guo at cd-adapco.com> wrote:
>>>>>>>>
>>>>>>>> To clarify, we have an MPI wrapper (so we can switch to a different
>>>>>>>> MPI at runtime). I compile petsc using our MPI wrapper.
>>>>>>>> If I just call PETSc initialize once without calling finalize, it
>>>>>>>> is OK. My question to you is: can I skip finalize?
>>>>>>>> Our program calls MPI_Finalize at the end anyway.
>>>>>>>>
>>>>>>>> On Fri, Jun 26, 2020 at 1:09 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Junchao,
>>>>>>>>>    Attached please find the configure.log.
>>>>>>>>>    I also attach pinit.c, which contains your patch (I am
>>>>>>>>> currently using 3.11.3 and have applied your patch to it). Your patch
>>>>>>>>> fixes the serial version; the error now occurs in the parallel case.
>>>>>>>>>    Here is the error log:
>>>>>>>>>
>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [1]PETSC ERROR: #2 checkError() line 56 in
>>>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>> [1]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [1]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [0]PETSC ERROR: #2 checkError() line 56 in
>>>>>>>>> ../../../physics/src/eigensolver/SLEPc.cpp
>>>>>>>>> [0]PETSC ERROR: #3 PetscInitialize() line 966 in
>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>> [0]PETSC ERROR: #4 SlepcInitialize() line 262 in
>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>>>>>>>>> with errorcode 56.
>>>>>>>>>
>>>>>>>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>>>>>>>> You may or may not see output from other processes, depending on
>>>>>>>>> exactly when Open MPI kills them.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Sam
>>>>>>>>>
>>>>>>>>> On Thu, Jun 25, 2020 at 7:37 PM Junchao Zhang <
>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sam,
>>>>>>>>>>    The MPI_Comm_create_keyval() error was fixed in maint/master.
>>>>>>>>>> From the error message, it seems you need to configure with --with-log=1.
>>>>>>>>>>    Otherwise, please send your full error stack trace and
>>>>>>>>>> configure.log.
>>>>>>>>>>   Thanks.
>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Jun 25, 2020 at 2:18 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>    I have now encountered the same error in parallel. I am
>>>>>>>>>>> wondering if a parallel fix is needed as well.
>>>>>>>>>>> [1]PETSC ERROR: #1 PetscInitialize() line 969 in
>>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jun 20, 2020 at 7:35 PM Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Junchao,
>>>>>>>>>>>>    Your patch works.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Sam
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jun 20, 2020 at 4:23 PM Junchao Zhang <
>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:24 PM Barry Smith <bsmith at petsc.dev>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Junchao,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      This is a good bug fix. It solves the problem when PETSc
>>>>>>>>>>>>>> initialize is called many times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      There is another fix you can do to keep PETSc mpiuni from
>>>>>>>>>>>>>> running out of attributes inside a single PETSc run:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  if (num_attr >= MAX_ATTR){
>>>>>>>>>>>>>>    for (i=0; i<num_attr; i++) {
>>>>>>>>>>>>>>      if (!attr_keyval[i].extra_state) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>> attr_keyval[i].extra_state is provided by the user (it could be
>>>>>>>>>>>>> NULL). We cannot rely on it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>         /* reuse this slot */
>>>>>>>>>>>>>>         attr_keyval[i].extra_state = extra_state;
>>>>>>>>>>>>>>        attr_keyval[i].del          = delete_fn;
>>>>>>>>>>>>>>        *keyval = i;
>>>>>>>>>>>>>>         return MPI_SUCCESS;
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>   return MPIUni_Abort(MPI_COMM_WORLD,1);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>   attr_keyval[num_attr].extra_state = extra_state;
>>>>>>>>>>>>>>   attr_keyval[num_attr].del         = delete_fn;
>>>>>>>>>>>>>>   *keyval                           = num_attr++;
>>>>>>>>>>>>>>   return MPI_SUCCESS;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   This will work if the user creates tons of attributes but
>>>>>>>>>>>>>> is constantly deleting some as they create new ones, so long as the
>>>>>>>>>>>>>> number outstanding at any one time is < MAX_ATTR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry
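
A minimal sketch of the slot-reuse idea above that avoids the problem
Junchao raises, by marking free slots with an explicit in_use flag rather
than testing the user-supplied extra_state. The struct layout, MAX_ATTR
value, and MPI_Comm_free_keyval() body are illustrative assumptions, not
the actual mpiuni internals:

/* Illustrative sketch only; not the actual mpiuni implementation.
   Free slots are tracked with an explicit in_use flag, so reuse does not
   depend on the user-supplied extra_state pointer (which may be NULL). */
#define MAX_ATTR 128                    /* illustrative value */

typedef struct {
  void                *extra_state;
  MPI_Delete_function *del;
  int                  in_use;          /* 1 while the keyval is live */
} Keyval_entry;

static Keyval_entry attr_keyval[MAX_ATTR];
static int          num_attr = 1;       /* counter, reset to 1 in MPI_Finalize() */

int MPI_Comm_create_keyval(MPI_Copy_function *copy_fn,
                           MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
{
  int i;
  if (num_attr >= MAX_ATTR) {
    for (i=1; i<num_attr; i++) {
      if (!attr_keyval[i].in_use) {     /* reuse a freed slot */
        attr_keyval[i].extra_state = extra_state;
        attr_keyval[i].del         = delete_fn;
        attr_keyval[i].in_use      = 1;
        *keyval = i;
        return MPI_SUCCESS;
      }
    }
    return MPIUni_Abort(MPI_COMM_WORLD,1);  /* table genuinely full */
  }
  attr_keyval[num_attr].extra_state = extra_state;
  attr_keyval[num_attr].del         = delete_fn;
  attr_keyval[num_attr].in_use      = 1;
  *keyval                           = num_attr++;
  return MPI_SUCCESS;
}

int MPI_Comm_free_keyval(int *keyval)
{
  /* freeing a keyval clears the flag so its slot can be reused later */
  if (*keyval > 0 && *keyval < MAX_ATTR) attr_keyval[*keyval].in_use = 0;
  *keyval = MPI_KEYVAL_INVALID;
  return MPI_SUCCESS;
}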
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Jun 20, 2020, at 10:54 AM, Junchao Zhang <
>>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't understand what you mean by session. Let's try this
>>>>>>>>>>>>>> patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/src/sys/mpiuni/mpi.c b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>> index d559a513..c058265d 100644
>>>>>>>>>>>>>> --- a/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>> +++ b/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>> @@ -283,6 +283,7 @@ int MPI_Finalize(void)
>>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>>    comm = MPI_COMM_SELF;
>>>>>>>>>>>>>>    MPI_Comm_free(&comm);
>>>>>>>>>>>>>> +  num_attr = 1; /* reset the counter */
>>>>>>>>>>>>>>    MPI_was_finalized = 1;
>>>>>>>>>>>>>>    return MPI_SUCCESS;
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 10:48 AM Sam Guo <
>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Typo: I mean “Assuming the initializer is only needed once
>>>>>>>>>>>>>>> for the entire session”
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Sam Guo <sam.guo at cd-adapco.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Assuming the finalizer is only needed once for the entire
>>>>>>>>>>>>>>>> session(?), I can put the initializer into a static block so it is called
>>>>>>>>>>>>>>>> once, but where do I call the finalizer?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Saturday, June 20, 2020, Junchao Zhang <
>>>>>>>>>>>>>>>> junchao.zhang at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The counter num_attr should be recycled. But first try to
>>>>>>>>>>>>>>>>> call PETSc Initialize/Finalize only once to see if it fixes the error.
>>>>>>>>>>>>>>>>> --Junchao Zhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Jun 20, 2020 at 12:48 AM Sam Guo <
>>>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> To clarify, I call PETSc initialize and PETSc finalize
>>>>>>>>>>>>>>>>>> every time I call SLEPc:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   PetscInitializeNoPointers(argc,args,nullptr,nullptr);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   SlepcInitialize(&argc,&args,static_cast<char*>(nullptr),help);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   //calling slepc
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   SlepcFinalize();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    PetscFinalize();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jun 19, 2020 at 10:32 PM Sam Guo <
>>>>>>>>>>>>>>>>>> sam.guo at cd-adapco.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Dear PETSc team,
>>>>>>>>>>>>>>>>>>>    When I call SLEPc multiple times, I eventually get the
>>>>>>>>>>>>>>>>>>> following error:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> MPI operation not supported by PETSc's sequential MPI
>>>>>>>>>>>>>>>>>>> wrappers
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #1 PetscInitialize() line 967 in
>>>>>>>>>>>>>>>>>>> ../../../petsc/src/sys/objects/pinit.c
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #2 SlepcInitialize() line 262 in
>>>>>>>>>>>>>>>>>>> ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>> [0]PETSC ERROR: #3 SlepcInitializeNoPointers() line 359
>>>>>>>>>>>>>>>>>>> in ../../../slepc/src/sys/slepcinit.c
>>>>>>>>>>>>>>>>>>> PETSC ERROR: Logging has not been enabled.
>>>>>>>>>>>>>>>>>>> You might have forgotten to call PetscInitialize().
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   I debugged it: it is caused by the following check in
>>>>>>>>>>>>>>>>>>> petsc/src/sys/mpiuni/mpi.c
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> if (num_attr >= MAX_ATTR)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> in function int MPI_Comm_create_keyval(MPI_Copy_function
>>>>>>>>>>>>>>>>>>> *copy_fn,MPI_Delete_function *delete_fn,int *keyval,void *extra_state)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> num_attr is declared static and keeps increasing every
>>>>>>>>>>>>>>>>>>> time MPI_Comm_create_keyval is called.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am using petsc 3.11.3 but found 3.13.2 has the
>>>>>>>>>>>>>>>>>>> same logic.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is this a bug, or am I not using it correctly?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Sam
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>

