[petsc-dev] Should PetscSignalHandlerDefault avoid calling MPI_Abort?

John Peterson jwpeterson at gmail.com
Wed May 6 11:01:01 CDT 2020


Hi Junchao,

Thanks for pointing me to the MR, I will follow the discussion there from
now on.

--
John

On Wed, May 6, 2020 at 10:58 AM Junchao Zhang <junchao.zhang at gmail.com>
wrote:

> John,
>   I had an MR at https://gitlab.com/petsc/petsc/-/merge_requests/2745.
> So far we have not been able to agree on a solution. The concern is that
> if we call _Exit() instead of MPI_Abort() in the signal handler, then some
> MPI implementations (batch systems) might not be able to kill all MPI
> processes.
>   I prefer _Exit(), because it solves the problem you reported, which
> has actually happened.
>
> --Junchao Zhang
>
>
> On Wed, May 6, 2020 at 10:22 AM John Peterson <jwpeterson at gmail.com>
> wrote:
>
>> Hi Junchao,
>>
>> I was just wondering if there was any update on this? I saw your question
>> on the discuss at mpich thread, but I gather you have not received a
>> response yet.
>>
>> --
>> John
>>
>>
>> On Tue, Apr 21, 2020 at 10:09 PM Junchao Zhang <junchao.zhang at gmail.com>
>> wrote:
>>
>>>   I don't see problems calling _exit in PetscSignalHandlerDefault. Let
>>> me try it first.
>>> --Junchao Zhang
>>>
>>>
>>> On Tue, Apr 21, 2020 at 3:17 PM John Peterson <jwpeterson at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I started a thread on discuss at mpich.org regarding some hanging
>>>> canceled jobs that we were seeing:
>>>>
>>>> https://lists.mpich.org/pipermail/discuss/2020-April/005910.html
>>>>
>>>> It turns out that there are some fairly strict rules about what types
>>>> of functions (async-signal-safe only) can be called from signal handlers,
>>>> and MPI_Abort(), at least the MPICH implementation of it, apparently does
>>>> not fall into that category. I wonder if you have any comments on this. One
>>>> possibility might be to just call "_exit" from
>>>> PetscSignalHandlerDefault rather than PETSCABORT. I am not sure what other
>>>> issues that would cause, however.
>>>>
>>>> Thanks,
>>>> John
>>>>
>>>
>>
>>