[petsc-dev] Should PetscSignalHandlerDefault avoid calling MPI_Abort?
Junchao Zhang
junchao.zhang at gmail.com
Wed May 6 10:58:15 CDT 2020
John,
I had an MR at https://gitlab.com/petsc/petsc/-/merge_requests/2745.
So far we have not been able to agree on a solution. The concern is that if
the signal handler calls _Exit() instead of MPI_Abort(), then some MPI
implementations (batch systems) might not be able to kill all MPI processes.
I prefer _Exit() because it solves the problem you reported, which actually
happened.
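
To make the _Exit() option concrete, here is a minimal sketch of a handler
that only calls async-signal-safe functions (write() and _exit()). It is not
the code from the MR and it is not PetscSignalHandlerDefault; every name
starting with sketch_ is hypothetical, and the 128 + sig exit code is just a
shell-style convention I am assuming for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler (NOT PetscSignalHandlerDefault). It restricts itself
   to async-signal-safe calls: write() for the message, _exit() to terminate.
   It deliberately does not call MPI_Abort(), printf(), or malloc(). */
static void sketch_signal_handler(int sig)
{
  static const char msg[] = "caught signal, exiting without MPI_Abort()\n";
  ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);
  (void)n;          /* nothing safe to do if the write fails */
  _exit(128 + sig); /* assumed convention: 128 + signal number */
}

/* Install the handler for one signal; returns 0 on success, -1 on failure. */
static int sketch_install_handler(int sig)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = sketch_signal_handler;
  sigemptyset(&sa.sa_mask);
  return sigaction(sig, &sa, NULL);
}

/* Demo helper: fork a child, deliver `sig` to it, and return the child's
   exit code (or -1 if something went wrong or it did not exit normally). */
static int sketch_exit_code_after_signal(int sig)
{
  int status = 0;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    if (sketch_install_handler(sig) != 0) _exit(1);
    raise(sig); /* the handler runs here and terminates the child */
    _exit(0);   /* only reached if the handler did not run */
  }
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling sketch_exit_code_after_signal(SIGTERM) should return 128 + SIGTERM.
Note that this only terminates the process that received the signal, so it
does nothing about the concern above that the remaining MPI processes might
be left running.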
--Junchao Zhang
On Wed, May 6, 2020 at 10:22 AM John Peterson <jwpeterson at gmail.com> wrote:
> Hi Junchao,
>
> I was just wondering if there was any update on this? I saw your question
> on the discuss at mpich thread, but I gather you have not received a
> response yet.
>
> --
> John
>
>
> On Tue, Apr 21, 2020 at 10:09 PM Junchao Zhang <junchao.zhang at gmail.com>
> wrote:
>
>> I don't see problems calling _exit in PetscSignalHandlerDefault. Let me
>> try it first.
>> --Junchao Zhang
>>
>>
>> On Tue, Apr 21, 2020 at 3:17 PM John Peterson <jwpeterson at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I started a thread on discuss at mpich.org regarding some hanging canceled
>>> jobs that we were seeing:
>>>
>>> https://lists.mpich.org/pipermail/discuss/2020-April/005910.html
>>>
>>> It turns out that there are some fairly strict rules about what types of
>>> functions (asynchronous-safe only) can be called from signal handlers, and
>>> MPI_Abort(), at least the mpich implementation of it, apparently does not
>>> fall into that category. I wonder if you have any comments on this. One
>>> possibility might be to just call "_exit" from
>>> PetscSignalHandlerDefault rather than PETSCABORT. I am not sure what
>>> other issues that would cause, however.
>>>
>>> Thanks,
>>> John
>>>
>>
>
>