[petsc-users] Terminating a process running petsc via petsc4py without mpi_abort

Hudson, Stephen Tobias P shudson at anl.gov
Mon Jun 8 09:39:48 CDT 2020


OK, having looked at this a bit more, I'm inclined to support Junchao's approach, but there seems to be concern that, even if the standards support it, there could be issues in some scenarios.

I don't have enough information to dispute this. But if this was added many years ago, I'm interested in what other MPI-based libraries do now, e.g., does Trilinos use MPI_ABORT in its signal handler? If not, do users report hangs on termination?
________________________________
From: Junchao Zhang <junchao.zhang at gmail.com>
Sent: Friday, June 5, 2020 7:26 PM
To: Hudson, Stephen Tobias P <shudson at anl.gov>
Cc: Lisandro Dalcin <dalcinl at gmail.com>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: [petsc-users] Terminating a process running petsc via petsc4py without mpi_abort



On Fri, Jun 5, 2020 at 3:39 PM Hudson, Stephen Tobias P via petsc-users <petsc-users at mcs.anl.gov> wrote:
It seems I do have to bypass Python's somewhat limited multiprocessing interface, e.g.:

self.process._popen._send_signal(signal.SIGINT)

which works, but bypasses the public API.
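For reference, a public-API way to send the same signal is os.kill(process.pid, signal.SIGINT), since multiprocessing.Process exposes the child's pid. The sketch below assumes a POSIX system and the "fork" start method; worker() and interrupt_child() are hypothetical names, with worker() standing in for a petsc4py computation. The worker reinstalls Python's default SIGINT handler (as suggested later in this thread) so the signal surfaces as KeyboardInterrupt and cleanup can run.

```python
import multiprocessing
import os
import signal
import sys
import time


def worker(ready):
    """Hypothetical child: stands in for a long-running petsc4py computation."""
    # In real code this call would follow "from petsc4py import PETSc" and
    # replace PETSc's SIGINT handler with Python's default one.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        ready.set()  # tell the parent we are inside the try block
        while True:
            time.sleep(0.1)  # placeholder for real work
    except KeyboardInterrupt:
        # Cleanup would go here; then exit with a recognizable status code.
        sys.exit(3)


def interrupt_child():
    # Assumption: POSIX system; "fork" avoids re-importing this module.
    ctx = multiprocessing.get_context("fork")
    ready = ctx.Event()
    proc = ctx.Process(target=worker, args=(ready,))
    proc.start()
    ready.wait(timeout=10)
    # Public-API equivalent of proc._popen._send_signal(signal.SIGINT)
    os.kill(proc.pid, signal.SIGINT)
    proc.join(timeout=10)
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", interrupt_child())
```

An exit code of 3 means the child caught the SIGINT and left through its cleanup path.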

I would support letting the user configure, at run time, the signal handling for SIGTERM so that the process exits without MPI_ABORT. I think I understand why MPI_ABORT is the default; I've experienced hangs due to errors on single processes.
Re "hangs due to errors on single processes": if the single processes call exit(), then there will be no hang.


________________________________
From: Hudson, Stephen Tobias P <shudson at anl.gov>
Sent: Friday, June 5, 2020 2:41 PM
To: Lisandro Dalcin <dalcinl at gmail.com>
Cc: Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Thanks, I will experiment with this.

I am working through the multiprocessing interface, but I can see that the routines it provides are essentially wrappers around the process signal functions.

I guess the alternative is SIGKILL.
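If no cleanup is needed, multiprocessing already exposes SIGKILL through the public API: Process.kill() (Python 3.7+) sends SIGKILL, while Process.terminate() sends SIGTERM. SIGKILL cannot be caught, so no handler (PETSc's or otherwise) runs, but the child also gets no chance to clean up. A minimal sketch, assuming a POSIX system; worker() and kill_child() are hypothetical names:

```python
import multiprocessing
import signal
import time


def worker():
    """Hypothetical child: stands in for a long-running petsc4py computation."""
    while True:
        time.sleep(0.1)


def kill_child():
    # Assumption: POSIX system; "fork" avoids re-importing this module.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=worker)
    proc.start()
    proc.kill()  # sends SIGKILL, which cannot be caught or handled
    proc.join(timeout=10)
    # A negative exitcode -N means the child was terminated by signal N.
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", kill_child())
```

The reported exit code is -signal.SIGKILL, i.e. -9.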

Steve
________________________________
From: Lisandro Dalcin <dalcinl at gmail.com>
Sent: Thursday, June 4, 2020 4:54 PM
To: Hudson, Stephen Tobias P <shudson at anl.gov>
Cc: Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

(1) You can use PETSc.Sys.pushErrorHandler("abort"), but it will not help you. What you really need is to override PETSc's default signal handling.

(2) While it is true that PETSc overrides the signal handler, you can override it again from Python after "from petsc4py import PETSc".

To implement (2), consider sending SIGINT rather than SIGTERM, so that you can do the following:

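from petsc4py import PETSc  # initializes PETSc, which installs its signal handlers

import signal
# Replace the SIGINT handler that PETSc installed with Python's default one,
# which raises KeyboardInterrupt.
signal.signal(signal.SIGINT, signal.default_int_handler)


def main():
    ...  # your PETSc code goes here


if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:  # Triggered by Ctrl+C or a SIGINT signal
        ...  # do cleanup if needed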

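Otherwise, if no cleanup is needed, you just need signal.signal(signal.SIGINT, signal.SIG_DFL), which restores the operating system's default action for SIGINT (terminating the process).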
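To illustrate the difference between the two options: with SIG_DFL, SIGINT terminates the child immediately and multiprocessing reports a negative exit code (-SIGINT), whereas default_int_handler raises KeyboardInterrupt so cleanup code can run. The sketch below assumes a POSIX system and the "fork" start method; the petsc4py import is omitted so it runs without PETSc, but in real code the signal.signal() call would come right after "from petsc4py import PETSc". The function names are hypothetical.

```python
import multiprocessing
import os
import signal
import time


def worker(ready):
    """Hypothetical child: stands in for a long-running petsc4py computation."""
    # In real code this would follow "from petsc4py import PETSc".
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    ready.set()
    while True:
        time.sleep(0.1)


def interrupt_with_default_action():
    # Assumption: POSIX system; "fork" avoids re-importing this module.
    ctx = multiprocessing.get_context("fork")
    ready = ctx.Event()
    proc = ctx.Process(target=worker, args=(ready,))
    proc.start()
    ready.wait(timeout=10)
    os.kill(proc.pid, signal.SIGINT)
    proc.join(timeout=10)
    # A negative exitcode -N means the child was terminated by signal N.
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", interrupt_with_default_action())
```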


PS: I'm not in favor of changing PETSc's current signal handling behavior.
This particular issue is fixable with two lines of Python code:

from signal import signal, SIGINT, SIG_DFL
signal(SIGINT, SIG_DFL)



On Thu, 4 Jun 2020 at 23:39, Hudson, Stephen Tobias P <shudson at anl.gov> wrote:
Lisandro,

I don't see an interface to set this through petsc4py. Is it possible?

Thanks,
Steve
________________________________
From: Hudson, Stephen Tobias P <shudson at anl.gov>
Sent: Thursday, June 4, 2020 2:47 PM
To: Balay, Satish <balay at mcs.anl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <dalcinl at gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

Sounds good. I will have a look at how to set this through petsc4py.

Thanks
Steve
________________________________
From: Satish Balay <balay at mcs.anl.gov>
Sent: Thursday, June 4, 2020 2:32 PM
To: Hudson, Stephen Tobias P <shudson at anl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <dalcinl at gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort

I don't completely understand the issue here. How is a sequential run different from a parallel run?

In both cases, a PetscErrorHandler is likely being invoked. One can change this behavior with PetscPushErrorHandler():

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPushErrorHandler.html

There are also several predefined error handlers to choose from:


PETSC_EXTERN PetscErrorCode PetscTraceBackErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscIgnoreErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscEmacsClientErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscMPIAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAttachDebuggerErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscReturnErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);

Some of these are accessible via command-line options, for example -on_error_abort or -on_error_mpiabort.

Or perhaps you want to disable PETSc's signal handler completely with -no_signal_handler.
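From petsc4py, options such as -no_signal_handler can be passed at initialization with petsc4py.init(), which must be called before the first "from petsc4py import PETSc". A minimal sketch, assuming petsc4py is installed; the helper names petsc_args() and import_petsc_without_signal_handler() are hypothetical:

```python
import sys


def petsc_args(argv):
    """Return a copy of argv with -no_signal_handler appended (at most once)."""
    args = list(argv)
    if "-no_signal_handler" not in args:
        args.append("-no_signal_handler")
    return args


def import_petsc_without_signal_handler(argv=None):
    # petsc4py.init() must run before the first "from petsc4py import PETSc";
    # PETSc is then initialized with these options on that first import.
    import petsc4py

    petsc4py.init(petsc_args(sys.argv if argv is None else argv))
    from petsc4py import PETSc

    return PETSc
```

Usage would be PETSc = import_petsc_without_signal_handler() at the top of the child's entry point. Note that this also turns off PETSc's messages for the other signals it traps, such as SIGSEGV, not just SIGINT and SIGTERM.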

cc: petsc-users

Satish

On Thu, 4 Jun 2020, Hudson, Stephen Tobias P wrote:

> Satish,
>
> We are having issues caused by MPI_abort getting called when we try to terminate a sub-process running petsc4py. Ideally we would always use a serial build of petsc/petsc4py in this mode, but many users will have a parallel build. We need to be able to send a terminate signal that just kills the process.
>
> Is there a way to turn off the mpi_abort?
>
> Thanks,
>
> Steve
>
>



--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/