[petsc-users] Terminating a process running petsc via petsc4py without mpi_abort
Hudson, Stephen Tobias P
shudson at anl.gov
Fri Jun 5 15:39:27 CDT 2020
It seems I do have to bypass the somewhat limited interface of Python's multiprocessing module, e.g.:
self.process._popen._send_signal(signal.SIGINT)
This works, but it bypasses the public API.
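For reference, the same signal can probably be delivered without touching private attributes: a started multiprocessing.Process exposes its pid, and os.kill is the standard-library call for sending an arbitrary signal on POSIX systems. A minimal sketch, where the helper name send_sigint is hypothetical and not part of multiprocessing or petsc4py:

```python
import os
import signal


def send_sigint(process):
    """Send SIGINT to a started child process (POSIX).

    `process` is assumed to be any object with a `pid` attribute,
    such as a started multiprocessing.Process. The helper name is
    illustrative only.
    """
    if process.pid is None:
        # multiprocessing.Process.pid is None until start() is called.
        raise RuntimeError("process has not been started")
    os.kill(process.pid, signal.SIGINT)
```

Usage would look like send_sigint(self.process) after the process has been started. On Windows os.kill behaves differently, so this sketch is only meant for POSIX systems.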
I would support allowing the user to configure, at run time, the signal handling for SIGTERM so that the process exits without MPI_ABORT. I think I understand why MPI_ABORT is the default: I've experienced hangs due to errors on single processes.
________________________________
From: Hudson, Stephen Tobias P <shudson at anl.gov>
Sent: Friday, June 5, 2020 2:41 PM
To: Lisandro Dalcin <dalcinl at gmail.com>
Cc: Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort
Thanks, I will experiment with this.
I am working through the multiprocessing interface, but I can see that the routines provided there are pretty much wrappers to the process signal functions.
I guess the alternative is SIGKILL.
Steve
________________________________
From: Lisandro Dalcin <dalcinl at gmail.com>
Sent: Thursday, June 4, 2020 4:54 PM
To: Hudson, Stephen Tobias P <shudson at anl.gov>
Cc: Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort
(1) You can use PETSc.Sys.pushErrorHandler("abort"), but it will not help you. What you really need is to override PETSc's default signal handling.
(2) While it is true that PETSc overrides the signal handler, you can override it again from Python after "from petsc4py import PETSc".
For implementing (2), maybe you should try sending SIGINT and not SIGTERM, so that you can do the following:
from petsc4py import PETSc
import signal
signal.signal(signal.SIGINT, signal.default_int_handler)
...
if __name__ == "__main__":
    try:
        main()
    except KeyboardInterrupt:  # Triggered if Ctrl+C or signaled with SIGINT
        ...  # do cleanup if needed
Otherwise, you just need signal.signal(signal.SIGINT, signal.SIG_DFL)
PS: I'm not in favor of changing PETSc's current signal-handling behavior.
This particular issue is fixable with two lines of Python code:
from signal import signal, SIGINT, SIG_DFL
signal(SIGINT, SIG_DFL)
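The same idea might extend to SIGTERM, which is the signal multiprocessing.Process.terminate() sends on POSIX. The sketch below assumes, as the thread suggests but does not confirm, that PETSc's handler is what intercepts the signal, and that the function is called from the main thread after "from petsc4py import PETSc". The helper name reset_to_default is hypothetical:

```python
import signal


def reset_to_default(signums=(signal.SIGINT, signal.SIGTERM)):
    """Restore the OS default action for each signal in `signums`.

    Returns a dict mapping each signal number to the handler that was
    previously installed, as reported by signal.signal(). That value
    is None when the previous handler was not installed from Python,
    for example by a C library.
    """
    previous = {}
    for signum in signums:
        # signal.signal() must be called from the main thread.
        previous[signum] = signal.signal(signum, signal.SIG_DFL)
    return previous
```

Calling reset_to_default() in the child process right after the petsc4py import would, under those assumptions, let SIGINT or SIGTERM end the process with the default action instead of going through PETSc's handler.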
On Thu, 4 Jun 2020 at 23:39, Hudson, Stephen Tobias P <shudson at anl.gov> wrote:
Lisandro,
I don't see an interface to set this through petsc4py. Is it possible?
Thanks,
Steve
________________________________
From: Hudson, Stephen Tobias P <shudson at anl.gov>
Sent: Thursday, June 4, 2020 2:47 PM
To: Balay, Satish <balay at mcs.anl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <dalcinl at gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort
Sounds good. I will have a look at how to set this through petsc4py.
Thanks
Steve
________________________________
From: Satish Balay <balay at mcs.anl.gov>
Sent: Thursday, June 4, 2020 2:32 PM
To: Hudson, Stephen Tobias P <shudson at anl.gov>
Cc: petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <dalcinl at gmail.com>
Subject: Re: Terminating a process running petsc via petsc4py without mpi_abort
I don't completely understand the issue here. How is a sequential run different from a parallel run?
In both cases, a PetscErrorHandler is likely getting invoked. One can change this behavior with:
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPushErrorHandler.html
And there are a few default error handlers to choose from:
PETSC_EXTERN PetscErrorCode PetscTraceBackErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscIgnoreErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscEmacsClientErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscMPIAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscAttachDebuggerErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
PETSC_EXTERN PetscErrorCode PetscReturnErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
Some of them are accessible via command-line options, for example: -on_error_abort or -on_error_mpiabort
Or perhaps you want to completely disable the signal handler with: -no_signal_handler
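When petsc4py runs in a sub-process there may be no command line to edit. PETSc also reads options from the PETSC_OPTIONS environment variable, so one possible way to pass -no_signal_handler is to set that variable before PETSc is initialized. A small sketch, with the helper name with_petsc_option being hypothetical; whether this option alone resolves the MPI_Abort issue was not confirmed in this thread:

```python
import os


def with_petsc_option(option, env=None):
    """Return a copy of `env` with `option` appended to PETSC_OPTIONS.

    `env` defaults to the current process environment. The input
    mapping is not modified. The helper name is illustrative only.
    """
    env = dict(os.environ if env is None else env)
    existing = env.get("PETSC_OPTIONS", "").strip()
    env["PETSC_OPTIONS"] = f"{existing} {option}".strip()
    return env
```

For example, os.environ.update(with_petsc_option("-no_signal_handler")) would need to run in the process that imports petsc4py, before "from petsc4py import PETSc".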
cc: petsc-users
Satish
On Thu, 4 Jun 2020, Hudson, Stephen Tobias P wrote:
> Satish,
>
> We are having issues caused by MPI_abort getting called when we try to terminate a sub-process running petsc4py. Ideally we would always use a serial build of petsc/petsc4py in this mode, but many users will have a parallel build. We need to be able to send a terminate signal that just kills the process.
>
> Is there a way to turn off the mpi_abort?
>
> Thanks,
>
> Steve
>
>
--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/