[petsc-users] Terminating a process running petsc via petsc4py without mpi_abort

Junchao Zhang junchao.zhang at gmail.com
Fri Jun 5 19:26:34 CDT 2020


On Fri, Jun 5, 2020 at 3:39 PM Hudson, Stephen Tobias P via petsc-users <
petsc-users at mcs.anl.gov> wrote:

> It seems I do have to bypass Python's somewhat limited multiprocessing
> interface, e.g.
>
> self.process._popen._send_signal(signal.SIGINT)
>
> which works, but bypasses the public API.
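The same signal can be sent without reaching into the private `_popen` attribute, because `multiprocessing.Process` exposes the child's `pid` and `os.kill` delivers a signal to it. The sketch below assumes Unix with the `fork` start method. It uses a hypothetical worker whose `time.sleep` stands in for a PETSc solve, so it runs without petsc4py. The worker follows the KeyboardInterrupt pattern Lisandro suggests later in the thread and exits with a distinctive code after "cleanup".

```python
import multiprocessing as mp
import os
import signal
import sys
import time


def interruptible_worker(ready):
    # Reinstall Python's default SIGINT handler, which raises KeyboardInterrupt.
    # With petsc4py this line would come after `from petsc4py import PETSc`.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        ready.set()      # tell the parent the handler is in place
        time.sleep(60)   # stand-in for a long-running PETSc solve
    except KeyboardInterrupt:
        # ... cleanup would go here ...
        sys.exit(3)      # distinctive exit code so the parent can see cleanup ran


def send_signal(proc, signum):
    """Public-API replacement for proc._popen._send_signal(signum)."""
    os.kill(proc.pid, signum)


def run_sigint_cleanup_demo():
    ctx = mp.get_context("fork")  # assumption: Unix with the fork start method
    ready = ctx.Event()
    proc = ctx.Process(target=interruptible_worker, args=(ready,))
    proc.start()
    ready.wait(timeout=10)
    send_signal(proc, signal.SIGINT)
    proc.join(timeout=10)
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_sigint_cleanup_demo())
```

The private method additionally skips children that have already been reaped; with `os.kill`, signal the child only while it is still running (i.e. before `join()`).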
>
> I would support allowing the user to configure the SIGTERM signal handling
> at run time so that the process exits without MPI_ABORT. I think I
> understand why MPI_ABORT is the default: I've experienced hangs due to
> errors on single processes.
>
"hangs due to errors on single processes": if the single processes call
exit(), there will be no hang.
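Steve's request (SIGTERM should exit without MPI_ABORT) and Junchao's point (a process that calls exit() does not cause a hang) can be combined in a Python SIGTERM handler that simply calls `sys.exit`. This is a sketch under assumptions: Unix, the `fork` start method, and a hypothetical worker with `time.sleep` standing in for PETSc work. With petsc4py, the handler would be installed after `from petsc4py import PETSc` so that it replaces PETSc's handler; whether MPI then finalizes cleanly in a given build is not verified here.

```python
import multiprocessing as mp
import signal
import sys
import time


def exit_on_sigterm(signum, frame):
    # Turn SIGTERM into an ordinary interpreter exit (raises SystemExit).
    sys.exit(128 + signum)  # shell convention: 128 + signal number


def sigterm_worker(ready):
    # With petsc4py this would run after `from petsc4py import PETSc`.
    signal.signal(signal.SIGTERM, exit_on_sigterm)
    ready.set()     # tell the parent the handler is in place
    time.sleep(60)  # stand-in for a long-running solve


def run_sigterm_demo():
    ctx = mp.get_context("fork")  # assumption: Unix with the fork start method
    ready = ctx.Event()
    proc = ctx.Process(target=sigterm_worker, args=(ready,))
    proc.start()
    ready.wait(timeout=10)
    proc.terminate()  # public API: sends SIGTERM on Unix
    proc.join(timeout=10)
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_sigterm_demo())
```

Because the handler raises `SystemExit`, `multiprocessing` reports it as a normal exit code (143) rather than as death by a signal.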



> ------------------------------
> *From:* Hudson, Stephen Tobias P <shudson at anl.gov>
> *Sent:* Friday, June 5, 2020 2:41 PM
> *To:* Lisandro Dalcin <dalcinl at gmail.com>
> *Cc:* Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: Terminating a process running petsc via petsc4py without
> mpi_abort
>
> Thanks, I will experiment with this.
>
> I am working through the multiprocessing interface, but I can see that the
> routines provided there are pretty much wrappers to the process signal
> functions.
>
> I guess the alternative is SIGKILL.
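For the SIGKILL route, `multiprocessing.Process.kill()` (Python 3.7+) sends SIGKILL on Unix through the public API. SIGKILL cannot be caught or ignored, so no handler in the child, PETSc's included, gets a chance to run, which also means no cleanup. A minimal sketch, assuming Unix and the `fork` start method, with a hypothetical sleeping worker in place of PETSc:

```python
import multiprocessing as mp
import signal
import time


def busy_worker():
    # Stand-in for a long-running PETSc solve; nothing here can intercept SIGKILL.
    time.sleep(60)


def run_sigkill_demo():
    ctx = mp.get_context("fork")  # assumption: Unix with the fork start method
    proc = ctx.Process(target=busy_worker)
    proc.start()
    proc.kill()  # public API: sends SIGKILL on Unix
    proc.join(timeout=10)
    # A negative exitcode -N means the child was terminated by signal N.
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_sigkill_demo())
```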
>
> Steve
> ------------------------------
> *From:* Lisandro Dalcin <dalcinl at gmail.com>
> *Sent:* Thursday, June 4, 2020 4:54 PM
> *To:* Hudson, Stephen Tobias P <shudson at anl.gov>
> *Cc:* Balay, Satish <balay at mcs.anl.gov>; petsc-users at mcs.anl.gov <
> petsc-users at mcs.anl.gov>
> *Subject:* Re: Terminating a process running petsc via petsc4py without
> mpi_abort
>
> (1) You can use PETSc.Sys.pushErrorHandler("abort"), but it will not help
> you. What you really need is to override PETSc's default signal handling.
>
> (2) While it is true that PETSc overrides the signal handler, you can
> override it again from Python after "from petsc4py import PETSc".
>
> To implement (2), you might try sending SIGINT rather than SIGTERM, so
> that you can do the following:
>
> from petsc4py import PETSc
>
> import signal
> signal.signal(signal.SIGINT, signal.default_int_handler)
>
> def main():
>     ...  # application code using PETSc
>
> if __name__ == "__main__":
>     try:
>         main()
>     except KeyboardInterrupt:  # triggered by Ctrl+C or by a SIGINT signal
>         ...  # do cleanup if needed
>
> Otherwise, you just need signal.signal(signal.SIGINT, signal.SIG_DFL).
>
>
> PS: I'm not in favor of changing PETSc's current signal handling behavior.
> This particular issue is fixable with two lines of Python code:
>
> from signal import signal, SIGINT, SIG_DFL
> signal(SIGINT, SIG_DFL)
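The effect of this two-line fix can be checked in a child process: once SIGINT is reset to `SIG_DFL`, the operating system's default action terminates the process directly, and `multiprocessing` reports the exit code as minus the signal number. The sketch assumes Unix and the `fork` start method; the worker is hypothetical and sleeps in place of PETSc work, so petsc4py is not needed to run it.

```python
import multiprocessing as mp
import os
import signal
import time


def default_sigint_worker(ready):
    # Restore the OS default action for SIGINT (terminate, no Python handler).
    # With petsc4py this would come after `from petsc4py import PETSc`.
    signal.signal(signal.SIGINT, signal.SIG_DFL)
    ready.set()     # tell the parent the default action is in place
    time.sleep(60)  # stand-in for a long-running solve


def run_sig_dfl_demo():
    ctx = mp.get_context("fork")  # assumption: Unix with the fork start method
    ready = ctx.Event()
    proc = ctx.Process(target=default_sigint_worker, args=(ready,))
    proc.start()
    ready.wait(timeout=10)
    os.kill(proc.pid, signal.SIGINT)
    proc.join(timeout=10)
    return proc.exitcode


if __name__ == "__main__":
    print("child exit code:", run_sig_dfl_demo())
```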
>
>
>
> On Thu, 4 Jun 2020 at 23:39, Hudson, Stephen Tobias P <shudson at anl.gov>
> wrote:
>
> Lisandro,
>
> I don't see an interface to set this through petsc4py. Is it possible?
>
> Thanks,
> Steve
> ------------------------------
> *From:* Hudson, Stephen Tobias P <shudson at anl.gov>
> *Sent:* Thursday, June 4, 2020 2:47 PM
> *To:* Balay, Satish <balay at mcs.anl.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <
> dalcinl at gmail.com>
> *Subject:* Re: Terminating a process running petsc via petsc4py without
> mpi_abort
>
> Sounds good. I will have a look at how to set this through petsc4py.
>
> Thanks
> Steve
> ------------------------------
> *From:* Satish Balay <balay at mcs.anl.gov>
> *Sent:* Thursday, June 4, 2020 2:32 PM
> *To:* Hudson, Stephen Tobias P <shudson at anl.gov>
> *Cc:* petsc-users at mcs.anl.gov <petsc-users at mcs.anl.gov>; Lisandro Dalcin <
> dalcinl at gmail.com>
> *Subject:* Re: Terminating a process running petsc via petsc4py without
> mpi_abort
>
> I don't completely understand the issue here. How is a sequential run
> different from a parallel run?
>
> In both cases, a PetscErrorHandler is likely being invoked. One can
> change this behavior with:
>
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscPushErrorHandler.html
>
> There are a few predefined error handlers to choose from:
>
>
> PETSC_EXTERN PetscErrorCode PetscTraceBackErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscIgnoreErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscEmacsClientErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscMPIAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscAbortErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscAttachDebuggerErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
> PETSC_EXTERN PetscErrorCode PetscReturnErrorHandler(MPI_Comm,int,const char*,const char*,PetscErrorCode,PetscErrorType,const char*,void*);
>
> Some of these are accessible via command-line options, for example:
> -on_error_abort or -on_error_mpiabort
>
> Or perhaps you want to completely disable PETSc's signal handler with:
> -no_signal_handler
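From Python, one way to pass such options without editing sys.argv is the PETSC_OPTIONS environment variable, which PETSc reads when it initializes. With petsc4py that means setting it before PETSc is initialized, i.e. before "from petsc4py import PETSc" (or before petsc4py.init). The helper below is a hypothetical sketch, intended for flag-style options such as -no_signal_handler; it does not import petsc4py so that it runs anywhere.

```python
import os


def add_petsc_options(*opts, env=None):
    """Hypothetical helper: append flag-style PETSc options to PETSC_OPTIONS.

    Must run before PETSc is initialized; with petsc4py, that is before
    `from petsc4py import PETSc`.
    """
    env = os.environ if env is None else env
    current = env.get("PETSC_OPTIONS", "").split()
    for opt in opts:
        if opt not in current:  # avoid adding the same flag twice
            current.append(opt)
    env["PETSC_OPTIONS"] = " ".join(current)
    return env["PETSC_OPTIONS"]


if __name__ == "__main__":
    print(add_petsc_options("-no_signal_handler"))
    # from petsc4py import PETSc  # PETSc would now start without its signal handler
```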
>
> cc: petsc-users
>
> Satish
>
> On Thu, 4 Jun 2020, Hudson, Stephen Tobias P wrote:
>
> > Satish,
> >
> > We are having issues caused by MPI_Abort getting called when we try to
> terminate a sub-process running petsc4py. Ideally we would always use a
> serial build of petsc/petsc4py in this mode, but many users will have a
> parallel build. We need to be able to send a terminate signal that just
> kills the process.
> >
> > Is there a way to turn off the MPI_Abort?
> >
> > Thanks,
> >
> > Steve
> >
> >
>
>
>
> --
> Lisandro Dalcin
> ============
> Research Scientist
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/
>


More information about the petsc-users mailing list