[petsc-dev] Signal handling
Junchao Zhang
jczhang at mcs.anl.gov
Mon Mar 9 11:21:59 CDT 2020
Hi, Lisandro,
It is very cool to see that you can make PETSc dance with SLURM. From your
pseudo-example, my comments are:
* Do we need a type PetscSigSet instead of an explicit int?
* Why do PetscSignalBegin/End() have different argument types? Many PETSc
XxxBegin/End() routines take the same arguments, which is easier for users
to remember.
* Why do you need PetscSigMask in the public header? Can users call
PetscSignalClear(PETSC_SIGUSR1) instead of
PetscSignalClear(PetscSigMask(PETSC_SIGUSR1))?
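
To make the last point concrete, here is a minimal sketch of what I have in
mind. It is not the API from the attached header: every name below (the
Demo* identifiers, the enum values, the unsigned int set type) is
hypothetical and only for illustration. The idea is that the public routines
take the signal enum directly and the mask computation stays in a private
helper.

```c
#include <assert.h>

/* Hypothetical names throughout; NOT the API from the attached header. */
typedef enum {
  DEMO_SIGINT = 0,
  DEMO_SIGTERM,
  DEMO_SIGUSR1,
  DEMO_SIGUSR2,
  DEMO_SIG_COUNT /* number of signals tracked, not a signal itself */
} DemoSignal;

/* Assumed representation: one bit per tracked signal. */
typedef unsigned int DemoSigSet;

static DemoSigSet demo_pending = 0;

/* Private helper: enum value -> bit mask. Users never call this. */
static DemoSigSet DemoSigMask_Private(DemoSignal sig)
{
  assert((int)sig >= 0 && (int)sig < (int)DEMO_SIG_COUNT);
  return (DemoSigSet)1u << (unsigned)sig;
}

/* Record that a signal was caught. */
void DemoSignalMark(DemoSignal sig)
{
  demo_pending |= DemoSigMask_Private(sig);
}

/* Return 1 if the signal is pending, 0 otherwise. */
int DemoSignalCheck(DemoSignal sig)
{
  return (demo_pending & DemoSigMask_Private(sig)) != 0;
}

/* Clear a pending signal; the caller passes the enum, not a mask. */
void DemoSignalClear(DemoSignal sig)
{
  demo_pending &= ~DemoSigMask_Private(sig);
}
```

With this layout a user writes DemoSignalClear(DEMO_SIGUSR1) and never sees
the mask. The cost is that clearing several signals takes several calls
instead of one call with OR-ed masks.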
I like fewer and simpler public APIs. Just my two cents.
Thanks.
--Junchao Zhang
On Thu, Mar 5, 2020 at 4:00 PM Lisandro Dalcin <dalcinl at gmail.com> wrote:
> I've implemented some lightweight signal handling facilities. See the
> attached header and implementation files for a taste of the current API,
> and the pseudo-example code showing how to use it, briefly described below:
>
> Right now I'm using it to interact with the job scheduler during
> (explicit) timestepping. I have begin/end signal handling calls around
> TSSolve(). A PostStep() routine catches signals and handles them this way:
>
> * If SIGINT or SIGTERM, I dump a restart file and set converged reason to
> USER to stop.
> * If SIGUSR1, I dump a restart file and continue timestepping.
> * If SIGUSR2, I dump a VTK file and continue timestepping.
>
> I can send signals to the job with `scancel -s SIG<NAME>`. When the job
> time allocation is about to expire, SLURM first sends SIGTERM and waits some
> time before SIGKILL. That time is enough to get a restart file from the
> last step, stop timestepping and finalize gracefully.
>
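
For readers without the attachments, the pattern described above (catch the
signal, only record it, and act on it after a time step) can be sketched
with plain POSIX calls. This is an assumption about the general technique,
not the attached implementation: the demo_* names and the DemoAction enum
are hypothetical, and the actual restart/VTK output and TSSolve() calls are
left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Flags set by the handler; sig_atomic_t is safe to write from a handler. */
static volatile sig_atomic_t demo_got_term = 0; /* SIGINT or SIGTERM */
static volatile sig_atomic_t demo_got_usr1 = 0;
static volatile sig_atomic_t demo_got_usr2 = 0;

/* The handler only records the signal; all real work happens later. */
static void demo_handler(int signo)
{
  if (signo == SIGINT || signo == SIGTERM) demo_got_term = 1;
  else if (signo == SIGUSR1) demo_got_usr1 = 1;
  else if (signo == SIGUSR2) demo_got_usr2 = 1;
}

/* Install the handler; call this before the time loop. Returns 0 on success. */
int demo_signal_begin(void)
{
  const int sigs[] = {SIGINT, SIGTERM, SIGUSR1, SIGUSR2};
  struct sigaction sa;
  size_t i;

  sa.sa_handler = demo_handler;
  sa.sa_flags = 0;
  sigemptyset(&sa.sa_mask);
  for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++) {
    if (sigaction(sigs[i], &sa, NULL) != 0) return -1;
  }
  return 0;
}

/* What the application should do after the current step (hypothetical). */
typedef enum {
  DEMO_CONTINUE,
  DEMO_RESTART_AND_CONTINUE, /* SIGUSR1: dump restart file, keep going */
  DEMO_VTK_AND_CONTINUE,     /* SIGUSR2: dump VTK file, keep going */
  DEMO_RESTART_AND_STOP      /* SIGINT/SIGTERM: dump restart file, stop */
} DemoAction;

/* Poll the flags once per time step; consumes at most one request per call. */
DemoAction demo_post_step(void)
{
  if (demo_got_term) { demo_got_term = 0; return DEMO_RESTART_AND_STOP; }
  if (demo_got_usr1) { demo_got_usr1 = 0; return DEMO_RESTART_AND_CONTINUE; }
  if (demo_got_usr2) { demo_got_usr2 = 0; return DEMO_VTK_AND_CONTINUE; }
  return DEMO_CONTINUE;
}
```

In a parallel run the ranks would still have to agree on the action (for
example with a reduction) before writing files, since not every process
necessarily receives the signal at the same step. The sketch does not cover
that.
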
> I'm not 100% happy with the API, maybe I should make it easier to use. For
> example, I could define each PETSC_SIGXXX so that I do not need the
> macro PetscSigMask(). That would slightly complicate the mapping from
> signal enum to name string, though. I could also implement
> PetscSignalRaise(); it may be useful, but I'm not sure.
>
> Do you think this may be of some value for core PETSc? I'm asking before
> submitting an MR because that would require writing some docs, and I don't
> want to do the doc work before knowing your opinion :-).
>
> Regards,
>
> --
> Lisandro Dalcin
> ============
> Research Scientist
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/
>