[petsc-users] Caught signal number 11 SEGV

Satish Balay balay at mcs.anl.gov
Tue Feb 23 15:49:23 CST 2021


This run is with '-n 2', so the only ranks are 0 and 1: the -debugger_nodes value should be either 0 or 1, not 3.

Satish
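
A minimal sketch of the rule above, in plain C++ with no PETSc calls: a run started with '-n 2' has ranks 0 and 1 only, so a requested debugger rank of 3 matches no process. The helper name existing_ranks is hypothetical and only illustrates the range check; it is not a PETSc function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of PETSc): keep only the requested ranks
// that exist in a run started with `nprocs` processes, i.e. 0..nprocs-1.
std::vector<int> existing_ranks(const std::vector<int>& requested, int nprocs) {
  assert(nprocs > 0);  // an MPI run always has at least one process
  std::vector<int> kept;
  for (int r : requested) {
    if (r >= 0 && r < nprocs) kept.push_back(r);
  }
  return kept;
}
```

For example, existing_ranks({0, 1, 3}, 2) returns {0, 1}, and existing_ranks({3}, 2) returns an empty list.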

On Tue, 23 Feb 2021, Francesco Brarda wrote:

> Using the command you suggested I got 
> 
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
> method = optimize
>   optimize
>     algorithm = lbfgs (Default)
>       lbfgs
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         init_alpha = 0.001 (Default)
>         tol_obj = 9.9999999999999998e-13 (Default)
>         tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)        tol_rel_obj = 10000 (Default)
>         tol_grad = 1e-08 (Default)
>         tol_rel_grad = 10000000 (Default)
> 
>         tol_rel_grad = 10000000 (Default)
>         tol_param = 1e-08 (Default)        tol_param = 1e-08 (Default)
>         history_size = 5 (Default)
>     iter = 2000 (Default)
> 
>         history_size = 5 (Default)
>     iter = 2000 (Default)
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>     save_iterations = 0 (Default)
> id = 0 (Default)
> data
>   file =  (Default)
>   file =  (Default)
> init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)init = 2 (Default)
> random
>   seed = 3623621468 (Default)
> output
>   file = output.csv (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
>   diagnostic_file =  (Default)
>   refresh = 100 (Default)
> 
> Initial log joint probability = -195.984
>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
>       10      -0.97101    0.00292919       1.65855       0.001       0.001       46  LS failed, Hessian reset 
>       12     -0.483952      0.001316       1.18542       0.001       0.001       77  LS failed, Hessian reset 
>       13     -0.477916     0.0118542      0.163518        0.01       0.001      106  LS failed, Hessian reset 
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>  
> After that it hangs and goes no further. With -debugger_ranks, as the warning suggests, the output is the same. What do you think?
> I am on a cluster (one node, dual-socket, with twelve-core CPUs), and I do not pass the -X flag when I ssh in, if that is what you mean.
> 
> Thank you,
> Francesco
> 
> 
> > On 23 Feb 2021, at 21:59, Matthew Knepley <knepley at gmail.com> wrote:
> > 
> > On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> > Thank you for the quick response. 
> > Sorry, you are right. Here is the complete output:
> > 
> > fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
> > PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> > xterm: Xt error: Can't open display: :0.0
> > xterm: DISPLAY is not set
> > 
> > Do you have an Xserver running? If not, you can use
> > 
> >   -start_in_debugger noxterm -debugger_nodes 3
> > 
> > and try to get a stack trace from one node.
> > 
> >   Thanks,
> > 
> >     Matt
> >  
> > method = optimize
> >   optimize
> >     algorithm = lbfgs (Default)
> >       lbfgs
> > method = optimize
> >   optimize
> >     algorithm = lbfgs (Default)
> >       lbfgs
> >         init_alpha = 0.001 (Default)
> >         tol_obj = 9.9999999999999998e-13 (Default)
> >         tol_rel_obj = 10000 (Default)
> >         tol_grad = 1e-08 (Default)
> >         init_alpha = 0.001 (Default)
> >         tol_obj = 9.9999999999999998e-13 (Default)
> >         tol_rel_obj = 10000 (Default)
> >         tol_grad = 1e-08 (Default)
> >         tol_rel_grad = 10000000 (Default)
> >         tol_param = 1e-08 (Default)
> >         history_size = 5 (Default)
> >         tol_rel_grad = 10000000 (Default)
> >         tol_param = 1e-08 (Default)
> >         history_size = 5 (Default)
> >     iter = 2000 (Default)
> >     iter = 2000 (Default)
> >     save_iterations = 0 (Default)
> > id = 0 (Default)
> > data    save_iterations = 0 (Default)
> > id = 0 (Default)
> > data
> >   file =  (Default)
> > 
> >   file =  (Default)
> > init = 2 (Default)
> > random
> >   seed = 3585768430 (Default)
> > init = 2 (Default)
> > random
> >   seed = 3585768430 (Default)
> > output
> >   file = output.csv (Default)
> > output
> >   file = output.csv (Default)
> >   diagnostic_file =  (Default)
> >   refresh = 100 (Default)
> >   diagnostic_file =  (Default)
> >   refresh = 100 (Default)
> > 
> > 
> > Initial log joint probability = -731.444
> >     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes 
> > [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp  
> >   To prevent termination, change the error handler using PetscPushErrorHandler()
> > 
> > ===================================================================================
> > =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> > =   PID 47804 RUNNING AT srvulx13
> > =   EXIT CODE: 134
> > =   CLEANING UP REMAINING PROCESSES
> > =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> > ===================================================================================
> > YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> > This typically refers to a problem with your application.
> > Please see the FAQ page for debugging suggestions
> > 
> > 
> > 
> > 
> > 
> > The code inside main.cpp is the following:
> > 
> > #include <cmdstan/command.hpp>
> > #include <stan/services/error_codes.hpp>
> > 
> > #include <petsc.h>
> > 
> > int main(int argc, char* argv[]) {
> > 
> >   PetscErrorCode ierr;
> >   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
> > 
> >   try {
> >     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
> >   } catch (const std::exception& e) {
> >     std::cout << e.what() << std::endl;
> >     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
> >   }
> > 
> >   ierr = PetscFinalize();CHKERRQ(ierr);
> >   return ierr;
> > }
> > 
> > I highlighted line 12, the cmdstan::command call inside the try block. I read the manual page for PetscPushErrorHandler and the example it points to (src/ksp/ksp/tutorials/ex27.c), but I still do not understand how to use it in practice.
> > Should I replace the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL);?
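
On the question above, one possible reading: CHKERRQ(ierr) calls the PETSc error handler and returns from the enclosing function whenever ierr is nonzero, so any nonzero value coming back from cmdstan::command is reported at line 12 as if it were a PETSc error. The plain-C++ sketch below illustrates that, assuming cmdstan::command returns an ordinary application status rather than a PetscErrorCode. CHECK_LIKE_CHKERRQ, fake_command, handler_calls, run_as_posted and run_separated are hypothetical stand-ins, not PETSc or CmdStan names.

```cpp
#include <iostream>

// Counts how often the stand-in error handler fires.
static int handler_calls = 0;

// Hypothetical stand-in for CHKERRQ (not the real macro): on a nonzero
// code, report it and return it from the enclosing function.
#define CHECK_LIKE_CHKERRQ(code)                              \
  do {                                                        \
    int code_ = (code);                                       \
    if (code_ != 0) {                                         \
      ++handler_calls;                                        \
      std::cerr << "error handler: code " << code_ << "\n";   \
      return code_;                                           \
    }                                                         \
  } while (0)

// Hypothetical stand-in for cmdstan::command: returns an application status.
int fake_command(int status) { return status; }

// Pattern as posted: the application status goes through the check macro,
// so a nonzero status fires the error handler.
int run_as_posted(int status) {
  int ierr = fake_command(status);
  CHECK_LIKE_CHKERRQ(ierr);
  return 0;
}

// Alternative: keep the application status separate and pass only
// library error codes to the check macro.
int run_separated(int status) {
  int app_status = fake_command(status);  // not a library error code
  int ierr = 0;                           // stands for the result of a library call
  CHECK_LIKE_CHKERRQ(ierr);
  return app_status;
}
```

With a nonzero status, run_as_posted fires the handler while run_separated simply returns the status. Under this reading, pushing PetscIgnoreErrorHandler would silence the report without showing why the command returned a nonzero value.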
> > 
> > Best,
> > Francesco
> > 
> > 
> >> On 23 Feb 2021, at 11:54, Matthew Knepley <knepley at gmail.com> wrote:
> >> 
> >> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
> >> Hi!
> >> 
> >> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (an open-source statistics package); here you can find the discussion. 
> >> It defines a functor to convert an EigenVector to a PetscVec and vice versa, both sequentially and in parallel. 
> >> The file that uses these functions performs the conversions sequentially. To evaluate the scaling, I switched to the MPI versions, that is, from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on.
> >> Running the example in debug mode with mpirun -n 5 examples/rosenbrock/rosenbrock optimize, I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and got the following:
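
As an aside on the switch from the Seq to the MPI conversion described above: a parallel vector stores only a slice of the entries on each rank. The sketch below computes an even split of a global size over the ranks, which is assumed here to mirror the layout PETSc chooses when the local size is left to it (PETSC_DECIDE). The function name local_sizes and the sizes used in the example are hypothetical and not taken from the repo.

```cpp
#include <cassert>
#include <vector>

// Split `global_size` entries as evenly as possible over `nranks` processes;
// the first (global_size % nranks) ranks receive one extra entry.
std::vector<int> local_sizes(int global_size, int nranks) {
  assert(global_size >= 0 && nranks > 0);
  std::vector<int> sizes(nranks);
  for (int rank = 0; rank < nranks; ++rank) {
    sizes[rank] = global_size / nranks + (rank < global_size % nranks ? 1 : 0);
  }
  return sizes;
}
```

For a hypothetical global size of 2 on 5 ranks, local_sizes(2, 5) gives {1, 1, 0, 0, 0}, so three ranks own no entries. Code that reads local data without checking the local size may be worth a look in that situation.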
> >> 
> >> For some reason, the -start_in_debugger option is not being seen. Are you showing all the output? Once the debugger is attached,
> >> you run the program (cont) and then, when you hit the SEGV, you get a stack trace (where).
> >> 
> >>   Thanks,
> >> 
> >>     Matt
> >>  
> >> [2]PETSC ERROR: ------------------------------------------------------------------------
> >> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> >> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> >> [2]PETSC ERROR: likely location of problem given in stack below
> >> [2]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> >> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> >> [2]PETSC ERROR:       INSTEAD the line number of the start of the function
> >> [2]PETSC ERROR:       is given.
> >> [3]PETSC ERROR: ------------------------------------------------------------------------
> >> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> >> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> >> [3]PETSC ERROR: likely location of problem given in stack below
> >> [3]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> >> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> >> [3]PETSC ERROR:       INSTEAD the line number of the start of the function
> >> [3]PETSC ERROR:       is given.
> >> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
> >>   To prevent termination, change the error handler using PetscPushErrorHandler()
> >> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in  unknown file (null)
> >>   To prevent termination, change the error handler using PetscPushErrorHandler()
> >> 
> >> ===================================================================================
> >> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> >> =   PID 22939 RUNNING AT srvulx13
> >> =   EXIT CODE: 134
> >> =   CLEANING UP REMAINING PROCESSES
> >> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> >> ===================================================================================
> >> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
> >> This typically refers to a problem with your application.
> >> Please see the FAQ page for debugging suggestions
> >> 
> >> I read the documentation for PetscAbortErrorHandler, but I do not know where I should use it. How can I solve the problem? 
> >> I hope I have been clear enough.
> >> My configure.log and make.log files are attached.
> >> 
> >> Best,
> >> Francesco
> >> 
> >> 
> >> 
> >> 
> >> 
> >> -- 
> >> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> >> -- Norbert Wiener
> >> 
> >> https://www.cse.buffalo.edu/~knepley/
> > 
> > 
> > 
> 
> 
