[petsc-users] Caught signal number 11 SEGV
Barry Smith
bsmith at petsc.dev
Wed Feb 24 00:14:04 CST 2021
Instead of -start_in_debugger noxterm -debugger_nodes 3, use -start_in_debugger noxterm -debugger_nodes 0
When not opening windows for each debugger, it is best to have the first rank associated with the tty as the debugger node.
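
On the "main() line 12" frame in the quoted trace: line 12 of the quoted main.cpp is the cmdstan::command call wrapped in CHKERRQ, and CHKERRQ treats any nonzero value as a PETSc error. One possible reading, not confirmed in this thread, is that the application returned a nonzero status of its own and PETSc reported it as an error. The following is a minimal C++ sketch of keeping the two kinds of status apart. It is not the thread's code: library_init, library_finalize, run_application, kAppFailure and finalize_calls are hypothetical stand-ins, so the sketch compiles without PETSc or CmdStan. It does not address the SEGV itself.

```cpp
#include <exception>
#include <iostream>

// Hypothetical stand-ins so the sketch compiles without PETSc or CmdStan.
using ErrorCode = int;       // plays the role of PetscErrorCode
const int kAppFailure = 70;  // arbitrary nonzero application status (assumption)
int finalize_calls = 0;      // counts how often the library was shut down

// Stands in for PetscInitialize.
ErrorCode library_init() { return 0; }

// Stands in for PetscFinalize.
ErrorCode library_finalize() { ++finalize_calls; return 0; }

// Stands in for cmdstan::command: nonzero means the application failed,
// not that a library call failed.
int run_application(bool fail) { return fail ? kAppFailure : 0; }

// Returns the process exit status. Only library calls are checked as
// library errors; the application status is carried separately.
int run(bool fail) {
  ErrorCode ierr = library_init();
  if (ierr != 0) return ierr;

  int app_status = 0;
  try {
    app_status = run_application(fail);
  } catch (const std::exception& e) {
    std::cerr << e.what() << std::endl;
    app_status = kAppFailure;
  }

  // Reached even when the application reported a failure.
  ierr = library_finalize();
  if (ierr != 0) return ierr;
  return app_status;
}
```

Under these assumptions, run(true) returns the application status 70 and still calls library_finalize(), while run(false) returns 0.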
> On Feb 23, 2021, at 3:46 PM, Francesco Brarda <brardafrancesco at gmail.com> wrote:
>
> Using the command you suggested I got
>
> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger noxterm -debugger_nodes 3
> ** PETSc DEPRECATION WARNING ** : the option -debugger_nodes is deprecated as of version 3.14 and will be removed in a future release. Please use the option -debugger_ranks instead. (Silence this warning with -options_suppress_deprecated_warnings)
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> method = optimize
> optimize
> algorithm = lbfgs (Default)
> lbfgs
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> init_alpha = 0.001 (Default)
> tol_obj = 9.9999999999999998e-13 (Default)
> tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default) tol_rel_obj = 10000 (Default)
> tol_grad = 1e-08 (Default)
> tol_rel_grad = 10000000 (Default)
>
> tol_rel_grad = 10000000 (Default)
> tol_param = 1e-08 (Default) tol_param = 1e-08 (Default)
> history_size = 5 (Default)
> iter = 2000 (Default)
>
> history_size = 5 (Default)
> iter = 2000 (Default)
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> save_iterations = 0 (Default)
> id = 0 (Default)
> data
> file = (Default)
> file = (Default)
> init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)init = 2 (Default)
> random
> seed = 3623621468 (Default)
> output
> file = output.csv (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> diagnostic_file = (Default)
> refresh = 100 (Default)
>
> Initial log joint probability = -195.984
> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
> 10 -0.97101 0.00292919 1.65855 0.001 0.001 46 LS failed, Hessian reset
> 12 -0.483952 0.001316 1.18542 0.001 0.001 77 LS failed, Hessian reset
> 13 -0.477916 0.0118542 0.163518 0.01 0.001 106 LS failed, Hessian reset
> [1]PETSC ERROR: #1 main() line 12 in src/cmdstan/main.cpp
> [1]PETSC ERROR: PETSc Option Table entries:
> [1]PETSC ERROR: -debugger_nodes 3
> [1]PETSC ERROR: -start_in_debugger noxterm
> [1]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint at mcs.anl.gov----------
>
> After that, it does not go any further. With -debugger_ranks, as the warning suggests, the output is the same. What do you think?
> I am using a cluster (one node, a dual-socket system with twelve-core CPUs), and I do not pass the -X flag to ssh, if that is what you mean.
>
> Thank you,
> Francesco
>
>
>> On 23 Feb 2021, at 21:59, Matthew Knepley <knepley at gmail.com> wrote:
>>
>> On Tue, Feb 23, 2021 at 3:55 PM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>> Thank you for the quick response.
>> Sorry, you are right. Here is the complete output:
>>
>> fbrarda at srvulx13:~/cmdstan-petsc$ $PETSC_DIR/$PETSC_ARCH/bin/mpirun -n 2 examples/rosenbrock/rosenbrock optimize -start_in_debugger
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47803 on display :0.0 on machine srvulx13
>> PETSC: Attaching gdb to examples/rosenbrock/rosenbrock of pid 47804 on display :0.0 on machine srvulx13
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>> xterm: Xt error: Can't open display: :0.0
>> xterm: DISPLAY is not set
>>
>> Do you have an Xserver running? If not, you can use
>>
>> -start_in_debugger noxterm -debugger_nodes 3
>>
>> and try to get a stack trace from one node.
>>
>> Thanks,
>>
>> Matt
>>
>> method = optimize
>> optimize
>> algorithm = lbfgs (Default)
>> lbfgs
>> method = optimize
>> optimize
>> algorithm = lbfgs (Default)
>> lbfgs
>> init_alpha = 0.001 (Default)
>> tol_obj = 9.9999999999999998e-13 (Default)
>> tol_rel_obj = 10000 (Default)
>> tol_grad = 1e-08 (Default)
>> init_alpha = 0.001 (Default)
>> tol_obj = 9.9999999999999998e-13 (Default)
>> tol_rel_obj = 10000 (Default)
>> tol_grad = 1e-08 (Default)
>> tol_rel_grad = 10000000 (Default)
>> tol_param = 1e-08 (Default)
>> history_size = 5 (Default)
>> tol_rel_grad = 10000000 (Default)
>> tol_param = 1e-08 (Default)
>> history_size = 5 (Default)
>> iter = 2000 (Default)
>> iter = 2000 (Default)
>> save_iterations = 0 (Default)
>> id = 0 (Default)
>> data save_iterations = 0 (Default)
>> id = 0 (Default)
>> data
>> file = (Default)
>>
>> file = (Default)
>> init = 2 (Default)
>> random
>> seed = 3585768430 (Default)
>> init = 2 (Default)
>> random
>> seed = 3585768430 (Default)
>> output
>> file = output.csv (Default)
>> output
>> file = output.csv (Default)
>> diagnostic_file = (Default)
>> refresh = 100 (Default)
>> diagnostic_file = (Default)
>> refresh = 100 (Default)
>>
>>
>> Initial log joint probability = -731.444
>> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes
>> [1]PETSC ERROR: PetscAbortErrorHandler: main() line 12 in src/cmdstan/main.cpp
>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>
>> ===================================================================================
>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> = PID 47804 RUNNING AT srvulx13
>> = EXIT CODE: 134
>> = CLEANING UP REMAINING PROCESSES
>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>> This typically refers to a problem with your application.
>> Please see the FAQ page for debugging suggestions
>>
>>
>>
>>
>>
>> The code inside main.cpp is the following:
>>
>> #include <cmdstan/command.hpp>
>> #include <stan/services/error_codes.hpp>
>>
>> #include <petsc.h>
>>
>> int main(int argc, char* argv[]) {
>>
>>   PetscErrorCode ierr;
>>   ierr = PetscInitialize(&argc, &argv, 0, 0);CHKERRQ(ierr);
>>
>>   try {
>>     ierr = cmdstan::command(argc, argv);CHKERRQ(ierr);
>>   } catch (const std::exception& e) {
>>     std::cout << e.what() << std::endl;
>>     ierr = stan::services::error_codes::SOFTWARE;CHKERRQ(ierr);
>>   }
>>
>>   ierr = PetscFinalize();CHKERRQ(ierr);
>>   return ierr;
>> }
>>
>> Line 12 is the call to cmdstan::command inside the try block. I read the page that documents PetscPushErrorHandler and the example it points to (src/ksp/ksp/tutorials/ex27.c), but I do not understand how to use the function in practice.
>> Should I replace the entire try/catch with PetscPushErrorHandler(PetscIgnoreErrorHandler,NULL); ?
>>
>> Best,
>> Francesco
>>
>>
>>> On 23 Feb 2021, at 11:54, Matthew Knepley <knepley at gmail.com> wrote:
>>>
>>> On Tue, Feb 23, 2021 at 3:54 AM Francesco Brarda <brardafrancesco at gmail.com> wrote:
>>> Hi!
>>>
>>> I am very new to the PETSc world. I am working with a GitHub repo that uses PETSc together with Stan (open-source statistics software); here you can find the discussion.
>>> A functor has been defined to convert an EigenVector to a PetscVec and vice versa, both sequentially and in parallel.
>>> The file that uses these functions performs the conversions with the sequential versions. I switched to the MPI versions, that is, from EigenVectorToPetscVecSeq to EigenVectorToPetscVecMPI and so on, because I want to evaluate the scaling.
>>> Running the example with mpirun -n 5 examples/rosenbrock/rosenbrock optimize in debug mode, I get the error Caught signal number 11 SEGV. I therefore used the option -start_in_debugger and got the following:
>>>
>>> For some reason, the -start_in_debugger option is not being seen. Are you showing all the output? Once the debugger is attached,
>>> you run the program (cont) and then when you hit the SEGV you get a stack trace (where).
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> [2]PETSC ERROR: ------------------------------------------------------------------------
>>> [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [2]PETSC ERROR: likely location of problem given in stack below
>>> [2]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [2]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [2]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [2]PETSC ERROR: is given.
>>> [3]PETSC ERROR: ------------------------------------------------------------------------
>>> [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
>>> [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>> [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>> [3]PETSC ERROR: likely location of problem given in stack below
>>> [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>> [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>> [3]PETSC ERROR: INSTEAD the line number of the start of the function
>>> [3]PETSC ERROR: is given.
>>> [3]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null)
>>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>> [2]PETSC ERROR: PetscAbortErrorHandler: User provided function() line 0 in unknown file (null)
>>> To prevent termination, change the error handler using PetscPushErrorHandler()
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 22939 RUNNING AT srvulx13
>>> = EXIT CODE: 134
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
>>> This typically refers to a problem with your application.
>>> Please see the FAQ page for debugging suggestions
>>>
>>> I read the documentation on PetscAbortErrorHandler, but I do not know where I should use it. How can I solve the problem?
>>> I hope I have been clear enough.
>>> Attached you can find also my configure.log and make.log files.
>>>
>>> Best,
>>> Francesco
>>>
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>
>>
>>
>