<div dir="ltr">The debugger stops as soon as your program starts up; that is the stack you posted [1]. At that point, type 'continue' so your job runs normally until it fails. You can also set a breakpoint on PetscError, since PETSc is catching the error from MPI. When gdb stops for the second time (the 'second break'), you will be at the point where an error condition in MPI has been detected. Type 'where' there to get the stack at the moment the error was detected. <div>
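<div>As a rough sketch of that sequence, assuming gdb has already attached through -start_in_debugger and is sitting at its first stop inside PetscSleep, the commands to type in each gdb window would be something like:</div>
<pre>(gdb) break PetscError
(gdb) continue
(gdb) where</pre>
<div>Type 'where' only once gdb reports the next stop, which may be either the PetscError breakpoint or the SIGABRT, depending on which is reached first. The breakpoint will only resolve if the PETSc symbols are visible to gdb, which I am assuming is the case for your build.</div>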
<br></div><div>[1]</div><div><span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; background-color: rgb(255, 255, 255); ">(gdb) where<br>#0 0x00007fae5b941590 in __nanosleep_nocancel () at<br>
<div class="im" style="color: rgb(80, 0, 80); ">../sysdeps/unix/syscall-template.S:82<br></div>#1 0x00007fae5b94143c in __sleep (seconds=0) at<br>../sysdeps/unix/sysv/linux/sleep.c:138<br>#2 0x000000000056cc48 in PetscSleep (s=10) at psleep.c:56<br>
#3 0x0000000000838887 in PetscAttachDebugger () at adebug.c:410<br>#4 0x00000000005590a7 in PetscOptionsCheckInitial_Private () at init.c:392<br>#5 0x000000000055e40e in PetscInitialize (argc=0x7ffff403debc,<br>args=0x7ffff403deb0, file=0x0,<br>
help=0x0) at pinit.c:639<br>#6 0x0000000000524a16 in PetscSolver::InitializePetsc<br>(argc=0x7ffff403debc, argv=0x7ffff403deb0)<br> at /home/dsz/src/framework/trunk/solve/PetscSolver.cxx:124<br>#7 0x00000000004c404f in main (argc=4, argv=0x7ffff403e4c8)<br>
at /home/dsz/src/framework/trunk/solve/cd3t10mpi_main.cxx:526<br>(gdb)<br></span><br></div><div><br></div><div><br><div class="gmail_quote">On Fri, Aug 19, 2011 at 8:22 PM, Dominik Szczerba <span dir="ltr"><<a href="mailto:dominik@itis.ethz.ch">dominik@itis.ethz.ch</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">What do you mean by "the second break"?<br>
<font color="#888888"><br>
Dominik<br>
</font><div><div></div><div class="h5"><br>
On Fri, Aug 19, 2011 at 6:47 PM, Aron Ahmadia <<a href="mailto:aron.ahmadia@kaust.edu.sa">aron.ahmadia@kaust.edu.sa</a>> wrote:<br>
> You want to do a 'where' on the second break, when your program is raising<br>
> an abort signal...<br>
> A<br>
><br>
> On Fri, Aug 19, 2011 at 6:57 PM, Dominik Szczerba <<a href="mailto:dominik@itis.ethz.ch">dominik@itis.ethz.ch</a>><br>
> wrote:<br>
>><br>
>> (gdb) where<br>
>> #0 0x00007fae5b941590 in __nanosleep_nocancel () at<br>
>> ../sysdeps/unix/syscall-template.S:82<br>
>> #1 0x00007fae5b94143c in __sleep (seconds=0) at<br>
>> ../sysdeps/unix/sysv/linux/sleep.c:138<br>
>> #2 0x000000000056cc48 in PetscSleep (s=10) at psleep.c:56<br>
>> #3 0x0000000000838887 in PetscAttachDebugger () at adebug.c:410<br>
>> #4 0x00000000005590a7 in PetscOptionsCheckInitial_Private () at<br>
>> init.c:392<br>
>> #5 0x000000000055e40e in PetscInitialize (argc=0x7ffff403debc,<br>
>> args=0x7ffff403deb0, file=0x0,<br>
>> help=0x0) at pinit.c:639<br>
>> #6 0x0000000000524a16 in PetscSolver::InitializePetsc<br>
>> (argc=0x7ffff403debc, argv=0x7ffff403deb0)<br>
>> at /home/dsz/src/framework/trunk/solve/PetscSolver.cxx:124<br>
>> #7 0x00000000004c404f in main (argc=4, argv=0x7ffff403e4c8)<br>
>> at /home/dsz/src/framework/trunk/solve/cd3t10mpi_main.cxx:526<br>
>> (gdb)<br>
>><br>
>> PetscSolver.cxx:124:<br>
>><br>
>> ierr = PetscInitialize(argc, argv, (char *)0, (char *)0);<br>
>> CHKERRQ(ierr);<br>
>><br>
>> Hmmm, not very helpful.....<br>
>><br>
>> The app runs on one cpu, but silently crashes on two.<br>
>><br>
>> Any hints are very appreciated.<br>
>><br>
>> Dominik<br>
>><br>
>><br>
>><br>
>> On Fri, Aug 19, 2011 at 5:49 PM, Satish Balay <<a href="mailto:balay@mcs.anl.gov">balay@mcs.anl.gov</a>> wrote:<br>
>> > On Fri, 19 Aug 2011, Dominik Szczerba wrote:<br>
>> ><br>
>> >> Hi,<br>
>> >><br>
>> >> I am starting my app in the debugger as:<br>
>> >><br>
>> >> mpiexec -np 2 sm3t4mpi run.xml -start_in_debugger -display :0.0<br>
>> >><br>
>> >> In the console I get:<br>
>> >><br>
>> >> [1]PETSC ERROR: MPI error 14<br>
>> >><br>
>> >> in the two open terminals with gdb I get:<br>
>> >><br>
>> >> 0x00007f2ecdd15590 in __nanosleep_nocancel () at<br>
>> >> ../sysdeps/unix/syscall-template.S:82<br>
>> >> 82 ../sysdeps/unix/syscall-template.S: No such file or directory.<br>
>> >> in ../sysdeps/unix/syscall-template.S<br>
>> >> (gdb)<br>
>> >><br>
>> >><br>
>> >> I type 'c' nonetheless and see:<br>
>> >><br>
>> >> (gdb) c<br>
>> >> Continuing.<br>
>> >> [New Thread 0x7f268e975700 (LWP 22388)]<br>
>> >><br>
>> >> Program received signal SIGABRT, Aborted.<br>
>> >> 0x00007f268f421d05 in raise (sig=6) at<br>
>> >> ../nptl/sysdeps/unix/sysv/linux/raise.c:64<br>
>> >> 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or<br>
>> >> directory.<br>
>> >> in ../nptl/sysdeps/unix/sysv/linux/raise.c<br>
>> >><br>
>> >><br>
>> >><br>
>> >> How do I go on debugging?<br>
>> ><br>
>> > what do you get for:<br>
>> ><br>
>> > (gdb) where<br>
>> ><br>
>> > Satish<br>
>> ><br>
>> ><br>
>> >><br>
>> >> Many thanks for any hints,<br>
>> >><br>
>> >> Dominik<br>
>> >><br>
>> ><br>
>> ><br>
><br>
><br>
</div></div></blockquote></div><br></div></div>