<div class="gmail_extra">On Sat, Apr 28, 2012 at 8:36 PM, Andrew Spott <span dir="ltr"><<a href="mailto:andrew.spott@gmail.com" target="_blank">andrew.spott@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>When I attach debugger on error on the local machine, I get a bunch of lines like this one:</div><div><br></div><div>warning: Could not find object file "/private/tmp/homebrew-gcc-4.6.2-HNPr/gcc-4.6.2/build/x86_64-apple-darwin11.3.0/libstdc++-v3/src/../libsupc++/.libs/libsupc++convenience.a(cp-demangle.o)" - no debug information available for "cp-demangle.c".</div>
</div></blockquote><div><br></div><div>It looks like you build with autotools. That just makes things hard :)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>then ". done" And then nothing. It looks like the program exits before the debugger can attach. after a while I get this:</div></div></blockquote><div><br></div><div>You can use -debugger_pause 10 to make it wait 10s before continuing after spawning the debugger. Make it long</div>
> /Users/spott/Documents/Code/EnergyBasisSchrodingerSolver/data/ebss-input/basis_rmax1.00e+02_rmin1.00e-06_dr1.00e-01/76151: No such file or directory
> Unable to access task for process-id 76151: (os/kern) failure.
>
> in the gdb window. In the terminal window, I get
>
> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 3
> [cli_3]: aborting job:
>
> If I just "start_in_debugger", I don't get the "MPI_Abort" thing, but everything else is the same.
>
> Any ideas?
>
> -Andrew
>
> On Apr 28, 2012, at 6:11 PM, Matthew Knepley wrote:
>
>> On Sat, Apr 28, 2012 at 8:07 PM, Andrew Spott <andrew.spott@gmail.com> wrote:
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">are there any tricks to doing this across ssh?<div><br></div><div>I've attempted it using the method given, but I can't get it to start in the debugger or to attach the debugger, the program just exits or hangs after telling me the error.</div>
</div></blockquote><div><br></div><div>Is there a reason you cannot run this problem on your local machine with 4 processes?</div><div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>-Andrew</div><div><br><div><div>On Apr 28, 2012, at 4:45 PM, Matthew Knepley wrote:</div><br><blockquote type="cite"><div class="gmail_extra">On Sat, Apr 28, 2012 at 6:39 PM, Andrew Spott <span dir="ltr"><<a href="mailto:andrew.spott@gmail.com" target="_blank">andrew.spott@gmail.com</a>></span> wrote:<br>
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
>>>>>> -start_in_debugger noxterm -debugger_nodes 14
>>>>>
>>>>> All my cores are on the same machine. Is this supposed to start a debugger on processor 14, or on computer 14?
>>>>
>>>> Neither. This spawns a gdb process on the same node as the process with MPI rank 14, then attaches gdb to process 14.
>>>>
>>>>    Matt
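As an illustration of the noxterm workflow with all ranks on one machine, a session might look roughly like the sketch below (the executable name ./finddme and the rank count come from elsewhere in this thread; the gdb commands are generic examples, not a prescribed sequence):

    mpiexec -n 16 ./finddme -start_in_debugger noxterm -debugger_nodes 14

    # once gdb has attached to rank 14 in the same terminal:
    (gdb) break MatSetValues
    (gdb) continue
    ...
    (gdb) backtrace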
>>>>> I don't think I have X11 set up properly for the compute nodes, so X11 isn't really an option.
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> -Andrew
>>>>>
>>>>> On Apr 27, 2012, at 7:26 PM, Satish Balay wrote:
>>>>>
>>>>>> On Fri, 27 Apr 2012, Andrew Spott wrote:
>>>>>>
>>>>>>> I'm honestly stumped.
>>>>>>>
>>>>>>> I have some PETSc code that essentially just populates a matrix in parallel, then writes it to a file. All my code that uses floating-point computations is checked for NaNs and infinities, and none seem to show up. However, when I run it on more than 4 cores, I get floating point exceptions that kill the program. I tried turning off the exceptions from PETSc, but the program still dies from them, just without the PETSc error message.
>>>>>>>
>>>>>>> I honestly don't know where to go. I suppose I should attach a debugger, but I'm not sure how to do that for multi-process code.
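For illustration, the kind of per-entry guard being described might look like the sketch below; the function and variable names are hypothetical, std::isfinite assumes a C++11 <cmath>, and with --with-scalar-type=complex the real and imaginary parts are checked separately:

    #include <cmath>
    #include <complex>
    #include <iostream>

    // Return true if a computed matrix entry is finite; otherwise report
    // the offending (row, col) so the bad computation can be traced.
    bool entry_is_finite(int row, int col, const std::complex<double> &value)
    {
        if (!std::isfinite(value.real()) || !std::isfinite(value.imag())) {
            std::cerr << "non-finite entry at (" << row << ", " << col << "): "
                      << value.real() << " + " << value.imag() << "i" << std::endl;
            return false;
        }
        return true;
    }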
>>>>>>
>>>>>> Assuming you have X11 set up properly from the compute nodes, you can run with the extra option '-start_in_debugger'.
>>>>>>
>>>>>> If X11 is not properly set up - and you'd like to run gdb on one of the nodes [say node 14, where you see the SEGV] - you can do:
>>>>>>
>>>>>> -start_in_debugger noxterm -debugger_nodes 14
>>>>>>
>>>>>> Or try valgrind:
>>>>>>
>>>>>> mpiexec -n 16 valgrind --tool=memcheck -q ./executable
>>>>>>
>>>>>> For debugging, it's best to install with --download-mpich [so that it's valgrind clean] and run all the MPI stuff on a single machine [usually X11 works well from a single machine].
>>>>>>
>>>>>> Satish
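For concreteness, a configure invocation along those lines might look like the sketch below; the prefix path and --with-debugging=1 are illustrative additions, --download-mpich follows the suggestion above, and the clanguage/scalar-type options are copied from the configure line in the error log further down:

    ./configure --prefix=$HOME/local/petsc-3.2-p6-mpich --download-mpich \
        --with-debugging=1 --with-clanguage=cxx --with-scalar-type=complex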
>>>>>>
>>>>>>> Any ideas? (long error message below):
>>>>>>>
>>>>>>> -Andrew
>>>>>>>
>>>>>>> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [14]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>>>>>>> [14]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>> [14]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[14]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>>>> [14]PETSC ERROR: likely location of problem given in stack below
>>>>>>> [14]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>>>>> [14]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>>>> [14]PETSC ERROR: INSTEAD the line number of the start of the function
>>>>>>> [14]PETSC ERROR: is given.
>>>>>>> [14]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>>>>> [14]PETSC ERROR: Signal received!
>>>>>>> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [14]PE[15]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [15]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
>>>>>>> [15]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>>>>>>> [15]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind[15]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>>>>>>> [15]PETSC ERROR: likely location of problem given in stack below
>>>>>>> [15]PETSC ERROR: --------------------- Stack Frames ------------------------------------
>>>>>>> [15]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
>>>>>>> [15]PETSC ERROR: INSTEAD the line number of the start of the function
>>>>>>> [15]PETSC ERROR: is given.
>>>>>>> [15]PETSC ERROR: --------------------- Error Message ------------------------------------
>>>>>>> [15]PETSC ERROR: Signal received!
>>>>>>> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [15]PETSC ERROR: Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012
>>>>>>> [14]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>>>> [14]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>>>>>> [14]PETSC ERROR: See docs/index.html for manual pages.
>>>>>>> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [14]PETSC ERROR: /home/becker/ansp6066/local/bin/finddme on a linux-gnu named photon9.colorado.edu by ansp6066 Fri Apr 27 18:01:55 2012
>>>>>>> [14]PETSC ERROR: Libraries linked from /home/becker/ansp6066/local/petsc-3.2-p6/lib
>>>>>>> [14]PETSC ERROR: Configure run at Mon Feb 27 11:17:14 2012
>>>>>>> [14]PETSC ERROR: Configure options --prefix=/home/becker/ansp6066/local/petsc-3.2-p6 --with-c++-support --with-fortran --with-mpi-dir=/usr/local/mpich2 --with-shared-libraries=0 --with-scalar-type=complex --with-blas-lapack-libs=/central/intel/mkl/lib/em64t/libmkl_core.a --with-clanguage=cxx
>>>>>>> [14]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [14]TSC ERROR: Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012
>>>>>>> [15]PETSC ERROR: See docs/changes/index.html for recent updates.
>>>>>>> [15]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>>>>>>> [15]PETSC ERROR: See docs/index.html for manual pages.
>>>>>>> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [15]PETSC ERROR: /home/becker/ansp6066/local/bin/finddme on a linux-gnu named photon9.colorado.edu by ansp6066 Fri Apr 27 18:01:55 2012
>>>>>>> [15]PETSC ERROR: Libraries linked from /home/becker/ansp6066/local/petsc-3.2-p6/lib
>>>>>>> [15]PETSC ERROR: Configure run at Mon Feb 27 11:17:14 2012
>>>>>>> [15]PETSC ERROR: Configure options --prefix=/home/becker/ansp6066/local/petsc-3.2-p6 --with-c++-support --with-fortran --with-mpi-dir=/usr/local/mpich2 --with-shared-libraries=0 --with-scalar-type=complex --with-blas-lapack-libs=/central/intel/mkl/lib/em64t/libmkl_core.a --with-clanguage=cxx
>>>>>>> [15]PETSC ERROR: ------------------------------------------------------------------------
>>>>>>> [15]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 14PETSC ERROR: User provided function() line 0 in unknown directory unknown file
>>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 59) - process 15[0]0:Return code = 0, signaled with Interrupt
</blockquote></div><br></div></div></div></div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>
</div>