On Mon, Apr 2, 2012 at 9:52 PM, Tabrez Ali <span dir="ltr"><<a href="mailto:stali@geology.wisc.edu">stali@geology.wisc.edu</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Matt/Barry<br>
<br>
My intention was to make sure that the code is bug-free, and since
PETSc was pre-installed on the cluster with various compilers, it was
easier to test quickly than to build all combinations myself.
Performance is of absolutely no concern.<br>
<br>
Things were working fine with 3.1 but recently the OS (Cray Linux
Env) was upgraded and so was PETSc (to 3.2).<br>
<br>
Matt<br>
<br>
I am attaching the entire output.<br></div></blockquote><div><br></div><div><span style>> Unable to start debugger in xterm: No such file or directory</span><br style><span style>> aborting job:</span><br style></div>
<div><span style><br></span></div><div><span style>xterm is not in the path.</span></div><div><span style><br></span></div><div><span style> Matt</span></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Tabrez<br>
<br>
---<br>
<br>
stali@krakenpf2:~/meshes> which xterm<br>
/usr/bin/xterm<br>
stali@krakenpf2:~/meshes> aprun -n 1 ./defmod -f
2d_point_load_dyn_abc.inp -on_error_attach_debugger<br>
Reading input ...<br>
Reading mesh data ...<br>
Forming [K] ...<br>
Forming [M] & [M]^-1 ...<br>
Applying constraints ...<br>
Forming RHS ...<br>
Setting up solver ...<br>
Solving ...<br>
Time Step 0<br>
[0]PETSC ERROR:
------------------------------------------------------------------------<br>
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
Violation, probably memory access out of range<br>
[0]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger<br>
[0]PETSC ERROR: or see
<a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind</a>[0]PETSC
ERROR: or try <a href="http://valgrind.org" target="_blank">http://valgrind.org</a> on GNU/linux and Apple Mac OS X to
find memory corruption errors<br>
[0]PETSC ERROR: configure using --with-debugging=yes, recompile,
link, and run <br>
[0]PETSC ERROR: to get more information on the crash.<br>
[0]PETSC ERROR: User provided function() line 0 in unknown directory
unknown file <br>
[0]PETSC ERROR: PETSC: Attaching gdb to ./defmod of pid 26164 on
display :0.0 on machine nid03538<br>
Unable to start debugger in xterm: No such file or directory<br>
aborting job:<br>
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0<br>
_pmii_daemon(SIGCHLD): [NID 03538] [c12-3c2s4n2] [Mon Apr 2
22:50:09 2012] PE 0 exit signal Aborted<br>
Application 134950 exit codes: 134<br>
Application 134950 resources: utime ~1s, stime ~0s<br>
<br>
On 04/02/2012 09:04 PM, Matthew Knepley wrote:
<blockquote type="cite">On Mon, Apr 2, 2012 at 8:57 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span>
wrote:<br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><br>
On Apr 2, 2012, at 8:10 PM, Tabrez Ali wrote:<br>
<br>
> Hello<br>
><br>
> I am trying to debug a program using the switch
'-on_error_attach_debugger' but the vendor/sysadmin built
PETSc 3.2.00 is unable to start the debugger in xterm (see
text below). But xterm is installed. What am I doing wrong?<br>
><br>
> Btw, the segfault happens during a call to MatMult, but
only with the vendor/sysadmin supplied PETSc 3.2 built with the PGI and
Intel compilers, and _not_ with the CRAY or GNU compilers.<br>
<br>
</div>
My advice: blow off "the vendor/sysadmin supplied PETSc 3.2"
and just build it yourself so you can get real work done
instead of trying to debug their mess. I promise the vendor
one is not like a billion times faster or anything.</blockquote>
<div><br>
</div>
<div>If you want to justify this to anyone (like a funder), just
run both builds on ex5 at a large problem size and compare the flop rates
reported for MatMult. That is probably your dominant cost (or your
preconditioner is).</div>
<div><br>
</div>
<div> Matt</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><font color="#888888"><br>
Barry<br>
</font></span>
<div>
<div><br>
<br>
<br>
><br>
> I also don't get the segfault if I build PETSc 3.2-p7
myself with PGI/Intel compilers.<br>
><br>
> Any ideas on how to diagnose the problem?
Unfortunately I cannot seem to run valgrind on this
particular machine.<br>
><br>
> Thanks in advance.<br>
><br>
> Tabrez<br>
><br>
> ---<br>
><br>
> stali@krakenpf1:~/meshes> which xterm<br>
> /usr/bin/xterm<br>
> stali@krakenpf1:~/meshes> aprun -n 1 ./defmod -f
2d_point_load_dyn_abc.inp -on_error_attach_debugger<br>
> ...<br>
> ...<br>
> ...<br>
> [0]PETSC ERROR:
------------------------------------------------------------------------<br>
> [0]PETSC ERROR: Caught signal number 11 SEGV:
Segmentation Violation, probably memory access out of
range<br>
> [0]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger<br>
> [0]PETSC ERROR: or see <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind" target="_blank">http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind</a>[0]PETSC
ERROR: or try <a href="http://valgrind.org" target="_blank">http://valgrind.org</a>
on GNU/linux and Apple Mac OS X to find memory corruption
errors<br>
> [0]PETSC ERROR: configure using --with-debugging=yes,
recompile, link, and run<br>
> [0]PETSC ERROR: to get more information on the crash.<br>
> [0]PETSC ERROR: User provided function() line 0 in
unknown directory unknown file<br>
> [0]PETSC ERROR: PETSC: Attaching gdb to ./defmod of
pid 32384 on display localhost:20.0 on machine nid10649<br>
> Unable to start debugger in xterm: No such file or
directory<br>
> aborting job:<br>
> application called MPI_Abort(MPI_COMM_WORLD, 0) -
process 0<br>
> _pmii_daemon(SIGCHLD): [NID 10649] [c23-3c0s6n1] [Mon
Apr 2 13:06:48 2012] PE 0 exit signal Aborted<br>
> Application 133198 exit codes: 134<br>
> Application 133198 resources: utime ~1s, stime ~0s<br>
<br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all"><span class="HOEnZb"><font color="#888888">
<div><br>
</div>
-- <br>
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to
which their experiments lead.<br>
-- Norbert Wiener<br>
</font></span></blockquote>
<br>
</div>
</blockquote></div>
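<div><br></div>
<div>To spell out the check I have in mind: your "which xterm" ran at the krakenpf2 prompt, but the debugger is started on the node that runs the job (nid03538 in your output), and the environment there may not be the same. Below is a minimal sketch, not anything from PETSc itself; the function name check_in_path is made up for illustration. It reports whether a program can be found through PATH in whatever environment it runs in.</div>

```shell
#!/bin/sh
# Hypothetical helper (not part of PETSc): report whether a program is
# reachable through PATH in the environment where this script runs.
check_in_path() {
    prog=$1
    if location=$(command -v "$prog" 2>/dev/null); then
        echo "$prog: found at $location"
    else
        echo "$prog: not found in PATH"
        return 1
    fi
}

# Check the two programs named in the error output. "|| true" keeps the
# script going so that both results are always printed.
check_in_path xterm || true
check_in_path gdb || true
```

<div>Run it once at the login prompt and once through aprun, then compare the two results. If xterm does not show up on the compute side, the "noxterm" argument to -on_error_attach_debugger is probably worth a try, though I have not checked how well that behaves under aprun.</div>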