[petsc-users] Debugger question

Satish Balay balay at mcs.anl.gov
Mon Apr 2 21:05:07 CDT 2012


Sounds like a Cray machine.

-start_in_debugger is useful for debugging on workstations [or
clusters] where you have some control over X11 tunnels. Also,
'xterm' and 'gdb' [or a similar debugger] must be available on the
compute nodes [along with an X/ssh tunnel].
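
One way to check this is to look from a compute node rather than the login node, since 'which xterm' on the login node says nothing about what the compute nodes can see. The sketch below is only an illustration: the function name check_debugger_prereqs is made up here, and it simply reports whether xterm, gdb and DISPLAY are visible on whatever node it runs on.

```shell
#!/bin/sh
# Report whether the pieces the debugger options rely on are visible
# on the node where this script is executed.
# (check_debugger_prereqs is a hypothetical helper, not part of PETSc.)
check_debugger_prereqs() {
    for tool in xterm gdb; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
    # An X11 tunnel normally shows up as a DISPLAY setting.
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY=$DISPLAY"
    else
        echo "DISPLAY is not set"
    fi
}

check_debugger_prereqs
```

To get the compute node's view, run it through the job launcher, e.g. something like 'aprun -n 1 sh check_prereqs.sh' (the script name is hypothetical).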

On a Cray you are better off looking for a parallel debugger. I don't
know if Cray has one available.

As for debugging, you might want to run your code with valgrind on a
Linux box.
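
A minimal sketch of such a run, assuming a generic Linux box with an 'mpiexec' launcher: the executable and input file names are taken from the original post, NPROC and the wrapper function run_under_valgrind are assumptions made for illustration. The wrapper only prints the command if valgrind, mpiexec or the executable is not available.

```shell
#!/bin/sh
# Assumed defaults; override from the environment as needed.
NPROC=${NPROC:-1}
APP=${APP:-./defmod}
INPUT=${INPUT:-2d_point_load_dyn_abc.inp}

# Hypothetical wrapper: run the program under valgrind's memcheck tool,
# or just show the command when the tools are not present.
run_under_valgrind() {
    # %p in the log file name is expanded by valgrind to the process id,
    # so each MPI process writes its own log.
    set -- mpiexec -n "$NPROC" valgrind --tool=memcheck -q \
        --num-callers=20 --log-file=valgrind.log.%p "$APP" -f "$INPUT"
    if command -v mpiexec >/dev/null 2>&1 &&
       command -v valgrind >/dev/null 2>&1 &&
       [ -x "$APP" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run_under_valgrind
```

The valgrind.log.* files would then show the first invalid read or write, which is usually closer to the real bug than the point where the SEGV is finally raised.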

Satish

On Mon, 2 Apr 2012, Tabrez Ali wrote:

> Hello
> 
> I am trying to debug a program using the switch '-on_error_attach_debugger',
> but the vendor/sysadmin-built PETSc 3.2.00 is unable to start the debugger in
> xterm (see text below), even though xterm is installed. What am I doing wrong?
> 
> Btw the segfault happens during a call to MatMult, but only with the
> vendor/sysadmin-supplied PETSc 3.2 built with the PGI and Intel compilers, and
> _not_ with the CRAY or GNU compilers.
> 
> I also don't get the segfault if I build PETSc 3.2-p7 myself with the PGI/Intel
> compilers.
> 
> Any ideas on how to diagnose the problem? Unfortunately I cannot seem to run
> valgrind on this particular machine.
> 
> Thanks in advance.
> 
> Tabrez
> 
> ---
> 
> stali@krakenpf1:~/meshes> which xterm
> /usr/bin/xterm
> stali@krakenpf1:~/meshes> aprun -n 1 ./defmod -f 2d_point_load_dyn_abc.inp
> -on_error_attach_debugger
> ...
> ...
> ...
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably
> memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> http://www.mcs.anl.gov/petsc/petsc-as/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find
> memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [0]PETSC ERROR: to get more information on the crash.
> [0]PETSC ERROR: User provided function() line 0 in unknown directory unknown
> file
> [0]PETSC ERROR: PETSC: Attaching gdb to ./defmod of pid 32384 on display
> localhost:20.0 on machine nid10649
> Unable to start debugger in xterm: No such file or directory
> aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0
> _pmii_daemon(SIGCHLD): [NID 10649] [c23-3c0s6n1] [Mon Apr  2 13:06:48 2012] PE
> 0 exit signal Aborted
> Application 133198 exit codes: 134
> Application 133198 resources: utime ~1s, stime ~0s
> 