[petsc-users] How to check where code hangs

Barry Smith bsmith at mcs.anl.gov
Fri Oct 3 07:22:54 CDT 2014


  You really have to use a debugger. When it is “hanging”, you interrupt it and type bt to get a backtrace. Or, if you are lucky, you can use TotalView. Unfortunately, getting access to a debugger on a batch system is often difficult. If your nodes have access to X Windows and xterm, you can run the PETSc program with, for example,

    -start_in_debugger -display $DISPLAY -debugger_nodes 0 

to have only the 0th MPI process start up the debugger. You can also use, for example, -debugger_nodes 0,2,8 to start a debugger on three different MPI processes.
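
As a rough sketch of how these options fit on a full command line: the small helper below only prints the launch command (a dry run) so it can be checked or pasted into a job script. The launcher name mpiexec, the executable ./my_petsc_app, the fallback display :0, and the helper name debug_cmd are all assumptions for illustration; only the three PETSc options come from the text above.

```shell
# debug_cmd is a hypothetical helper; it prints a command line, it does not launch anything.
#   $1 = number of MPI processes
#   $2 = PETSc executable (hypothetical name in the calls below)
#   $3 = comma-separated MPI processes that should start a debugger
debug_cmd() {
    # Falls back to display :0 (an assumption) when DISPLAY is not set.
    printf '%s\n' "mpiexec -n $1 $2 -start_in_debugger -display ${DISPLAY:-:0} -debugger_nodes $3"
}

# Debugger on the 0th process only.
debug_cmd 4 ./my_petsc_app 0

# Debuggers on three processes of a 144-process run.
debug_cmd 144 ./my_petsc_app 0,2,8
```

Once the program appears stuck, interrupt it inside each debugger window (Ctrl-C in gdb) and type bt to see where that process is.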

  Barry

On Oct 3, 2014, at 2:13 AM, TAY wee-beng <zonexo at gmail.com> wrote:

> Hi,
> 
> This question is not PETSc-related, but I hope to get some answers from experienced users.
> 
> I'm running an MPI code which uses 144 CPUs. It aborts after a short while. I'm trying to find out where it aborts exactly.
> 
> However, being an MPI code, it seems quite difficult.
> 
> I used:
> 
> call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "xxx"
> 
> where xxx = 1,2,3 ....
> 
> If it prints up to 3, I know it aborts between 3 and 4. However, it doesn't seem to work as expected.
> 
> I wonder why.
> 
> Also, is there a better way to do this?
> 
> -- 
> Thank you.
> 
> Yours sincerely,
> 
> TAY wee-beng
> 


