[MPICH] Tracing the mpich library with gdb and xterm

Darius Buntinas buntinas at mcs.anl.gov
Thu Jan 3 10:39:55 CST 2008


It's easiest if you run both processes on the same machine, because 
then the DISPLAY values will be correct.

But if you need to use two machines, there are some tricks.

Using ssh (this is secure and the way I do it).  First some background 
info.  Ssh can be configured to forward X traffic from a remote machine 
back to your display.  When you ssh into another machine you'll see that 
DISPLAY is set to something like "localhost:10.0".  Now any process 
(that's owned by you) on the remote machine can display an X window on 
your local display by sending it to "localhost:10.0".  Another thing to 
notice is that every new ssh session (whether it's your ssh session or 
someone else's) to that node gets a different value for DISPLAY (e.g., 
"localhost:11.0").
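
As a rough sketch of how to check this from inside each ssh session, the 
helper below tests whether a DISPLAY value has the "localhost:N.S" form 
described above.  The name is_forwarded_display is illustrative only (it 
is not part of ssh or MPICH), and the check assumes that forwarded 
displays always show up under the host name "localhost", which may not 
hold for every sshd configuration.

```shell
# Hypothetical helper, not part of ssh or MPICH.
# Succeeds if $1 looks like an ssh-forwarded display such as
# "localhost:10.0"; fails for values such as ":0.0".
is_forwarded_display() {
    printf '%s\n' "$1" | grep -Eq '^localhost:[0-9]+(\.[0-9]+)?$'
}
```

In each ssh session you could then run something like:

   is_forwarded_display "$DISPLAY" && echo "forwarding is set up: $DISPLAY"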

So what I do is open one ssh session to each of the remote machines my 
jobs will run on.  This sets up the X forwarding, and you need to keep 
these open as long as you want X to be forwarded.  Now, read the values 
of DISPLAY from each ssh session.  If they're the same, say 
localhost:10.0, it's easy:

   mpiexec -n 2 -env DISPLAY localhost:10.0 xterm -e gdb ./cpi

You should see two xterms open up, one from each remote machine, with 
gdb running.

Now, if DISPLAY is not the same on both, then you'll have to set DISPLAY
differently for each process:

   mpiexec -n 1 -env DISPLAY localhost:10.0 xterm -e gdb ./cpi : \
            -n 1 -env DISPLAY localhost:11.0 xterm -e gdb ./cpi

(Notice the colon (:) and the escaped linebreak).  The trick is to 
figure out which rank runs on which machine, so you use the right 
DISPLAY value on the right machine.  Of course with two processes, you 
can try it one way, and if it doesn't work flip the DISPLAY values and 
try again.  (Alternatively, you can check which rank runs on which 
machine like this:  "mpiexec -l -n 2 hostname".)
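
The rank-to-machine matching can also be scripted.  The following is 
only a sketch under a few assumptions: build_debug_cmd is a made-up 
helper name, the host names in the usage line are placeholders, and the 
output of "mpiexec -l -n 2 hostname" is assumed to consist of lines of 
the form "rank: hostname" (the prefix format may differ between process 
managers).  It also assumes the ranks land on the same machines in the 
debug run as they did in the hostname run.  The function only prints 
the mpiexec command; it does not run it.

```shell
# build_debug_cmd is a hypothetical helper, not part of MPICH.
# stdin: lines of the assumed form "rank: hostname"
# args:  host=display pairs, e.g. nodeA=localhost:10.0
# Prints one mpiexec command with a per-rank DISPLAY; does not run it.
build_debug_cmd() {
    sort -n | {
        cmd="mpiexec"
        sep=""
        while IFS=': ' read -r rank host; do
            disp=""
            for pair in "$@"; do
                # Pick the display whose host part matches this rank's host.
                if [ "${pair%%=*}" = "$host" ]; then
                    disp="${pair#*=}"
                fi
            done
            if [ -z "$disp" ]; then
                echo "no DISPLAY known for host '$host'" >&2
                cmd=""
                break
            fi
            cmd="$cmd$sep -n 1 -env DISPLAY $disp xterm -e gdb ./cpi"
            sep=" :"
        done
        # Fail (non-zero status) if any host was missing from the pairs.
        [ -n "$cmd" ] && printf '%s\n' "$cmd"
    }
}
```

Usage would look something like this, with the host names replaced by 
exactly what "hostname" prints on your machines:

   mpiexec -l -n 2 hostname | \
       build_debug_cmd nodeA=localhost:10.0 nodeB=localhost:11.0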

Note that you can run any X program this way.  I generally use ddd as 
the debugger instead of "xterm -e gdb".

I hope this clarified more than it confused.

-d

On 01/03/2008 04:03 AM, Krishna Chaitanya wrote:
> Hi,
>        Thanks for the help, guess I complicated my question unnecessarily.
>        I wish to run a program on two machines and have two debug 
> windows on my local machine, so that I can trace through the pt2pt code. 
> This must concern xterm and setting the display variable correctly. At 
> this stage, I have the DISPLAY set to 0.0 on both the machines and I am 
> ssh-ing into the remote machine by using the -X switch. The debug window 
> for the remote machine is getting launched at the remote terminal but is 
> not getting displayed on mine. Please let me know what needs to be done 
> to have the window displayed on my machine.
> 
> Thanks,
> Krishna Chaitanya K
> 
> 
> 
> 
> On 1/2/08, Darius Buntinas <buntinas at mcs.anl.gov> wrote:
> 
> 
>     I'm not sure exactly what you want to do, so here are a few ideas.
> 
>     If you want to start rank 0 before rank 1, you can change your test
>     program so it calls MPI_Comm_rank right after MPI_Init.  Then set a
>     breakpoint just after MPI_Comm_rank, and when you hit the breakpoint,
>     you can look at the rank and decide which one to continue running.
> 
>     If you can't modify the test program, you can set the breakpoint just
>     after MPI_Init, then read MPIDI_Process.my_pg_rank.  You'll have to
>     configure MPICH2 with --enable-g=dbg to get debugging symbols added (you
>     might also want to configure with --disable-compiler-optimizations to
>     remove the -O2 flag and make it easier to step through the code).
> 
>     If you really need to know the rank of a process before MPI_Init is
>     called, you can read the PMI_RANK environment variable.  Note that
>     MPI_Init performs an implicit barrier: if one process calls MPI_Init,
>     the call won't return until all other processes have called MPI_Init.
>     So even if you start one process before the others, it won't get past
>     MPI_Init.
> 
>     Hope that helps,
>     -d
> 
>     On 12/31/2007 04:07 AM, Krishna Chaitanya wrote:
>      > Hi,
>      >                I have been tracing the flow of the mpich code by
>      > executing a simple program having MPI_Send() and MPI_Recv(), on one
>      > machine.  I have been using gdb along with xterm to have two windows
>      > open at the same time as I step through the code. I wish to get a
>      > better glimpse of the working of the point to point calls, by
>      > launching the job on two machines and by tracing the flow in a
>      > similar manner.  Could you please tell me how I can go about it? I
>      > have a feeling that this can be done by hard-coding the macros and
>      > telling the daemons directly that one machine is rank1 and the other
>      > is rank2. That way, I can start rank1 process first and the rank2
>      > process a little later and trace through the code.
>      >
>      > Thanks,
>      > Krishna Chaitanya,
>      > Dept of Information Technology,
>      > National Institute of Technology, Karnataka ( NITK )
>      > India
>      >
>      > --
>      > In the middle of difficulty, lies opportunity
> 
> 
> 
> 
> -- 
> In the middle of difficulty, lies opportunity



