[mpich-discuss] debugging mpi without mpiexec -gdb

Dave Goodell goodell at mcs.anl.gov
Tue Apr 13 09:45:59 CDT 2010


On Apr 13, 2010, at 8:44 AM, William Pearson wrote:

>
> I have an MPICH2 program that works fine with mpiexec -gdb -np  
> 5 ...., but crashes quite quickly without the -gdb.
>
> Is there some combination of -machinefile and other parameters that  
> I can give directly to my program, so that I can run it under gdb  
> without using mpiexec?

Not really. The process manager does a lot of work behind the scenes,
so you can't just get rid of it.

There are several decent options for getting a debugger, or something
debugger-like, onto your parallel program:

1) Run "mpiexec -n 5 xterm -e gdb ./your_app", which will create 5
xterm windows, each running gdb on one of your processes.  You will
need to type "run" at the gdb prompt in each window to start that
process.  This particular configuration is very similar to the "-gdb"
option to MPD's mpiexec.

2) You can try a variation of (1) that is sometimes useful by using
the MPMD launch syntax.  The command

    mpiexec -n 1 xterm -e gdb ./your_app : -n 3 ./your_app : -n 1 xterm -e gdb ./your_app

will launch gdb windows only for ranks 0 and 4.

3) If there is an interesting place you want to examine in your
program, you can make your program stop there by spinning in a loop
that polls a dummy variable.  Then attach with the debugger, change
the dummy variable so the loop exits, and continue from there.  This
page talks about attaching with gdb:
http://inside.mines.edu/~lwiencke/elab/gdb/gdb_22.html

4) Enable core dumps (usually via "ulimit -c unlimited") and let your  
program crash, as long as it is crashing with a signal like SIGSEGV  
and friends.  Then load the core dump in gdb and figure out what went  
wrong.

5) Use valgrind instead of a proper debugger.
"mpiexec -n 5 valgrind -q ./your_app" will run your program under
valgrind.  This may or may not tell you where your problems are,
depending on what kind of problem you are experiencing.  If valgrind
shows the problem but you still need to debug, you can run
"mpiexec -n 5 xterm -e valgrind -q --db-attach=yes ./your_app".  This
will spawn 5 xterm windows, each running one of your processes under
valgrind.  When valgrind encounters a warning/error it will ask you if
you want to attach.

6) Use Ashley Pittman's PADB debugger: http://padb.pittman.org.uk/   
This isn't a full debugger, but it might give you enough information  
to track down your bug.

7) Use a commercial parallel debugger, such as TotalView or DDT, that  
understands MPI jobs and can deal with multiple processes at once.
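
To make option (3) a bit more concrete, here is a minimal sketch of the
"wait for attach" loop.  The function name wait_for_debugger and the
WAIT_FOR_DEBUGGER environment variable are made up for this example;
neither is part of MPICH2.  The sketch assumes a POSIX system and that
you call the function yourself at the point you want to inspect (for
instance right after MPI_Init).

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not an MPICH2 API.
 * Returns 0 immediately unless the (assumed) environment variable
 * WAIT_FOR_DEBUGGER is set; otherwise spins until a debugger clears
 * the flag, then returns 1. */
int wait_for_debugger(void)
{
    if (getenv("WAIT_FOR_DEBUGGER") == NULL)
        return 0;

    /* volatile so the compiler re-reads the flag on every iteration */
    volatile int keep_waiting = 1;

    /* print the pid so you know which process to attach to */
    fprintf(stderr, "pid %ld waiting for debugger attach\n", (long)getpid());
    fflush(stderr);

    while (keep_waiting)
        sleep(1);   /* the debugger sets keep_waiting to 0 to release us */

    return 1;
}
```

Build with "-g -O0", launch the job with WAIT_FOR_DEBUGGER set in the
environment, and attach to the printed pid with "gdb -p <pid>".  The
process will most likely be stopped inside sleep(), so use "up" until
you are in wait_for_debugger, then "set var keep_waiting = 0" and
"continue".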

-Dave


