[mpich-discuss] debugging mpi without mpiexec -gdb
goodell at mcs.anl.gov
Tue Apr 13 09:45:59 CDT 2010
On Apr 13, 2010, at 8:44 AM, William Pearson wrote:
> I have an MPICH2 program that works fine with mpiexec -gdb -np
> 5 ...., but crashes quite quickly without the -gdb.
> Is there some combination of -machinefile and other parameters that
> I can give directly to my program, so that I can run it under gdb
> without using mpiexec?
Not really. The process manager is doing a lot of things behind the
scenes, so you can't just get rid of it.
There are several decent options for getting a debugger or
debugger-like thing on your parallel program:
1) Run "mpiexec -n 5 xterm -e gdb ./your_app", which will create 5
xterm windows, each running gdb on one of your processes. This
particular configuration is very similar to the "-gdb" option to MPD's
mpiexec.
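2) You can try a variation of (1) that is sometimes useful by using
the MPMD launch syntax. "mpiexec -n 1 xterm -e gdb ./your_app :
-n 3 ./your_app : -n 1 xterm -e gdb ./your_app" will launch gdb windows
only for ranks 0 and 4. The colon-separated segments are assigned
ranks in order, so in this 5-process job the first segment is rank 0,
the middle one covers ranks 1-3, and the last one is rank 4.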
3) If there is an interesting place you want to examine in your
program, you can make your program stop there by polling a dummy
variable in a loop. Attach to the waiting process with the debugger,
then set/clear the dummy variable so the program continues under
the debugger's control. This page talks about attaching with gdb: http://inside.mines.edu/~lwiencke/elab/gdb/gdb_22.html
4) Enable core dumps (usually via "ulimit -c unlimited") and let your
program crash, as long as it is crashing with a signal like SIGSEGV
and friends. Then load the core dump in gdb and figure out what went
wrong.
5) Use valgrind instead of a proper debugger. "mpiexec -n 5 valgrind -q
./your_app" will run your program under valgrind. This may or may
not tell you where your problems are, depending on what kind of
problem you are experiencing. If valgrind will show the problem but
you still need to debug, you can run "mpiexec -n 5 xterm -e valgrind
-q --db-attach=yes ./your_app". This will spawn 5 xterm windows, each
running one of your processes under valgrind. When valgrind
encounters a warning/error it will ask you if you want to attach.
6) Use Ashley Pittman's PADB debugger: http://padb.pittman.org.uk/
This isn't a full debugger, but it might give you enough information
to track down your bug.
7) Use a commercial parallel debugger, such as TotalView or DDT, that
understands MPI jobs and can deal with multiple processes at once.
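
As a rough sketch of how option (2) extends to other sets of ranks: the
shell function below prints an MPMD command line that wraps only the
chosen ranks in "xterm -e gdb" and groups the remaining ranks into plain
segments. The name gdb_ranks_cmd and its argument order are made up for
illustration and are not part of MPICH2. It only prints the command, so
you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not an MPICH2 tool): print an mpiexec MPMD command
# line that runs only the listed ranks under "xterm -e gdb".
# Usage: gdb_ranks_cmd NPROCS APP RANK [RANK...]
gdb_ranks_cmd() {
    nprocs=$1; app=$2; shift 2
    debug_ranks=" $* "     # space-padded so " 4 " does not match rank 14
    cmd="mpiexec"
    sep=""                 # becomes " :" once the first segment is emitted
    rank=0
    plain=0                # length of the current run of non-debugged ranks
    while [ "$rank" -lt "$nprocs" ]; do
        case "$debug_ranks" in
            *" $rank "*)
                # Flush any pending run of plain ranks as one segment.
                if [ "$plain" -gt 0 ]; then
                    cmd="$cmd$sep -n $plain $app"; sep=" :"; plain=0
                fi
                cmd="$cmd$sep -n 1 xterm -e gdb $app"; sep=" :"
                ;;
            *)
                plain=$((plain + 1))
                ;;
        esac
        rank=$((rank + 1))
    done
    # Flush the trailing run of plain ranks, if any.
    if [ "$plain" -gt 0 ]; then
        cmd="$cmd$sep -n $plain $app"
    fi
    printf '%s\n' "$cmd"
}

# Reproduces the command from option (2): gdb on ranks 0 and 4 of 5.
gdb_ranks_cmd 5 ./your_app 0 4
```

Once the printed command looks right, it can be pasted into the shell or
run with eval "$(gdb_ranks_cmd 5 ./your_app 0 4)". This relies on the
segments being assigned ranks in the order they appear on the command
line, as in option (2).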