[MPICH2-dev] mpiexec with gdb

Florin Isaila florin.isaila at gmail.com
Wed Oct 11 11:05:52 CDT 2006


Hi,
thank you very much, Ralph.

Your output is what I would have expected. But when I run gdb (or even
ddd, the way you indicated), the program doesn't stop at the breakpoint
and gdb just dies, as shown below.
I have GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) and mpich2-1.0.4p1.

Could that be a configuration problem? Any hints on how I could
investigate what is happening? Why is the breakpoint bypassed?

c1::test(10:25am) #16% mpiexec -gdb -n 1 test
0:  (gdb) l
0:  1   void test_dt() {
0:  2     int *i =0;
0:  3     *i=1;
0:  4   }
0:  5
0:  6   int main(int argc, char* argv[]) {
0:  7     MPI_Init(&argc, &argv);
0:  8     test_dt();
0:  9     MPI_Finalize();
0:  10    return 0;
0:  (gdb) b 8
0:  Breakpoint 2 at 0x804969a: file test.c, line 8.
0:  (gdb) r
 rank 0 in job 167  c1_32771   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9
c1::test(10:25am) #17%
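
For reference, here is the complete test program. Everything except the
#include line appears in the gdb listing above; note that adding the
include shifts the line numbers, so the breakpoint would no longer be
on line 8:

#include <mpi.h>  /* my addition; the listing itself starts at test_dt() */

void test_dt() {
  int *i = 0;
  *i = 1;         /* deliberate write through a NULL pointer -> SIGSEGV */
}

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  test_dt();      /* the breakpoint above was set on this call */
  MPI_Finalize();
  return 0;
}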

Thanks
Florin

On 10/10/06, Ralph Butler <rbutler at mtsu.edu> wrote:
>
>
> On Tue, Oct 10, 2006, at 4:48 PM, Florin Isaila wrote:
>
> > Hi,
> >
> > I am having a problem running mpiexec with gdb. I set a breakpoint
> > at a program line, but the program won't stop there when an error
> > occurs (otherwise it stops normally). The error can be a
> > segmentation fault or a call to MPI_Abort.
> >
> > This makes debugging impossible. Is the old style of starting each
> > MPI process in a separate debugging session still possible?
>
> I have tried running the program we see in your output in the same way
> you show and have included the output below.
> However, many folks prefer to use ddd like this:
>      mpiexec -n 2 ddd mpi_pgm
>
> This will launch 2 ddd windows on the desktop, each running mpi_pgm.
> It's pretty easy to manage around 4 processes this way.
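>
> A similar trick (a sketch I have not tested here, assuming you have a
> local display or X forwarding and xterm on the nodes) gives you the
> old style of one independent debugger per process:
>      mpiexec -n 2 xterm -e gdb mpi_pgm
>
> Each rank then gets its own xterm running its own gdb session, so you
> can control every process separately instead of going through the
> merged -gdb output.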
>
> > While merging the output of several debuggers is helpful in some
> > cases, controlling each process independently is sometimes very
> > important.
> >
> > Here is the simplest example, with a forced segmentation fault. The
> > breakpoint at line 229 is ignored, even though the segmentation
> > fault occurs after it. gdb also quits without making the source of
> > the error clear.
> >
> > stallion:~/tests/mpi/dtype % mpiexec -gdb -n 1 test
> > 0:  (gdb) l 204
> >
> > 0:  204 void test_dt() {
> > 0:  205   int *i = 0;
> > 0:  206   *i = 1;
> > 0:  209}
> >
> > 0:  (gdb) l 227
> > 0:  227 int main(int argc, char* argv[]) {
> > 0:  228   MPI_Init(&argc, &argv);
> > 0:  229   test_dt();
> > 0:  230   MPI_Finalize();
> > 0:  231   return 0;
> > 0:  232 }
> >
> > 0:  (gdb) b 229
> > 0:  Breakpoint 2 at 0x8049f79: file test.c, line 229.
> > 0:  (gdb) r
> >  rank 0 in job 72  stallion.ece.northwestern.edu_42447   caused
> > collective abort of all ranks
> >   exit status of rank 0: killed by signal 9
> >
> > Many thanks
> > Florin
>
> My run of the program:
>
> (magpie:52) % mpiexec -gdb -n 1 temp
> 0:  (gdb) l
> 0:  1   void test_dt() {
> 0:  2       int *i = 0;
> 0:  3       *i = 1;
> 0:  4   }
> 0:  5
> 0:  6   int main(int argc, char* argv[]) {
> 0:  7       MPI_Init(&argc, &argv);
> 0:  8       test_dt();
> 0:  9       MPI_Finalize();
> 0:  10      return 0;
> 0:  (gdb) b 8
> 0:  Breakpoint 2 at 0x80495fe: file temp.c, line 8.
> 0:  (gdb) r
> 0:  Continuing.
> 0:
> 0:  Breakpoint 2, main (argc=1, argv=0xbffff3b4) at temp.c:8
> 0:  8       test_dt();
> 0:  (gdb) 0:  (gdb) s
> 0:  test_dt () at temp.c:2
> 0:  2       int *i = 0;
> 0:  (gdb) s
> 0:  3       *i = 1;
> 0:  (gdb) p *i
> 0:  Cannot access memory at address 0x0
> 0:  (gdb) p i
> 0:  $1 = (int *) 0x0
> 0:  (gdb) c
> 0:  Continuing.
> 0:
> 0:  Program received signal SIGSEGV, Segmentation fault.
> 0:  0x080495d4 in test_dt () at temp.c:3
> 0:  3       *i = 1;
> 0:  (gdb) where
> 0:  #0  0x080495d4 in test_dt () at temp.c:3
> 0:  #1  0x08049603 in main (argc=1, argv=0xbffff3b4) at temp.c:8
> 0:  (gdb) q
> rank 0 in job 2  magpie_42682   caused collective abort of all ranks
>    exit status of rank 0: killed by signal 9
> (magpie:53) %
>
>