[MPICH2-dev] mpiexec with gdb

Rusty Lusk lusk at mcs.anl.gov
Wed Oct 11 11:18:20 CDT 2006


Could it be that you have not compiled the program for debugging  
(with -g)?

Regards,
Rusty

On Oct 11, 2006, at 11:05 AM, Florin Isaila wrote:

> Hi,
> thank you very much, Ralph.
>
> Your output is what I would have expected. But when I run gdb (or
> even ddd the way you indicated), the program does not stop at the
> breakpoint and gdb simply dies, as shown below.
> I have GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) and mpich2-1.0.4p1.
>
> Could that be a configuration problem? Any hints on how I could
> investigate what happens? Why is the breakpoint bypassed?
>
> c1::test(10:25am) #16% mpiexec -gdb -n 1 test
> 0:  (gdb) l
> 0:  1   void test_dt() {
> 0:  2     int *i =0;
> 0:  3     *i=1;
> 0:  4   }
> 0:  5
> 0:  6   int main(int argc, char* argv[]) {
> 0:  7     MPI_Init(&argc, &argv);
> 0:  8     test_dt();
> 0:  9     MPI_Finalize();
> 0:  10    return 0;
> 0:  (gdb) b 8
> 0:  Breakpoint 2 at 0x804969a: file test.c, line 8.
> 0:  (gdb) r
>  rank 0 in job 167  c1_32771   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
> c1::test(10:25am) #17%
>
> Thanks
> Florin
>
> On 10/10/06, Ralph Butler <rbutler at mtsu.edu> wrote:
>
> On Tue Oct 10, at 4:48PM, Florin Isaila wrote:
>
> > Hi,
> >
> > I am having a problem running mpiexec with gdb. I set a breakpoint
> > at a program line, but the program won't stop there if an error
> > occurs (otherwise it stops normally). The error can be a
> > segmentation fault or a call to MPI_Abort.
> >
> > This makes debugging impossible. Is the old style of starting each
> > mpi process in a separate debugging session possible?
>
> I have tried running the program we see in your output in the same
> way you show and have included the output below.
> However, many folks prefer to use ddd like this:
>      mpiexec -n 2 ddd mpi_pgm
>
> This will launch 2 ddd windows on the desktop, each running mpi_pgm.
> This is pretty easy to manage for around 4 processes.
>
> > While merging the output of several debuggers is helpful in some
> > cases, controlling each independent process is sometimes very
> > important.
> >
> > Here is the simplest example, with a forced segmentation fault. The
> > breakpoint at line 229 is ignored, even though the segmentation
> > fault occurs after it. gdb also quits without making the source of
> > the error clear.
> >
> > stallion:~/tests/mpi/dtype % mpiexec -gdb -n 1 test
> > 0:  (gdb) l 204
> >
> > 0:  204 void test_dt() {
> > 0:  205   int *i = 0;
> > 0:  206   *i = 1;
> > 0:  209}
> >
> > 0:  (gdb) l 227
> > 0:  227 int main(int argc, char* argv[]) {
> > 0:  228   MPI_Init(&argc, &argv);
> > 0:  229   test_dt();
> > 0:  230   MPI_Finalize();
> > 0:  231   return 0;
> > 0:  232 }
> >
> > 0:  (gdb) b 229
> > 0:  Breakpoint 2 at 0x8049f79: file test.c, line 229.
> > 0:  (gdb) r
> >  rank 0 in job 72   stallion.ece.northwestern.edu_42447   caused
> > collective abort of all ranks
> >   exit status of rank 0: killed by signal 9
> >
> > Many thanks
> > Florin
>
> My run of the program:
>
> (magpie:52) % mpiexec -gdb -n 1 temp
> 0:  (gdb) l
> 0:  1   void test_dt() {
> 0:  2       int *i = 0;
> 0:  3       *i = 1;
> 0:  4   }
> 0:  5
> 0:  6   int main(int argc, char* argv[]) {
> 0:  7       MPI_Init(&argc, &argv);
> 0:  8       test_dt();
> 0:  9       MPI_Finalize();
> 0:  10      return 0;
> 0:  (gdb) b 8
> 0:  Breakpoint 2 at 0x80495fe: file temp.c, line 8.
> 0:  (gdb) r
> 0:  Continuing.
> 0:
> 0:  Breakpoint 2, main (argc=1, argv=0xbffff3b4) at temp.c:8
> 0:  8       test_dt();
> 0:  (gdb) 0:  (gdb) s
> 0:  test_dt () at temp.c:2
> 0:  2       int *i = 0;
> 0:  (gdb) s
> 0:  3       *i = 1;
> 0:  (gdb) p *i
> 0:  Cannot access memory at address 0x0
> 0:  (gdb) p i
> 0:  $1 = (int *) 0x0
> 0:  (gdb) c
> 0:  Continuing.
> 0:
> 0:  Program received signal SIGSEGV, Segmentation fault.
> 0:  0x080495d4 in test_dt () at temp.c:3
> 0:  3       *i = 1;
> 0:  (gdb) where
> 0:  #0  0x080495d4 in test_dt () at temp.c:3
> 0:  #1  0x08049603 in main (argc=1, argv=0xbffff3b4) at temp.c:8
> 0:  (gdb) q
> rank 0 in job 2  magpie_42682   caused collective abort of all ranks
>    exit status of rank 0: killed by signal 9
> (magpie:53) %
>
>
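
On the question raised above about controlling each MPI process in its
own debugging session: besides launching ddd under mpiexec, one common
approach is to have each process print its PID and wait until a
debugger attaches. The following is only a minimal sketch of that
idea, not something taken from this thread or from MPICH2. The
function name wait_for_debugger and the environment variable
WAIT_FOR_GDB are hypothetical names made up for the example.

```c
/* Sketch: pause a process until a debugger attaches and releases it.
 * Assumption: WAIT_FOR_GDB is a hypothetical environment variable invented
 * for this example; it is not an MPICH2 or gdb feature. */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 0 if no wait was requested, 1 after a debugger released the wait. */
int wait_for_debugger(void)
{
    /* volatile so the loop re-reads the flag after gdb changes it */
    volatile int attached = 0;
    char host[256];

    /* Do nothing unless the (hypothetical) variable is set. */
    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown-host");
    host[sizeof host - 1] = '\0';

    fprintf(stderr, "PID %ld on %s is waiting; attach with: gdb -p %ld\n",
            (long)getpid(), host, (long)getpid());

    /* Spin until a debugger sets the flag:
     *   (gdb) set var attached = 1
     *   (gdb) continue
     */
    while (!attached)
        sleep(1);

    return 1;
}
```

In an MPI program the call would typically go right after MPI_Init(),
with the program compiled with -g so that gdb can see the variable.
Each waiting process can then be attached to from its own gdb or ddd
session, where breakpoints are set before the wait is released.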
