[MPICH2-dev] mpiexec with gdb

Florin Isaila florin.isaila at gmail.com
Wed Oct 11 11:27:34 CDT 2006


On 10/11/06, Rusty Lusk <lusk at mcs.anl.gov> wrote:
>
> Could it be that you have not compiled the program for debugging (with
> -g)?
>

I compiled:
mpicc -g test.c -o test

even though it appears not to be necessary, because I configured mpich with
--enable-g=dbg

Thanks
Florin
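
On the question, quoted below, of debugging each MPI process in its own
session: one common workaround that does not rely on mpiexec -gdb is to have
each process print its PID and spin until a debugger is attached by hand. The
following is only a minimal sketch under stated assumptions: the helper name
wait_for_debugger and the environment variable WAIT_FOR_GDB are hypothetical
and are not part of MPICH2.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not part of MPICH2.
 * If the (assumed) environment variable WAIT_FOR_GDB is set, print this
 * process's PID and spin until a debugger attaches and clears the flag:
 *     gdb -p <pid>
 *     (gdb) up                  (repeat until the wait_for_debugger frame)
 *     (gdb) set var holding = 0
 *     (gdb) continue
 * Returns 1 if it waited, 0 if it returned immediately. */
int wait_for_debugger(void)
{
    /* volatile so the loop re-reads the flag after gdb changes it */
    volatile int holding = 1;

    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    fprintf(stderr, "pid %d waiting for debugger\n", (int)getpid());
    while (holding)
        sleep(1);
    return 1;
}
```

Called right after MPI_Init in a program built with -g, and with the variable
set in each process's environment, this would leave every rank waiting so that
a separate gdb can be attached to each one. Breakpoints can then be set before
continuing.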


> Regards,
> Rusty
>
> On Oct 11, 2006, at 11:05 AM, Florin Isaila wrote:
>
> Hi,
> thank you very much, Ralph.
>
> Your output is what I would have expected. But when I run gdb (or even
> ddd, the way you indicated), the program does not stop at the breakpoint
> and gdb just dies, as shown below.
> I have GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) and mpich2-1.0.4p1.
>
> Could that be a configuration problem? Any hints about how I could
> investigate what happens? Why is the breakpoint bypassed?
>
> c1::test(10:25am) #16% mpiexec -gdb -n 1 test
> 0:  (gdb) l
> 0:  1   void test_dt() {
> 0:  2     int *i =0;
> 0:  3     *i=1;
> 0:  4   }
> 0:  5
> 0:  6   int main(int argc, char* argv[]) {
> 0:  7     MPI_Init(&argc, &argv);
> 0:  8     test_dt();
> 0:  9     MPI_Finalize();
> 0:  10    return 0;
> 0:  (gdb) b 8
> 0:  Breakpoint 2 at 0x804969a: file test.c, line 8.
> 0:  (gdb) r
>  rank 0 in job 167  c1_32771   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
> c1::test(10:25am) #17%
>
> Thanks
> Florin
>
> On 10/10/06, Ralph Butler <rbutler at mtsu.edu> wrote:
> >
> >
> > On Tue Oct 10, at 4:48PM, Florin Isaila wrote:
> >
> > > Hi,
> > >
> > > I am having a problem running mpiexec with gdb. I set a breakpoint
> > > at a program line, but the program doesn't stop there if an
> > > error occurs (otherwise it stops normally). The error can be a
> > > segmentation fault or a call to MPI_Abort.
> > >
> > > This makes debugging impossible. Is the old style of starting each
> > > mpi process in a separate debugging session possible?
> >
> > I have tried running the pgm we see in your output in the same way
> > you show and have included the output below.
> > However, many folks prefer to use ddd like this:
> >      mpiexec -n 2 ddd mpi_pgm
> >
> > This will launch 2 ddd windows on the desktop each running mpi_pgm.
> > It's pretty easy to handle around 4 processes this way.
> >
> > > While merging the output of several debuggers is helpful in some
> > > cases, controlling each independent process is sometimes very
> > > important.
> > >
> > > Here is the simplest example, with a forced segmentation fault. The
> > > breakpoint at line 229 is ignored, even though the segmentation
> > > fault occurs after it. gdb also quits without making clear
> > > the source of the error.
> > >
> > > stallion:~/tests/mpi/dtype % mpiexec -gdb -n 1 test
> > > 0:  (gdb) l 204
> > >
> > > 0:  204 void test_dt() {
> > > 0:  205   int *i = 0;
> > > 0:  206   *i = 1;
> > > 0:  209}
> > >
> > > 0:  (gdb) l 227
> > > 0:  227 int main(int argc, char* argv[]) {
> > > 0:  228   MPI_Init(&argc, &argv);
> > > 0:  229   test_dt();
> > > 0:  230   MPI_Finalize();
> > > 0:  231   return 0;
> > > 0:  232 }
> > >
> > > 0:  (gdb) b 229
> > > 0:  Breakpoint 2 at 0x8049f79: file test.c, line 229.
> > > 0:  (gdb) r
> > >  rank 0 in job 72   stallion.ece.northwestern.edu_42447   caused
> > > collective abort of all ranks
> > >   exit status of rank 0: killed by signal 9
> > >
> > > Many thanks
> > > Florin
> >
> > My run of the pgm:
> >
> > (magpie:52) % mpiexec -gdb -n 1 temp
> > 0:  (gdb) l
> > 0:  1   void test_dt() {
> > 0:  2       int *i = 0;
> > 0:  3       *i = 1;
> > 0:  4   }
> > 0:  5
> > 0:  6   int main(int argc, char* argv[]) {
> > 0:  7       MPI_Init(&argc, &argv);
> > 0:  8       test_dt();
> > 0:  9       MPI_Finalize();
> > 0:  10      return 0;
> > 0:  (gdb) b 8
> > 0:  Breakpoint 2 at 0x80495fe: file temp.c, line 8.
> > 0:  (gdb) r
> > 0:  Continuing.
> > 0:
> > 0:  Breakpoint 2, main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0:  8       test_dt();
> > 0:  (gdb) 0:  (gdb) s
> > 0:  test_dt () at temp.c:2
> > 0:  2       int *i = 0;
> > 0:  (gdb) s
> > 0:  3       *i = 1;
> > 0:  (gdb) p *i
> > 0:  Cannot access memory at address 0x0
> > 0:  (gdb) p i
> > 0:  $1 = (int *) 0x0
> > 0:  (gdb) c
> > 0:  Continuing.
> > 0:
> > 0:  Program received signal SIGSEGV, Segmentation fault.
> > 0:  0x080495d4 in test_dt () at temp.c:3
> > 0:  3       *i = 1;
> > 0:  (gdb) where
> > 0:  #0  0x080495d4 in test_dt () at temp.c:3
> > 0:  #1  0x08049603 in main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0:  (gdb) q
> > rank 0 in job 2  magpie_42682   caused collective abort of all ranks
> >    exit status of rank 0: killed by signal 9
> > (magpie:53) %
> >
> >
>
>