[MPICH2-dev] mpiexec with gdb
Florin Isaila
florin.isaila at gmail.com
Wed Oct 11 11:27:34 CDT 2006
On 10/11/06, Rusty Lusk <lusk at mcs.anl.gov> wrote:
>
> Could it be that you have not compiled the program for debugging (with
> -g)?
>
I compiled:
mpicc -g test.c -o test
even though it appears not to be necessary, because I configured mpich with
--enable-g=dbg
Thanks
Florin
> Regards,
> Rusty
>
> On Oct 11, 2006, at 11:05 AM, Florin Isaila wrote:
>
> Hi,
> thank you very much, Ralph.
>
> Your output is what I would have expected. But when I run gdb (or even
> ddd the way you indicated), the program doesn't stop at the breakpoint
> and gdb just dies, as shown below.
> I have GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) and mpich2-1.0.4p1.
>
> Could that be a configuration problem? Any hints about how I could
> investigate what happens? Why is the breakpoint bypassed?
>
> c1::test(10:25am) #16% mpiexec -gdb -n 1 test
> 0: (gdb) l
> 0: 1 void test_dt() {
> 0: 2 int *i =0;
> 0: 3 *i=1;
> 0: 4 }
> 0: 5
> 0: 6 int main(int argc, char* argv[]) {
> 0: 7 MPI_Init(&argc, &argv);
> 0: 8 test_dt();
> 0: 9 MPI_Finalize();
> 0: 10 return 0;
> 0: (gdb) b 8
> 0: Breakpoint 2 at 0x804969a: file test.c, line 8.
> 0: (gdb) r
> rank 0 in job 167 c1_32771 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
> c1::test(10:25am) #17%
>
> Thanks
> Florin
>
> On 10/10/06, Ralph Butler <rbutler at mtsu.edu> wrote:
> >
> >
> > On Tue Oct 10, at 4:48PM, Florin Isaila wrote:
> >
> > > Hi,
> > >
> > > I am having a problem running mpiexec with gdb. I set a breakpoint
> > > at a program line, but the program doesn't stop there if an
> > > error occurs (otherwise it stops normally). The error can be a
> > > segmentation fault or a call to MPI_Abort.
> > >
> > > This makes debugging impossible. Is the old style of starting each
> > > mpi process in a separate debugging session possible?
> >
> > I have tried running the pgm we see in your output in the same way
> > you show and have included the output below.
> > However, many folks prefer to use ddd like this:
> > mpiexec -n 2 ddd mpi_pgm
> >
> > This will launch 2 ddd windows on the desktop each running mpi_pgm.
> > It's pretty easy to handle around 4 processes this way.
> >
> > > While merging the output of several debuggers is helpful in some
> > > cases, controlling each independent process is sometimes very
> > > important.
> > >
> > > Here is the simplest example, with a forced segmentation fault. The
> > > breakpoint at line 229 is ignored, even though the segmentation
> > > fault occurs after it. gdb also quits without making clear
> > > the source of the error.
> > >
> > > stallion:~/tests/mpi/dtype % mpiexec -gdb -n 1 test
> > > 0: (gdb) l 204
> > >
> > > 0: 204 void test_dt() {
> > > 0: 205 int *i = 0;
> > > 0: 206 *i = 1;
> > > 0: 209}
> > >
> > > 0: (gdb) l 227
> > > 0: 227 int main(int argc, char* argv[]) {
> > > 0: 228 MPI_Init(&argc, &argv);
> > > 0: 229 test_dt();
> > > 0: 230 MPI_Finalize();
> > > 0: 231 return 0;
> > > 0: 232 }
> > >
> > > 0: (gdb) b 229
> > > 0: Breakpoint 2 at 0x8049f79: file test.c, line 229.
> > > 0: (gdb) r
> > > rank 0 in job 72 stallion.ece.northwestern.edu_42447 caused
> > > collective abort of all ranks
> > > exit status of rank 0: killed by signal 9
> > >
> > > Many thanks
> > > Florin
> >
> > My run of the pgm:
> >
> > (magpie:52) % mpiexec -gdb -n 1 temp
> > 0: (gdb) l
> > 0: 1 void test_dt() {
> > 0: 2 int *i = 0;
> > 0: 3 *i = 1;
> > 0: 4 }
> > 0: 5
> > 0: 6 int main(int argc, char* argv[]) {
> > 0: 7 MPI_Init(&argc, &argv);
> > 0: 8 test_dt();
> > 0: 9 MPI_Finalize();
> > 0: 10 return 0;
> > 0: (gdb) b 8
> > 0: Breakpoint 2 at 0x80495fe: file temp.c, line 8.
> > 0: (gdb) r
> > 0: Continuing.
> > 0:
> > 0: Breakpoint 2, main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0: 8 test_dt();
> > 0: (gdb) 0: (gdb) s
> > 0: test_dt () at temp.c:2
> > 0: 2 int *i = 0;
> > 0: (gdb) s
> > 0: 3 *i = 1;
> > 0: (gdb) p *i
> > 0: Cannot access memory at address 0x0
> > 0: (gdb) p i
> > 0: $1 = (int *) 0x0
> > 0: (gdb) c
> > 0: Continuing.
> > 0:
> > 0: Program received signal SIGSEGV, Segmentation fault.
> > 0: 0x080495d4 in test_dt () at temp.c:3
> > 0: 3 *i = 1;
> > 0: (gdb) where
> > 0: #0 0x080495d4 in test_dt () at temp.c:3
> > 0: #1 0x08049603 in main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0: (gdb) q
> > rank 0 in job 2 magpie_42682 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> > (magpie:53) %
> >
> >
>
>
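On the question raised earlier in the thread about controlling each MPI process in its own debugging session: a common workaround, independent of mpiexec -gdb, is to have each rank print its PID and wait until a debugger attaches. The following is only a minimal sketch under stated assumptions. The environment variable MPI_DEBUG_WAIT and the helpers debug_wait_requested() and wait_for_debugger() are hypothetical names made up for illustration; they are not part of MPICH2. The rank is passed in as a plain int so the helper itself has no MPI dependency.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical environment variable; NOT an MPICH2 feature. */
#define DEBUG_WAIT_ENV "MPI_DEBUG_WAIT"

/* Return 1 if the variable is set to something other than "" or "0...". */
static int debug_wait_requested(void)
{
    const char *v = getenv(DEBUG_WAIT_ENV);
    return v != NULL && v[0] != '\0' && v[0] != '0';
}

/*
 * If requested, print host and PID, then spin until a debugger that has
 * attached to this process sets `hold` to 0. Returns at once otherwise.
 *
 * Intended use (assumption): call right after MPI_Init() and
 * MPI_Comm_rank(MPI_COMM_WORLD, &rank), before the code under test.
 */
static void wait_for_debugger(int rank)
{
    volatile int hold = 1;   /* volatile: must be re-read on every loop */
    char host[256];

    assert(rank >= 0);
    if (!debug_wait_requested())
        return;

    if (gethostname(host, sizeof host) != 0)
        host[0] = '\0';
    host[sizeof host - 1] = '\0';

    fprintf(stderr, "rank %d: pid %ld on %s waiting for debugger\n",
            rank, (long)getpid(), host);
    fflush(stderr);

    while (hold)
        sleep(1);
}
```

With the variable set, each rank can then be debugged from its own terminal with "gdb -p PID". Inside gdb, use "up" until the frame of wait_for_debugger is selected, then "set var hold = 0", set any breakpoints, and "continue". Because every process gets its own gdb, a segmentation fault or MPI_Abort in one rank is reported in that rank's session.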