[MPICH2-dev] mpiexec with gdb
Florin Isaila
florin.isaila at gmail.com
Wed Oct 11 12:08:58 CDT 2006
Ralph, you are right: with SUSE LINUX 10.0 and gdb 6.4 it works fine.
Is there anybody running Red Hat Linux for whom it works? If so, what gdb
version are you using?
On my machine I have:
- Red Hat Enterprise Linux WS release 4 (Nahant Update 4)
- gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
- mpich2-1.0.4p1
Thanks
Florin
On 10/11/06, Ralph Butler <rbutler at mtsu.edu> wrote:
>
> I can't say. The problem with gdb is that it produces different output
> on different platforms, even on different installations of Linux.
> Our program attempts to parse that output and (within certain bounds)
> allow for differences. This is done in mpdgdbdrv.py.
> Some folks have dug into the parser code and hacked it to print exactly
> what their particular gdb is producing, trying to work out how to make
> mpdgdbdrv respond correctly. I am not advocating that you do that,
> because none of us really wants to spend lots of time guessing at what
> some odd flavor of gdb might produce, but it's about the only way to
> see for sure what is going wrong.
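>
> For what it's worth, here is a minimal standalone sketch of that
> diagnostic step. It is not the actual mpdgdbdrv.py code; it just drives
> gdb over a pipe and echoes every raw line it prints, so you can see
> exactly what your gdb build emits (the "./test" binary and the gdb
> commands below are assumptions taken from the transcripts further down;
> any small program will do, since the point is only to capture gdb's
> output format):
>
>     # raw_gdb_dump.py -- NOT mpdgdbdrv.py; dumps gdb's raw output
>     import subprocess
>
>     gdb = subprocess.Popen(["gdb", "-q", "./test"],
>                            stdin=subprocess.PIPE,
>                            stdout=subprocess.PIPE,
>                            stderr=subprocess.STDOUT,
>                            universal_newlines=True)
>
>     # Feed the same commands used in the failing session; the final
>     # "y" answers gdb's "Quit anyway?" confirmation if it appears.
>     for cmd in ["b 8", "r", "where", "q", "y"]:
>         gdb.stdin.write(cmd + "\n")
>     gdb.stdin.close()
>
>     # Print every output line verbatim.  A prompt-driven driver like
>     # mpdgdbdrv breaks when these lines (prompts, breakpoint banners,
>     # signal reports) differ from the patterns it expects.
>     for line in gdb.stdout:
>         print("RAW: %r" % line)
>     gdb.wait()
>
> Comparing the raw dump from a machine where -gdb works against one
> where it fails should show which lines the parser is choking on.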
> --ralph
>
> On Wed, Oct 11, 2006, at 11:05 AM, Florin Isaila wrote:
>
> > Hi,
> > Thank you very much, Ralph.
> >
> > Your output is what I would have expected. But when I run gdb (or
> > even ddd, the way you indicated), the program doesn't stop at the
> > breakpoint and gdb just dies, as shown below.
> > I have GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh) and mpich2-1.0.4p1.
> >
> > Could that be a configuration problem? Any hints about how I could
> > investigate what happens? Why is the breakpoint bypassed?
> >
> > c1::test(10:25am) #16% mpiexec -gdb -n 1 test
> > 0: (gdb) l
> > 0: 1 void test_dt() {
> > 0: 2 int *i =0;
> > 0: 3 *i=1;
> > 0: 4 }
> > 0: 5
> > 0: 6 int main(int argc, char* argv[]) {
> > 0: 7 MPI_Init(&argc, &argv);
> > 0: 8 test_dt();
> > 0: 9 MPI_Finalize();
> > 0: 10 return 0;
> > 0: (gdb) b 8
> > 0: Breakpoint 2 at 0x804969a: file test.c, line 8.
> > 0: (gdb) r
> > rank 0 in job 167 c1_32771 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> > c1::test(10:25am) #17%
> >
> > Thanks
> > Florin
> >
> > On 10/10/06, Ralph Butler <rbutler at mtsu.edu> wrote:
> > On Tue, Oct 10, 2006, at 4:48 PM, Florin Isaila wrote:
> >
> > > Hi,
> > >
> > > I am having a problem running mpiexec with gdb. I set a breakpoint
> > > at a program line, but the program wouldn't stop there when an
> > > error occurs (otherwise it stops normally). The error can be a
> > > segmentation fault or a call to MPI_Abort.
> > >
> > > This makes debugging impossible. Is the old style of starting each
> > > MPI process in a separate debugging session still possible?
> >
> > I have tried running the pgm we see in your output in the same way
> > you show, and I have included the output below.
> > However, many folks prefer to use ddd like this:
> > mpiexec -n 2 ddd mpi_pgm
> >
> > This will launch 2 ddd windows on the desktop, each running mpi_pgm.
> > It's pretty easy to manage around 4 processes this way.
> >
> > > While merging the output of several debuggers is helpful in some
> > > cases, controlling each independent process is sometimes very
> > > important.
> > >
> > > Here is the simplest example, with a forced segmentation fault. The
> > > breakpoint at line 229 is ignored, even though the segmentation
> > > fault occurs after it. gdb also quits, without making the source of
> > > the error clear.
> > >
> > > stallion:~/tests/mpi/dtype % mpiexec -gdb -n 1 test
> > > 0: (gdb) l 204
> > >
> > > 0: 204 void test_dt() {
> > > 0: 205 int *i = 0;
> > > 0: 206 *i = 1;
> > > 0: 209 }
> > >
> > > 0: (gdb) l 227
> > > 0: 227 int main(int argc, char* argv[]) {
> > > 0: 228 MPI_Init(&argc, &argv);
> > > 0: 229 test_dt();
> > > 0: 230 MPI_Finalize();
> > > 0: 231 return 0;
> > > 0: 232 }
> > >
> > > 0: (gdb) b 229
> > > 0: Breakpoint 2 at 0x8049f79: file test.c, line 229.
> > > 0: (gdb) r
> > > rank 0 in job 72 stallion.ece.northwestern.edu_42447 caused
> > > collective abort of all ranks
> > > exit status of rank 0: killed by signal 9
> > >
> > > Many thanks
> > > Florin
> >
> > My run of the pgm:
> >
> > (magpie:52) % mpiexec -gdb -n 1 temp
> > 0: (gdb) l
> > 0: 1 void test_dt() {
> > 0: 2 int *i = 0;
> > 0: 3 *i = 1;
> > 0: 4 }
> > 0: 5
> > 0: 6 int main(int argc, char* argv[]) {
> > 0: 7 MPI_Init(&argc, &argv);
> > 0: 8 test_dt();
> > 0: 9 MPI_Finalize();
> > 0: 10 return 0;
> > 0: (gdb) b 8
> > 0: Breakpoint 2 at 0x80495fe: file temp.c, line 8.
> > 0: (gdb) r
> > 0: Continuing.
> > 0:
> > 0: Breakpoint 2, main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0: 8 test_dt();
> > 0: (gdb) 0: (gdb) s
> > 0: test_dt () at temp.c:2
> > 0: 2 int *i = 0;
> > 0: (gdb) s
> > 0: 3 *i = 1;
> > 0: (gdb) p *i
> > 0: Cannot access memory at address 0x0
> > 0: (gdb) p i
> > 0: $1 = (int *) 0x0
> > 0: (gdb) c
> > 0: Continuing.
> > 0:
> > 0: Program received signal SIGSEGV, Segmentation fault.
> > 0: 0x080495d4 in test_dt () at temp.c:3
> > 0: 3 *i = 1;
> > 0: (gdb) where
> > 0: #0 0x080495d4 in test_dt () at temp.c:3
> > 0: #1 0x08049603 in main (argc=1, argv=0xbffff3b4) at temp.c:8
> > 0: (gdb) q
> > rank 0 in job 2 magpie_42682 caused collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> > (magpie:53) %