[MPICH] Strange problem with MPICH2

Suman Kansakar suman.kansakar at gmail.com
Wed Sep 21 01:06:33 CDT 2005


I'm encountering a strange problem with my MPICH2 installation, which
uses the PGI Fortran 90 compiler. It was working fine until yesterday,
when I tried to run my MPI program after a break of about two weeks.
This is what happens now: when I start the MPD daemons using mpdboot,
they start up fine. I have tried both mpdtrace and mpdringtest to
verify this. However, as soon as I run a second command, I get an
error message. For example, the first mpdtrace shows the nodes in the
ring, but a second mpdtrace stops responding and finally fails with
the error message 'mpdtrace (mpdtrace 41): got eof on console'.

This is on Fedora Core 2 with a modified 2.6 Linux kernel. The only
change I'm aware of between the last time it worked and now is the
2.6 kernel. I'm inclined to blame the kernel, but I'd like to know
what the error messages mean so that I can trace the problem back to
the kernel modification that broke my MPICH2 installation.

The following is the sequence of commands I used and the respective
messages that I got.

124 @ 12:17AM on Sep,21:Wed <user at testbed56>
$ mpdboot -n 2

125 @ 12:17AM on Sep,21:Wed <user at testbed56>
$ mpdtrace -l
testbed56.ittc.ku.edu_32770
testbed57.ittc.ku.edu_32770

126 @ 12:17AM on Sep,21:Wed <user at testbed56>
$ mpdtrace -l
mpdtrace (mpdtrace 41): got eof on console

127 @ 12:17AM on Sep,21:Wed <user at testbed56>
< ~ >
$ mpdallexit
no msg recvd from mpd before timeout
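
In case it helps anyone reproduce this, the sequence above can be
wrapped in a small script that also records each command's exit
status. This is only a minimal sketch: run_step is a hypothetical
helper name of my own, not part of MPICH2, and the script assumes the
mpd commands are on PATH and uses exactly the invocations shown above.

```shell
#!/bin/sh
# run_step is a hypothetical helper (not part of MPICH2): it prints the
# command, runs it with stderr merged into stdout, and reports its exit status.
run_step() {
    printf '$ %s\n' "$*"
    status=0
    "$@" 2>&1 || status=$?
    echo "[exit status: $status]"
    return "$status"
}

# Same sequence as in the transcript; skipped if mpdboot is not installed.
if command -v mpdboot >/dev/null 2>&1; then
    run_step mpdboot -n 2
    run_step mpdtrace -l    # first call: lists the ring
    run_step mpdtrace -l    # second call: the one that fails here
    run_step mpdallexit
fi
```

Redirecting the script's output to a file gives a log that shows
which step first returns a nonzero status.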

Any ideas or suggestions on what's wrong, or on what these messages mean?

Thank you,
Suman



