[MPICH] Strange problem with MPICH2

Rajeev Thakur thakur at mcs.anl.gov
Mon Sep 26 14:04:02 CDT 2005


-----Original Message-----
From: Ralph M. Butler [mailto:rbutler at mtsu.edu] 
Sent: Monday, September 26, 2005 2:01 PM
Subject: Re: [MPICH] Strange problem with MPICH2

My guess is that something about the configuration changed when you
upgraded the kernel, but I cannot guess what.  I am attaching the new
mpdcheck and the new install manual, since the manual may be a bit more
complete than the one from the distro.  Hopefully the "Troubleshooting
MPDs" appendix of the updated manual will shed some light.
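
As a rough first pass before reading the appendix, the sketch below does
two basic checks on one node: does the local hostname resolve, and can the
machine open a TCP connection to itself through that name.  This is not
the attached mpdcheck, and it assumes (without evidence from this thread)
that the trouble lies in name resolution or local TCP connectivity.  The
function names are made up for illustration.

```python
import socket


def resolve(host):
    """Return the IPv4 address for host, or None if the lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


def can_connect_to_self(host, timeout=5.0):
    """Listen on an ephemeral port and connect back to it through host."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("", 0))      # port 0: let the kernel pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    finally:
        listener.close()


if __name__ == "__main__":
    name = socket.gethostname()
    print("hostname:       ", name)
    print("fqdn:           ", socket.getfqdn())
    print("resolves to:    ", resolve(name))
    print("connect to self:", can_connect_to_self(name))
```

Running it on each node in the ring and comparing the output is one way
to spot a host whose name no longer resolves the way it did before the
kernel change.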

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Suman Kansakar
> Sent: Wednesday, September 21, 2005 1:07 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] Strange problem with MPICH2
>
> I'm encountering a strange problem with my MPICH2 installation, which
> uses the PGI Fortran90 compiler. It was working fine until yesterday,
> when I tried to run my MPI program after a break of about two weeks.
> This is what's happening now. When I start the MPD daemons using
> mpdboot, they start up fine. I have tried both mpdtrace and
> mpdringtest to verify this. However, as soon as I run another command,
> I get an error message. For example, mpdtrace shows the nodes in the
> ring the first time, but if I run mpdtrace again, it stops responding
> and finally comes up with the error message 'mpdtrace (mpdtrace 41):
> got eof on console'
>
> This is on Fedora Core 2 with a modified 2.6 Linux kernel. The only
> change I'm aware of between the last time it worked and now is the
> 2.6 kernel. I'm inclined to blame the kernel, but I'd like to know
> what the error messages I'm encountering mean so that I can trace the
> problem back to the kernel modification that broke my MPICH2
> installation.
>
> The following is the sequence of commands I used and the respective
> messages that I got.
>
> 124 @ 12:17AM on Sep,21:Wed <user at testbed56>
> $ mpdboot -n 2
>
> 125 @ 12:17AM on Sep,21:Wed <user at testbed56>
> $ mpdtrace -l
> testbed56.ittc.ku.edu_32770
> testbed57.ittc.ku.edu_32770
>
> 126 @ 12:17AM on Sep,21:Wed <user at testbed56>
> $ mpdtrace -l
> mpdtrace (mpdtrace 41): got eof on console
>
> 127 @ 12:17AM on Sep,21:Wed <user at testbed56>
> < ~ >
> $ mpdallexit
> no msg recvd from mpd before timeout
>
> Any idea/suggestion on what's wrong or what these messages mean?
>
> Thank you,
> Suman
>
>
>
>
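
To reproduce the quoted sequence on demand, a small script can replay
the same four commands with a timeout around each one, so a hung mpdtrace
is reported instead of blocking the shell.  This is only a sketch: the
commands and flags are the ones from the session above, while the helper
names and the 30-second timeout are arbitrary choices.

```python
import subprocess
import sys


def run_step(cmd, timeout=30.0):
    """Run one command; return (status, combined stdout/stderr text)."""
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=timeout)
    except FileNotFoundError:
        return ("not found", "")
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    return ("exit %d" % proc.returncode, proc.stdout)


# Same commands, in the same order, as the session quoted above.
SEQUENCE = [
    ["mpdboot", "-n", "2"],
    ["mpdtrace", "-l"],
    ["mpdtrace", "-l"],
    ["mpdallexit"],
]


def replay(sequence=SEQUENCE, timeout=30.0):
    """Run each command in turn and collect (cmd, status, output)."""
    results = []
    for cmd in sequence:
        status, output = run_step(cmd, timeout)
        results.append((cmd, status, output))
    return results


if __name__ == "__main__":
    for cmd, status, output in replay():
        print("$", " ".join(cmd))
        print("[%s]" % status)
        sys.stdout.write(output)
```

Saving the output from a run under the old kernel and a run under the new
one gives two logs that can be compared step by step.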
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mpdcheck.py
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20050926/6079ecba/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: install.pdf
Type: application/pdf
Size: 216815 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20050926/6079ecba/attachment.pdf>

