[MPICH] EOF from console

Ralph M. Butler rbutler at mtsu.edu
Thu Jun 15 09:45:57 CDT 2006


Since contact with the mpd is lost, either a connection was dropped or
the mpd died for some reason, although you would see it terminate if you
were watching the window where you started it.  I also note that there
seems to be a problem with Python on board05, as pointed out below.
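
One rough way to tell the two cases apart is to try a plain TCP connect
to the port the first mpd is listening on (board01, port 2048 in your
mpdtrace -l output).  The sketch below is only an illustration and is
not part of mpd; the name probe_mpd and the three result strings are
made up for this example, and it is written for a current Python 3
rather than the python2.3 on your boards.  A refused connection
suggests nothing is listening any more (the mpd probably died), while a
timeout or a name lookup failure points toward the network or the
hosts setup.

```python
import socket


def probe_mpd(host, port, timeout=3.0):
    """Classify one TCP connection attempt to host:port.

    Returns "open" if something accepted the connection, "refused" if
    the host answered but nothing is listening on that port, and
    "unreachable" for timeouts, name lookup failures and other errors.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host is up but no process is bound to the port.
        return "refused"
    except OSError:
        # Covers timeouts, DNS failures, "no route to host", etc.
        return "unreachable"


# Example call, using the host and port quoted in this thread:
#     print(probe_mpd("board01", 2048))
```

An "open" result only shows that something accepts connections on that
port; it does not show that the mpd ring itself is healthy.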

> Date: Thu, 15 Jun 2006 14:25:59 +0100
> From: Matthew Fowler <tjue1 at central.susx.ac.uk>
> To: Ralph M. Butler <rbutler at mtsu.edu>
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] EOF from console
> 
> Hello Ralph, thank you for the suggestions.
>
> The problem arises when any board is added to an already existing cluster.
> Actually, saying five was too specific. After experimenting for a while
> longer I have found that sometimes the same output appears when adding a
> fourth to a cluster of three. I also thought that maybe the problem was with
> the fifth board, but as it happens the problem can be reproduced when board05
> is the first to enter the ring, followed by 4, 3, 2, 1 (eof from console).
>
> Just to make sure that my setup is correct, would you please look over the
> output from mpdcheck -v below:
>
> board01:
>
> # mpdcheck -v
> mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives  board01
> getfqdn gives  board01
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure other 
> than 127.0.0.1
> gethostbyname_ex:  ('board01', [], ['10.9.10.1'])
> gethostbyname_ex:  ('board01', [], ['10.9.10.1'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> #
>
> board05:
>
> # mpdcheck -v
> mpdcheck -v
> Could not find platform dependent libraries <exec_prefix>
> Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]

These two lines seem to indicate a problem with the installed Python.
I am not sure whether they could eventually lead to the problem you see.
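
As far as I know, that message is printed at interpreter startup when
Python cannot find its directory of compiled extension modules under
exec_prefix, which is plausible with Python living on an NFS mount.  A
quick way to see whether it matters is to try importing a few compiled
modules on board05.  The sketch below is an illustration only: the
function name and the module list are my assumptions (modules a
daemon like mpd would plausibly need), and it is written for a current
Python 3, so it would need adjusting for python2.3.

```python
import importlib
import sys


def check_extension_modules(names=("select", "socket", "signal", "pwd")):
    """Try to import each named module; return {name: error} for failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failures[name] = str(exc)
    return failures


# Show where this interpreter thinks it is installed, then the result.
print("prefix:     ", sys.prefix)
print("exec_prefix:", sys.exec_prefix)
print(check_extension_modules() or "all listed modules imported")
```

If any of these fail to import on board05 but succeed on board01, the
Python installation (or PYTHONHOME) on board05 is the first thing to
fix.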

> obtaining hostname via gethostname and getfqdn
> gethostname gives  board05
> getfqdn gives  board05
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames;  make sure other 
> than 127.0.0.1
> gethostbyname_ex:  ('board05', [], ['10.9.10.5'])
> gethostbyname_ex:  ('board05', [], ['10.9.10.5'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> #
>
> Best Regards
>
> Matthew
>
>
>
> Ralph M. Butler wrote:
>
>> Hi Matthew:
>> 
>> mpdtrace is one of the mpd "console" programs.  It does not wait forever
>> for an mpd to respond.  This error msg is telling us that there was no
>> response from the mpd.  mpdtrace does not know why, of course.  So we have
>> to assume that the mpd was (or became) inaccessible for some reason.
>> 
>> It is not clear whether the problem is adding a fifth host or the
>> particular fifth host being added.  I will assume that the problem always
>> arises when adding a host named board05.  Can you reproduce the problem by
>> starting an mpd on board01 and then immediately adding board05?  If this
>> reproduces the problem, it implies that there is some issue with board05
>> itself.  In that case, I would suggest using mpdcheck on board05 as
>> discussed in the Troubleshooting portion of the install manual.
>> 
>> --ralph
>> 
>> On Mon, 12 Jun 2006, tjue1 at sussex.ac.uk wrote:
>> 
>>> Date: Mon, 12 Jun 2006 16:49:08 +0100
>>> From: tjue1 at sussex.ac.uk
>>> To: beowulf at beowulf.org, mpich-discuss at mcs.anl.gov
>>> Subject: [MPICH] EOF from console
>>> 
>>> Hi list
>>> 
>>> I'm doing some experiments on an embedded platform and am building a
>>> Beowulf cluster from the boards. I have an unusual setup, as the boards
>>> have limited memory and I am using MPICH 2 (latest). The setup is also a
>>> bit strange in that Python is accessible to the boards via an NFS mount.
>>> 
>>> I can start an MPD daemon on a single board with no problems. I can
>>> also add a further three to the ring with no problems. Adding a fifth
>>> causes an error (see below).
>>> 
>>> (I'm adding nodes manually rather than using mpdboot. When I get it
>>> working manually I will get mpdboot working.)
>>> 
>>> Here's the problem:
>>> 
>>> (from first board)
>>> 
>>> mpdtrace -l
>>> board01_2048 (10.9.10.1)
>>> 
>>> I then add others into the ring as:
>>> 
>>> mpd -h board01 -p 2048 &
>>> 
>>> mpdtrace
>>> board02
>>> board01
>>> 
>>> I can continue to add boards until I try to add a 5th. When adding a
>>> 5th using the above method I get:
>>> 
>>> mpdtrace &
>>> mpdtrace (mpdtrace 57): got eof on console
>>> Jul 22 08:33:49 board05 python2.3: mpdtrace (mpdtrace 57): got eof on
>>> console
>>> 
>>> I have to admit I'm baffled. Can anyone shed some light on this? If more
>>> specific information will help, please tell me.
>>> 
>>> Regards
>>> 
>>> Matthew
>> 
>
>
>



