[MPICH] EOF from console
Ralph M. Butler
rbutler at mtsu.edu
Thu Jun 15 09:45:57 CDT 2006
Since contact with the mpd is lost, either a connection was dropped or
the mpd died for some reason. If the mpd had died, though, you would
have seen it terminate if you were watching the window where you
started it. I also note there seems to be a problem with python on
board05, as pointed out below.
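
One way to tell a dropped connection from a dead mpd is to test whether
the mpd's listening port still accepts TCP connections. The following
is only a minimal sketch, assuming Python 3 is available on some machine
that can reach the ring; can_connect is a hypothetical helper, not part
of mpd, and it does not speak the mpd protocol. It only reports whether
something is listening on the port.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False
    sock.close()
    return True

# Example (host and port taken from the original report):
#   can_connect("board01", 2048)
```

If this returns False while the mpd process is still running, the
network path or name resolution is the more likely suspect.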
> Date: Thu, 15 Jun 2006 14:25:59 +0100
> From: Matthew Fowler <tjue1 at central.susx.ac.uk>
> To: Ralph M. Butler <rbutler at mtsu.edu>
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] EOF from console
>
> Hello Ralph, thank you for the suggestions.
>
> The problem arises when any board is added to an already existing cluster.
> Actually, saying five was too specific. After experimenting for a while
> longer I have found that sometimes the same output appears when adding a
> fourth to a cluster of three. I also thought that maybe the problem was with
> the fifth board, but as it happens the problem can be reproduced when board05
> is the first to enter the ring, then 4, 3, 2, 1 (eof from console).
>
> Just to make sure that my setup is correct would you please look over the
> output from mpdcheck -v below:
>
> board01:
>
> # mpdcheck -v
> mpdcheck -v
> obtaining hostname via gethostname and getfqdn
> gethostname gives board01
> getfqdn gives board01
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure other
> than 127.0.0.1
> gethostbyname_ex: ('board01', [], ['10.9.10.1'])
> gethostbyname_ex: ('board01', [], ['10.9.10.1'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> #
>
> board05:
>
> # mpdcheck -v
> mpdcheck -v
> Could not find platform dependent libraries <exec_prefix>
> Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
These 2 lines seem to indicate a problem with the installed python.
I am not sure if they could eventually lead to the problem.
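
That warning is printed when the interpreter cannot locate its
directory of compiled extension modules. A rough way to see whether it
matters is to try importing a few modules that are often built as
shared libraries. This is a sketch only, assuming Python 3;
check_extension_modules is a hypothetical helper, and which modules are
shared libraries rather than built in varies by build, so the default
list is an assumption.

```python
import importlib
import sys


def check_extension_modules(names=("select", "socket", "time", "math")):
    """Try to import each module; map name -> None (ok) or error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


# Show where this interpreter believes it is installed.
print("prefix:", sys.prefix)
print("exec_prefix:", sys.exec_prefix)
for name, err in check_extension_modules().items():
    print(name, "ok" if err is None else "FAILED: " + err)
```

If any of these fail on board05 but not on board01, the NFS-mounted
python is missing pieces on that board.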
> obtaining hostname via gethostname and getfqdn
> gethostname gives board05
> getfqdn gives board05
> checking out unqualified hostname; make sure is not "localhost", etc.
> checking out qualified hostname; make sure is not "localhost", etc.
> obtain IP addrs via qualified and unqualified hostnames; make sure other
> than 127.0.0.1
> gethostbyname_ex: ('board05', [], ['10.9.10.5'])
> gethostbyname_ex: ('board05', [], ['10.9.10.5'])
> checking that IP addrs resolve to same host
> now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> #
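
The name checks in the output above can be approximated by hand with the
same socket calls that appear in it. This is a minimal sketch, assuming
Python 3; resolve_report and has_non_loopback are hypothetical names,
and this is not the mpdcheck source.

```python
import socket


def resolve_report(name=None):
    """Collect the hostname, its FQDN and the addresses each resolves to."""
    name = name or socket.gethostname()
    fqdn = socket.getfqdn(name)
    report = {"hostname": name, "fqdn": fqdn, "addrs": {}}
    for candidate in (name, fqdn):
        try:
            _, _, addrs = socket.gethostbyname_ex(candidate)
        except OSError as exc:
            # gaierror and herror are both subclasses of OSError.
            report["addrs"][candidate] = "lookup failed: %s" % exc
            continue
        report["addrs"][candidate] = addrs
    return report


def has_non_loopback(addrs):
    """True if at least one address is outside 127.0.0.0/8."""
    return any(not a.startswith("127.") for a in addrs)

# Example: has_non_loopback(["10.9.10.5"]) is True, while
# has_non_loopback(["127.0.0.1"]) is False.
```

Running resolve_report() on each board and comparing the results
against the hosts file gives roughly the same information as the
hostname part of the mpdcheck -v output.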
>
> Best Regards
>
> Matthew
>
>
>
> Ralph M. Butler wrote:
>
>> Hi Matthew:
>>
>> mpdtrace is one of the mpd "console" programs. It does not wait forever
>> for an mpd to respond. This error msg is telling us that there was no
>> response from the mpd. mpdtrace does not know why, of course. So, we have
>> to assume that the mpd was (or became) inaccessible for some reason.
>>
>> It is not clear if the problem is adding a fifth or the particular fifth
>> being added. I will assume that the problem always arises when adding a host
>> named board05. Can you reproduce the problem by starting an mpd on
>> board01 and then immediately adding in board05? If this reproduces the
>> problem, it implies that there is some issue with board05 itself. In
>> that case, I would suggest using mpdcheck on board05 as discussed in the
>> Troubleshooting portion of the install manual.
>>
>> --ralph
>>
>> On Mon, 12 Jun 2006, tjue1 at sussex.ac.uk wrote:
>>
>>> Date: Mon, 12 Jun 2006 16:49:08 +0100
>>> From: tjue1 at sussex.ac.uk
>>> To: beowulf at beowulf.org, mpich-discuss at mcs.anl.gov
>>> Subject: [MPICH] EOF from console
>>>
>>> Hi list
>>>
>>> I'm doing some experiments on an embedded platform and am building a
>>> Beowulf cluster from the boards. I have an unusual setup as the boards
>>> have limited memory and I am using MPICH 2 (latest). The setup is a bit
>>> strange as Python is accessible to the boards via an NFS mount.
>>>
>>> I can start an MPD daemon on a single board with no problems. I can
>>> also add a further three to the ring with no probs. Adding a fifth
>>> causes an error (see below).
>>>
>>> (I'm adding nodes manually rather than using mpdboot. When I get it
>>> working manually I will get mpdboot working.)
>>>
>>> Here's the problem:
>>>
>>> (from first board)
>>>
>>> mpdtrace -l
>>> board01_2048 (10.9.10.1)
>>>
>>> I then add others into the ring as:
>>>
>>> mpd -h board01 -p 2048 &
>>>
>>> mpdtrace
>>> board02
>>> board01
>>>
>>> I can continue to add boards until I try and add a 5th. When adding a
>>> 5th using the above method I get:
>>>
>>> mpdtrace &
>>> mpdtrace (mpdtrace 57): got eof on console
>>> Jul 22 08:33:49 board05 python2.3: mpdtrace (mpdtrace 57): got eof on
>>> console
>>>
>>> I have to admit I'm baffled. Can anyone shed some light on this? If more
>>> specific information will help, please tell me.
>>>
>>> Regards
>>>
>>> Matthew
>>
>
>
>