[mpich-discuss] FW: problems with mpdboot
Ralph Butler
rbutler at mtsu.edu
Tue Apr 7 13:08:49 CDT 2009
You might also try this option to mpdboot:
--maxbranch=1
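
As a rough sketch of how that option might be combined with the flags from Brian's original command quoted further down (-n 2 -f mpd.hosts): the helper below only assembles and prints the command line so it can be reviewed, and does not run anything. The name build_mpdboot_cmd is made up for illustration and is not part of MPICH2.

```python
import shlex


def build_mpdboot_cmd(n, hostfile, maxbranch=None):
    """Assemble an mpdboot argv list (hypothetical helper, not part of MPICH2)."""
    cmd = ["mpdboot", "-n", str(n), "-f", hostfile]
    if maxbranch is not None:
        # --maxbranch is the option suggested in this message.
        cmd.append("--maxbranch=%d" % maxbranch)
    return cmd


# Print the command so it can be checked and then run by hand.
print(shlex.join(build_mpdboot_cmd(2, "mpd.hosts", maxbranch=1)))
```

Adding -v -d, as in the original report, should keep the debug output for comparison.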
On Apr 7, 2009, at 1:05 PM, bjday wrote:
> Rajeev,
>
> Yes, you are correct: I can build a ring by hand but not by using
> mpdboot. Once I build a ring by hand I can run mpiexec hostname and
> it works; see below. I installed using the latest download that is
> on the website. In my research before contacting the forums I found
> this website. I don't know if this helps. http://ubuntuforums.org/showthread.php?t=1016984
> It has to do with setting LD_LIBRARY_PATH and python, but I used
> CentOS's add/remove programs so I never touched the package. I will
> try to reinstall MPICH2 on both computers just in case somehow
> different versions were installed. Any other suggestions or help
> would be great.
>
> Thank you,
> Brian
> % mpiexec -l -n 30 /bin/hostname
> 2: c4labpc19.csee.usf.edu
> 3: c4labpc12.csee.usf.edu
> 1: c4labpc12.csee.usf.edu
> 4: c4labpc19.csee.usf.edu
> 5: c4labpc12.csee.usf.edu
> 6: c4labpc19.csee.usf.edu
> 7: c4labpc12.csee.usf.edu
> 9: c4labpc12.csee.usf.edu
> 8: c4labpc19.csee.usf.edu
> 11: c4labpc12.csee.usf.edu
> 12: c4labpc19.csee.usf.edu
> 13: c4labpc12.csee.usf.edu
> 15: c4labpc12.csee.usf.edu
> 14: c4labpc19.csee.usf.edu
> 10: c4labpc19.csee.usf.edu
> 17: c4labpc12.csee.usf.edu
> 16: c4labpc19.csee.usf.edu
> 19: c4labpc12.csee.usf.edu
> 21: c4labpc12.csee.usf.edu
> 22: c4labpc19.csee.usf.edu
> 23: c4labpc12.csee.usf.edu
> 20: c4labpc19.csee.usf.edu
> 18: c4labpc19.csee.usf.edu
> 24: c4labpc19.csee.usf.edu
> 25: c4labpc12.csee.usf.edu
> 27: c4labpc12.csee.usf.edu
> 29: c4labpc12.csee.usf.edu
> 0: c4labpc19.csee.usf.edu
> 28: c4labpc19.csee.usf.edu
> 26: c4labpc19.csee.usf.edu
> %
>
>
> Rajeev Thakur wrote:
>>
>> -----Original Message-----
>> From: Ralph Butler [mailto:rbutler at mtsu.edu]
>> Sent: Tuesday, April 07, 2009 12:06 PM
>> To: Rajeev Thakur
>> Subject: Re: [mpich-discuss] problems with mpdboot
>>
>> I cannot reproduce it, of course. He seems to indicate that he can build a
>> ring by hand, but does not say that it is usable with mpiexec to run
>> something like hostname. If he can do that and it still fails, I am at a
>> loss as to what the problem can be. I ran into this one time when the
>> mpd.py and mpdboot.py happened to be from different releases of mpich2, but
>> I seriously doubt that is his problem.
>>
>> On Tue Apr 7, at 11:28 AM, Rajeev Thakur wrote:
>>
>>
>>> Ralph, any comments?
>>>
>>> Rajeev
>>>
>>> -----Original Message-----
>>> From: mpich-discuss-bounces at mcs.anl.gov [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of bjday
>>> Sent: Tuesday, April 07, 2009 10:25 AM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] problems with mpdboot
>>>
>>> I tried mpdcheck again, as instructed in the Troubleshooting section of
>>> the installation guide, and the client (pc12) successfully received an
>>> ack from the server. The server (pc19) has a connection from the client
>>> and successfully received a msg from the client.
>>>
>>> I also tried the ssh command and received "c4labpc19.csee.usf.edu" as a
>>> response.
>>>
>>> Once mpd is started on the master I can connect the slaves once I
>>> get the port number from the server. I can also run mpdboot -n 1
>>> and the master node will be the only output when mpdtrace is run.
>>> The error occurs when n>1, when trying to remotely start the slave nodes.
>>>
>>> thank you,
>>> Brian
>>>
>>> Pavan Balaji wrote:
>>>
>>>> Can you try mpdcheck to make sure there are no network
>>>> infrastructure issues (e.g., firewalls or errors in /etc/hosts)?
>>>>
>>>> Another quick check is to make sure each host can ssh to another
>>>> host with the name given in the host file. For example, try:
>>>>
>>>> $ ssh c4labpc12.csee.usf.edu -t "ssh c4labpc19.csee.usf.edu hostname"
>>>>
>>>> -- Pavan
>>>>
>>>> bjday wrote:
>>>>
>>>>> Pavan,
>>>>>
>>>>> Yes, the names returned by "hostname" and the names in mpd.hosts
>>>>> are the fully qualified names.
>>>>>
>>>>> Thank you,
>>>>> Brian
>>>>>
>>>>>
>>>>> Pavan Balaji wrote:
>>>>>
>>>>>> Check if your host file contains the same name as what is
>>>>>> returned by the "hostname" command (e.g., "foo" is different
>>>>>> from "foo.domain.edu"). Otherwise, mpd can't find the local
>>>>>> hostname in your host file.
>>>>>>
>>>>>> -- Pavan
>>>>>>
>>>>>> bjday wrote:
>>>>>>
>>>>>>> Hello MPICH2 Gurus
>>>>>>>
>>>>>>> I am installing MPICH2 on some lab computers at the request of
>>>>>>> a professor. I have run into a problem during testing. When I run
>>>>>>> mpdboot I receive this error:
>>>>>>>
>>>>>>> mpdboot -n 2 -f mpd.hosts -v -d
>>>>>>> debug: starting
>>>>>>> running mpdallexit on c4labpc19.csee.usf.edu
>>>>>>> LAUNCHED mpd on c4labpc19.csee.usf.edu via
>>>>>>> debug: launch cmd= /usr/local/mpich2/bin/mpd.py --ncpus=1 -e -d
>>>>>>> debug: mpd on c4labpc19.csee.usf.edu on port 37116
>>>>>>> RUNNING: mpd on c4labpc19.csee.usf.edu
>>>>>>> debug: info for running mpd: {'ncpus': 1, 'list_port': 37116, 'entry_port': '', 'host': 'c4labpc19.csee.usf.edu', 'entry_host': '', 'ifhn': ''}
>>>>>>> LAUNCHED mpd on c4labpc12.csee.usf.edu via c4labpc19.csee.usf.edu
>>>>>>> debug: launch cmd= ssh -x -n -q c4labpc12.csee.usf.edu '/usr/local/mpich2/bin/mpd.py -h c4labpc19.csee.usf.edu -p 37116 --ncpus=1 -e -d'
>>>>>>> debug: mpd on c4labpc12.csee.usf.edu on port no_port
>>>>>>> mpdboot_c4labpc19.csee.usf.edu (handle_mpd_output 406): from mpd on c4labpc12.csee.usf.edu, invalid port info:
>>>>>>> no_port
>>>>>>>
>>>>>>> I have seen this in the forums but there was not a resolution
>>>>>>> posted. I have gone through the troubleshooting in the
>>>>>>> install guide and I can complete it until step 7, where mpdboot is
>>>>>>> used. I can start mpd on the master, get the port, then
>>>>>>> connect the slave computers by specifying the master name and
>>>>>>> port number. Any ideas why pc12 is reporting no port?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Brian
>>>>>>>
>>>
>>
>>
>
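
For anyone hitting the same error, the host-file check Pavan describes earlier in the thread (the name returned by "hostname" must appear in the host file exactly as written) could be sketched roughly as below. This is only an illustration: the function names parse_hosts and local_name_listed are made up, and the handling of "#" comments and a ":ncpus" suffix on entries is an assumption about the mpd.hosts format, so adjust it to match your file.

```python
import socket


def parse_hosts(text):
    """Return the host names found in mpd.hosts-style text."""
    hosts = []
    for line in text.splitlines():
        # Assumption: "#" starts a comment; blank lines are skipped.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Assumption: an entry may look like "host:ncpus"; keep only the host.
        hosts.append(line.split()[0].split(":", 1)[0])
    return hosts


def local_name_listed(hosts, local_name=None):
    """True if the local host name appears in hosts exactly as written."""
    if local_name is None:
        local_name = socket.gethostname()
    # Exact comparison on purpose: "foo" and "foo.domain.edu" do not match.
    return local_name in hosts
```

To use it, read the file with open("mpd.hosts").read(), pass the text to parse_hosts, and call local_name_listed on the result on each machine in the ring. It only covers the naming check; it says nothing about why a remote mpd would report no_port.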