[MPICH] Mpich2 mpd ping one side working only

Rusty Lusk lusk at mcs.anl.gov
Sun Feb 4 12:10:23 CST 2007


When you are having this kind of trouble, it is nearly always because  
of configuration problems with your systems, not with
MPICH2.   The manuals describe how you should debug your system  
before using mpdboot.   In particular, with small clusters like this  
it is recommended that you start the mpd's one at a time "by hand",  
telling each one (except the first) where to contact the existing ring.
If there are difficulties at this step, you should use the program
mpdcheck, supplied with MPICH2, to test each pair of machines, running
it on each machine in both server and client mode.  mpdcheck is
designed to help diagnose and suggest fixes for machine configurations
that prevent one machine from connecting to another.
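
To illustrate what testing each pair of machines in both server and
client mode involves, here is a minimal Python sketch. It is not
mpdcheck and reproduces none of its diagnostics; it only shows the
underlying idea of listening on one machine and connecting from the
other, then swapping roles. The function names (ordered_pairs,
listen_once, can_connect) are hypothetical, and the host names are the
ones from this thread.

```python
import itertools
import socket
import threading


def ordered_pairs(hosts):
    """Return every (server, client) combination, so that each pair of
    machines is exercised in both directions."""
    return list(itertools.permutations(hosts, 2))


def listen_once(bind_host="", port=0, timeout=30.0):
    """Server mode: accept a single connection and echo one message back.

    port=0 asks the OS for a free port. Returns (port, thread) so the
    caller can pass the port number to the client side.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_host, port))
    srv.listen(1)
    chosen_port = srv.getsockname()[1]

    def serve():
        with srv:
            srv.settimeout(timeout)
            try:
                conn, _ = srv.accept()
            except OSError:
                return  # nobody connected before the timeout
            with conn:
                conn.sendall(conn.recv(1024))

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return chosen_port, thread


def can_connect(host, port, timeout=5.0):
    """Client mode: True if a TCP round trip to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as c:
            c.sendall(b"ping")
            return c.recv(1024) == b"ping"
    except OSError:
        return False


if __name__ == "__main__":
    # List the directions that would need checking for this cluster.
    for server, client in ordered_pairs(["fisio1", "fisio2", "fisio3"]):
        print(f"listen on {server}, connect from {client}")
```

In use, one would call listen_once() on one machine, note the port it
returns, and call can_connect(host, port) from the other machine, then
repeat with the roles reversed. A failure in only one direction points
at name resolution or firewall settings on one side rather than at
MPICH2.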

Regards,
Rusty Lusk

On Feb 3, 2007, at 5:17 PM, Luiz Mendes wrote:

> Hi all,
>
> I have been analysing a communication problem between mpd
> processes on several PCs.
>
> I want to know whether mpdboot, at launch time, sets up communication
> between the processes in a cyclic manner.
>
> My other question: I have the following PCs:
>
> fisio1
> fisio2
> fisio3
>
> I have MPICH2 installed in a shared folder located on fisio1.
>
> The other PCs have this folder mounted in their directory trees.
> If I run mpdboot -n 2 -f hostfile, it works for the following
> pairs:
>
> fisio1 and fisio2
> fisio1 and fisio3
>
> But when I execute mpdboot from fisio2, for example, it returns
> the following message:
>
> mpdboot_fisio2 (handle_mpd_output 374): failed to ping mpd on  
> fisio1; recvd output={}
>
> I can reach fisio1 from fisio2: running the command
> ssh fisio1 date from fisio2 works fine.
>
> So why is this happening? Do you know what could cause it?
>
> The strange thing is that with MPICH1 it works perfectly on all nodes.
>
> Thanks
> Luiz Mendes



