[MPICH] Problem setting up MPICH between a 32 bit INTEL and a 32 bit AMD machine

Dave Goodell goodell at mcs.anl.gov
Wed Feb 20 10:28:26 CST 2008


Unfortunately, this state machine and the design goals behind it are
not very well documented.  The only real documentation that I know of
is a diagram of the FSM that I pieced together from reading the code
during debugging: http://wiki.mcs.anl.gov/mpich2/index.php/Sock_conn_protocol

It's not 100% complete, and it doesn't explain very much about the  
meaning behind any of the states or the code itself.  However, if you  
look at the code alongside this diagram, it should help you in trying  
to make sense of it.

Generally, states like LSEND and LRECV are the listen side of the  
connection, while CSEND and CRECV are the connect (initiating) side  
of the connection.
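To make the listen/connect split concrete, here is a toy Python sketch. Only the four state names come from the message above; the two-message exchange, the message strings, and the name `run_toy_handshake` are hypothetical and are not taken from MPICH's sock code or the wiki diagram. All it illustrates is that each send state on one side pairs with a receive state on the other.

```python
from collections import deque


def run_toy_handshake():
    """Toy two-message exchange between a connect side and a listen side.

    Hypothetical: MPICH's real state machine has more states and different
    transitions; only the names CSEND/CRECV/LSEND/LRECV come from the thread.
    """
    to_listener = deque()   # messages in flight from connector to listener
    to_connector = deque()  # messages in flight from listener to connector
    trace = []

    # Connect (initiating) side sends its request.
    trace.append(("connect", "CSEND"))
    to_listener.append("open-request")

    # Listen side receives the request ...
    trace.append(("listen", "LRECV"))
    request = to_listener.popleft()

    # ... and sends its answer back.
    trace.append(("listen", "LSEND"))
    to_connector.append("open-ack" if request == "open-request" else "open-nak")

    # Connect side receives the answer.
    trace.append(("connect", "CRECV"))
    reply = to_connector.popleft()
    return trace, reply
```

Calling `run_toy_handshake()` returns the ordered list of (side, state) pairs plus the reply the connect side received.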

-Dave

On Feb 19, 2008, at 11:00 PM, Krishna Chaitanya wrote:

> Hi,
>        Just out of curiosity (I am not trying to do anything with the
> control signals that are exchanged by the progress engine), what
> exactly are LSEND, LRECV and the like?
>
> Thanks,
> Krishna Chaitanya K
>
> On Feb 19, 2008 12:32 PM, Krishna Chaitanya <kris.c1986 at gmail.com> wrote:
> Hi Dave,
>      Thanks for that. I was pretty much lost over the last couple of
> days. I will give it a fresh try.
>      As for the AMD machine, I should have access to it in
> about 7-8 hours.
>
> Thanks,
> Krishna Chaitanya K
>
> On 2/19/08, Dave Goodell <goodell at mcs.anl.gov> wrote:
> > responses inline
> >
> > On Feb 18, 2008, at 10:35 PM, Krishna Chaitanya wrote:
> > > Sorry for the delay.
> > > > Can you ping from one to the other?
> > >           Yes, I was able to ssh into the other machine and run
> > > mpdcheck and the rest. I will try to figure out what the problem is.
> >
> > Be sure that you actually perform a ping between the two hosts in
> > question.  If you ssh'd in from a third host to both of them, then
> > you don't have proof of proper routing between the two compute nodes.
> >
> > > In the meantime, I have been trying to understand the progress
> > > engine by tracing a standard blocking-mode send/recv program on
> > > one machine (using mpdboot -n 1). What exactly are the .i
> > > files in the directory /mpid/common/sock/poll for?
> > > I noticed that a function like "MPIDU_Sock_post_readv" appears in two places:
> > > 1) src/mpid/common/sock/iocp/sock.c, which calls functions like
> > > "WSARecv", the Windows function for receiving data from a socket
> > > (I am working on a Linux platform).
> > > 2) /mpich-src/src/mpid/common/sock/poll/sock_post.i.
> > >              Interestingly, I am not able to navigate through the
> > > macros and functions in this file using tags (why?). So, I can
> > > only see that we are manipulating pointers to update the
> > > pollinfo structure. Where is this structure defined? The .i file
> > > does not include any .h file. I tried "grep" on the main directory to
> > > locate the definition, but it didn't return anything useful.
> > >              Can someone point me to a wiki article or any
> > > documentation that gives some information on the .i files?
> >
> > There are two implementations of the sock code: "iocp" is the Windows
> > implementation and "poll" is the Unix-style implementation.  Only one
> > of the two directories will be used in any particular build.  In your
> > case, the "poll" directory will be chosen.
> >
> > As for the *.i files, they confused me the first time that I saw
> > them.  If you look at src/mpid/common/sock/poll/sock.c:215-222 you'll
> > see that they are included via the C preprocessor.  I don't know the
> > rationale for this approach as the code was written before I joined
> > the project.  It is likely that your ctags program is not indexing
> > these *.i files because they don't end in .h or .c.  You can
> > probably convince it to index the *.i files as well with a
> > configuration file or some command-line switches (which will vary
> > among ctags implementations).
> >
> > "struct pollinfo" is also defined in that same sock.c file.
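If ctags will not cooperate, a small Python script can at least show which *.i files a C source pulls in and where `struct pollinfo` is defined. This is a sketch that assumes the includes use the quoted form `#include "name.i"` and that the definition is written as `struct pollinfo {` (optionally preceded by `typedef`, with the brace possibly on the next line); the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches preprocessor lines such as:  #include "sock_post.i"
INCLUDE_I = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+\.i)"', re.MULTILINE)
# Matches the start of a definition such as:  struct pollinfo {
# (\s* also spans a newline, so a brace on the following line is found too)
POLLINFO_DEF = re.compile(r'\bstruct\s+pollinfo\s*\{')


def line_of(text: str, offset: int) -> int:
    """1-based line number of a character offset in text."""
    return text.count("\n", 0, offset) + 1


def scan_c_source(text: str):
    """Return (included .i files, line numbers where struct pollinfo is defined)."""
    includes = INCLUDE_I.findall(text)
    definitions = [line_of(text, m.start()) for m in POLLINFO_DEF.finditer(text)]
    return includes, definitions


def scan_file(path: str):
    """Read a C file from disk and scan it."""
    return scan_c_source(Path(path).read_text(errors="replace"))
```

From the top of the source tree, `scan_file("src/mpid/common/sock/poll/sock.c")` should list the *.i files that sock.c includes, and the line where the structure is defined; plain uses such as `struct pollinfo *p;` are not reported because they lack the opening brace.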
> >
> > Hope that helps,
> > -Dave
> >
> > > Thanks,
> > > Krishna Chaitanya K
> > >
> > > On Feb 15, 2008 3:22 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> > > What evidence do you have that the two machines are able to see each
> > > other on the network?  Can you ping from one to the other (and vice
> > > versa)?  What is the output of the 'route' command on each of the
> > > hosts?
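One quick way to read the `route` output is to ask whether the two hosts share a subnet; if they do, traffic between them should not need a gateway. The Python sketch below uses the standard `ipaddress` module and the addresses that mpdcheck reports elsewhere in this thread (172.16.54.54 for outwit, 172.16.54.71 for zeus). The /24 prefix (netmask 255.255.255.0) is an assumption; substitute the netmask that `route` actually shows.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int = 24) -> bool:
    """True if addr_b lies in the IPv4 network of addr_a with the given prefix length.

    prefix_len=24 is an assumed default; take the real value from the
    Genmask column of `route -n` on each host.
    """
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network


print(same_subnet("172.16.54.54", "172.16.54.71"))      # True
print(same_subnet("172.16.54.54", "172.16.54.71", 28))  # False
```

If the hosts are on the same subnet and still cannot connect, the routing table is an unlikely culprit and a filtering rule on one of the hosts becomes the more likely explanation.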
> > >
> > > -Dave
> > >
> > > On Feb 14, 2008, at 10:30 PM, Krishna Chaitanya wrote:
> > >
> > > > Hi,
> > > >           It turns out that the settings in the /etc/hosts file on the
> > > > AMD machine were incorrect. So, mpdcheck -v -f mpd.hosts gives this:
> > > >
> > > > AMD machine (outwit):
> > > > kc at outwit:~$ mpdcheck -v -f mpd.hosts
> > > > obtaining hostname via gethostname and getfqdn
> > > > gethostname gives  outwit
> > > > getfqdn gives  outwit.nitk.ac.in
> > > > checking out unqualified hostname; make sure is not "localhost", etc.
> > > > checking out qualified hostname; make sure is not "localhost", etc.
> > > > obtain IP addrs via qualified and unqualified hostnames;  make sure other than 127.0.0.1
> > > > gethostbyname_ex:  ('outwit.nitk.ac.in', ['outwit'], ['172.16.54.54'])
> > > > gethostbyname_ex:  ('outwit.nitk.ac.in', ['outwit'], ['172.16.54.54'])
> > > > checking that IP addrs resolve to same host
> > > > now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> > > > checking gethostbyXXX for unqualified zeus
> > > > gethostbyname_ex:  ('zeus', [], ['172.16.54.71'])
> > > > checking gethostbyXXX for qualified zeus
> > > > gethostbyname_ex:  ('zeus', [], ['172.16.54.71'])
> > > >
> > > >
> > > > INTEL machine (zeus):
> > > > kris.c1986 at zeus ~]$ mpdcheck -v -f mpd.hosts
> > > > obtaining hostname via gethostname and getfqdn
> > > > gethostname gives  zeus
> > > > getfqdn gives  zeus.nitk.ac.in
> > > > checking out unqualified hostname; make sure is not "localhost", etc.
> > > > checking out qualified hostname; make sure is not "localhost", etc.
> > > > obtain IP addrs via qualified and unqualified hostnames;  make sure other than 127.0.0.1
> > > > gethostbyname_ex:  ('zeus.nitk.ac.in', ['zeus'], ['172.16.54.71'])
> > > > gethostbyname_ex:  ('zeus.nitk.ac.in', ['zeus'], ['172.16.54.71'])
> > > > checking that IP addrs resolve to same host
> > > > now do some gethostbyaddr and gethostbyname_ex for machines in hosts file
> > > > checking gethostbyXXX for unqualified outwit
> > > > gethostbyname_ex:  ('outwit', [], ['172.16.54.54'])
> > > > checking gethostbyXXX for qualified outwit
> > > > gethostbyname_ex:  ('outwit', [], ['172.16.54.54'])
> > > >
> > > >                This seems to be OK, but I still get this error when I
> > > > run mpdcheck -c on the AMD machine:
> > > > kc at outwit:~$ mpdcheck -c zeus 33737
> > > > Traceback (most recent call last):
> > > >   File "/home/kc/mpich-install/bin/mpdcheck", line 103, in <module>
> > > >     sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
> > > >   File "<string>", line 1, in connect
> > > > socket.error: (113, 'No route to host')
> > > >
> > > >
> > > >            The two machines are able to see each other on the
> > > > network, so I can't explain why it complains that there is "No route
> > > > to host".
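On Linux, errno 113 is EHOSTUNREACH. The kernel reports it when there is no route, when ARP gets no answer on the local network, or when an ICMP "host unreachable" or "host prohibited" reply comes back; a firewall rule that rejects packets with `icmp-host-prohibited` produces exactly this message even when the hosts can otherwise see each other. A timeout, by contrast, usually means packets are being dropped silently. The Python 3 sketch below makes the same kind of `connect()` call as the mpdcheck line in the traceback and labels the common outcomes; `diagnose_connect` is a hypothetical name, not part of mpdcheck. (The traceback is Python 2 style; in Python 3, `socket.error` is an alias of `OSError`.)

```python
import errno
import socket


def diagnose_connect(host: str, port: int, timeout: float = 5.0) -> str:
    """Try one TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # Nothing came back at all: packets are probably being dropped.
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:   # 113 on Linux
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:   # host answered, port closed
            return "connection refused"
        return f"other error: {exc}"
```

Running `diagnose_connect("zeus", PORT)` from outwit and the reverse from zeus, with PORT being a port that something is actually listening on, shows which direction fails and how.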
> > > >
> > > > Krishna Chaitanya K
> > > >
> > > >
> > > > On Thu, Feb 14, 2008 at 2:50 PM, Rajeev Thakur <thakur at mcs.anl.gov>
> > > > wrote:
> > > > The fact that the second test times out suggests that there might
> > > > be a firewall on the AMD machine. See section A.3 of the
> > > > installation guide.
> > > >
> > > > Rajeev
> > > >
> > > > From: Krishna Chaitanya [mailto:kris.c1986 at gmail.com]
> > > > Sent: Thursday, February 14, 2008 11:41 AM
> > > > To: Rajeev Thakur
> > > > Cc: mpich-discuss at mcs.anl.gov
> > > > Subject: Re: [MPICH] Problem setting up MPICH between a 32 bit
> > > > INTEL and a 32 bit AMD machine
> > > >
> > > > So, what is the error trying to convey? Googling for it gave this.
> > > > I have flushed iptables on both machines and the firewalls
> > > > are deactivated. Could you please elaborate on what kind of
> > > > settings I need to look into?
> > > >
> > > > Thanks,
> > > > Krishna Chaitanya K
> > > >
> > > > On Thu, Feb 14, 2008 at 10:58 PM, Rajeev Thakur
> > > > <thakur at mcs.anl.gov> wrote:
> > > > It should be possible. mpdcheck is a tool to diagnose whether the
> > > > network configuration settings on the machines are OK or not, and
> > > > whether a process on one machine can talk to a process on the
> > > > other. It looks like the settings need to be fixed in some way.
> > > >
> > > > Rajeev
> > > >
> > > > From: owner-mpich-discuss at mcs.anl.gov
> > > > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Krishna Chaitanya
> > > > Sent: Thursday, February 14, 2008 10:26 AM
> > > > To: mpich-discuss at mcs.anl.gov
> > > > Subject: [MPICH] Problem setting up MPICH between a 32 bit INTEL
> > > > and a 32 bit AMD machine
> > > >
> > > > Hi,
> > > >         In one of your previous posts, you replied that MPICH
> > > > cannot be used between a 32-bit INTEL machine and a 64-bit
> > > > AMD machine. Is it possible to do so between an INTEL and an
> > > > AMD machine when both are 32-bit processors?
> > > >         Anyway, when I run mpdcheck -f mpd.hosts on the 32-bit AMD
> > > > machine, I get the following error:
> > > >    ipaddr via uqn (208.67.216.130) does not match via fqn (208.69.32.130)
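Judging from its output, mpdcheck resolves the unqualified name (uqn, what `gethostname` returns) and the fully qualified name (fqn, what `getfqdn` returns) and compares the addresses. Neither 208.67.216.130 nor 208.69.32.130 is in the 172.16.54.x network the hosts use later in the thread, which is consistent with the later finding that /etc/hosts on the AMD machine was wrong. A minimal Python sketch of that comparison follows; the resolver is passed in so the logic can be tried without touching DNS, and `compare_uqn_fqn` is a hypothetical name, not mpdcheck's code.

```python
import socket
from typing import Callable, Dict, List


def default_resolve(name: str) -> List[str]:
    """IPv4 addresses the system resolver (including /etc/hosts) gives for name."""
    return socket.gethostbyname_ex(name)[2]


def compare_uqn_fqn(uqn: str, fqn: str,
                    resolve: Callable[[str], List[str]] = default_resolve) -> Dict[str, object]:
    """Resolve both names and report whether they agree and avoid loopback."""
    uqn_addrs = set(resolve(uqn))
    fqn_addrs = set(resolve(fqn))
    all_addrs = uqn_addrs | fqn_addrs
    return {
        "match": uqn_addrs == fqn_addrs,
        # 127.x.x.x is loopback; other hosts cannot reach us through it.
        "loopback_only": bool(all_addrs) and all(a.startswith("127.") for a in all_addrs),
        "uqn_addrs": sorted(uqn_addrs),
        "fqn_addrs": sorted(fqn_addrs),
    }
```

On a real host, `compare_uqn_fqn(socket.gethostname(), socket.getfqdn())` runs the check against the local resolver configuration.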
> > > >         And if I run mpdcheck -s on the AMD node and mpdcheck -c
> > > > on the INTEL node, the client times out. The test message gets
> > > > delivered when the client and server are swapped.
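The mpdcheck -s / -c pair amounts to a one-message TCP exchange: one side listens on a port and prints it, the other connects to that host and port. A rough Python stand-in is sketched below; it is not mpdcheck's code or wire protocol, and the function names are hypothetical. Running the server half on each machine in turn, with the client on the other, should reproduce the asymmetry described above; if the exchange works in one direction but times out in the other, something on the listening host (typically a firewall) is likely filtering incoming connections.

```python
import socket
import threading


def make_listener(port: int = 0) -> socket.socket:
    """Listen on all interfaces; port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))
    srv.listen(1)
    return srv


def _recv_until_eof(sock: socket.socket) -> bytes:
    """Read until the peer closes or half-closes its sending side."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


def serve_once(srv: socket.socket) -> str:
    """Server role: accept one client, read its message, echo it back with a prefix."""
    conn, _addr = srv.accept()
    with conn:
        data = _recv_until_eof(conn)
        conn.sendall(b"ack:" + data)
    return data.decode()


def send_test_message(host: str, port: int, message: str, timeout: float = 5.0) -> str:
    """Client role: send one message, half-close, and wait for the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode())
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        return _recv_until_eof(sock).decode()


def loopback_selftest(message: str = "hello") -> str:
    """Run both roles in one process over 127.0.0.1."""
    srv = make_listener()
    srv.settimeout(10.0)  # keep accept() from blocking forever if the client fails
    port = srv.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(srv,))
    server.start()
    try:
        return send_test_message("127.0.0.1", port, message)
    finally:
        server.join()
        srv.close()
```

For a cross-host test, run `srv = make_listener(); print(srv.getsockname()[1]); serve_once(srv)` on the AMD node, then `send_test_message("outwit", PORT, "hello")` on the INTEL node with PORT replaced by the printed number.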
> > > >
> > > > Thanks,
> > > > Krishna Chaitanya K
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > In the middle of difficulty, lies opportunity



