[MPICH] Problem setting up MPICH between a 32 bit INTEL and a 32 bit AMD machine

Dave Goodell goodell at mcs.anl.gov
Tue Feb 19 11:02:36 CST 2008


responses inline

On Feb 18, 2008, at 10:35 PM, Krishna Chaitanya wrote:
> Sorry for the delay.
> >Can you ping from one to the other
>           Yes, I was able to ssh into the other machine and try  
> mpdcheck and the rest. Will try to figure out what the problem is.

Be sure that you actually perform a ping between the two hosts in  
question.  If you ssh'd in from a third host to both of them, then  
you don't have proof of proper routing between the two compute nodes.
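
Since ping only exercises ICMP, while mpdcheck -c opens a TCP connection, it can also help to test the TCP path directly from one compute node to the other. Below is a minimal sketch of such a check, not part of MPICH; the function name check_tcp and its timeout default are my own choices for illustration.

```python
import socket


def check_tcp(host, port, timeout=5.0):
    """Try a TCP connection to (host, port); return (ok, detail)."""
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers resolution failures, timeouts, refused connections and
        # unreachable hosts (errno 113, EHOSTUNREACH, on Linux).
        return False, f"{type(exc).__name__}: {exc}"
```

For example, with "mpdcheck -s" listening on zeus, calling check_tcp("zeus", 33737) on outwit (using whatever port the server printed) should return (True, "connected") if the route is good. Run it on the compute node itself, not on a third host.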

> In the meantime, I have been trying to understand the progress
> engine by tracing a standard blocking-mode send/recv program on
> one machine (using mpdboot -n 1). What exactly are the .i
> files in the directory /mpid/common/sock/poll for?
> I noticed that a function like "MPIDU_Sock_post_readv" appears in:
> 1) src/mpid/common/sock/iocp/sock.c, which includes functions like
> "WSARecv", a function for receiving data from a socket on
> Windows (I am working on a Linux platform).
> 2) /mpich-src/src/mpid/common/sock/poll/sock_post.i.
>              Interestingly, I am not able to navigate through the
> macros and functions in this file using tags (why?). So I can
> only see that we are playing around with pointers to update the
> pollinfo structure. Where is this structure defined? The .i file
> does not include any .h file. I tried "grep" on the main dir to
> locate the definition, but it didn't return anything useful.
>              Can someone point me to a wiki article or any
> documentation that gives some info on the .i files?

There are two implementations of the sock code: "iocp" is the Windows  
implementation and "poll" is the unix-style implementation.  Only one  
of the two directories will be used in any particular build.  In your  
case, the "poll" directory will be chosen.

As for the *.i files, they confused me the first time that I saw  
them.  If you look at src/mpid/common/sock/poll/sock.c:215-222 you'll  
see that they are included via the C preprocessor.  I don't know the  
rationale for this approach as the code was written before I joined  
the project.  It is likely that your ctags program is not indexing  
these *.i files because they don't end in *.h or *.c.  You can  
probably convince it to index the *.i files as well with a  
configuration file or some command-line switches (which will vary  
among various ctags implementations).
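
To see where each *.i file gets pulled in, you can scan the tree for the relevant #include directives. The following is only a rough sketch, assuming the files are included with the quoted form (#include "name.i"); the helper name find_i_includes is hypothetical and not something from the MPICH tree.

```python
import re
from pathlib import Path

# Matches lines such as:  #include "sock_post.i"
INCLUDE_I = re.compile(r'^\s*#\s*include\s*"([^"]+\.i)"')


def find_i_includes(root):
    """Yield (source_file, line_number, included_name) for every
    #include "*.i" directive in .c and .h files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in (".c", ".h") or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = INCLUDE_I.match(line)
            if match:
                yield path, lineno, match.group(1)
```

Pointing it at src/mpid/common/sock/poll should list the directives in sock.c. For the tags themselves, Exuberant and Universal Ctags accept an option along the lines of --langmap=c:+.i to treat *.i files as C, but check the man page for your ctags.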

"struct pollinfo" is also defined in that same sock.c file.

Hope that helps,
-Dave

> Thanks,
> Krishna Chaitanya K
>
> On Feb 15, 2008 3:22 PM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> What evidence do you have that the two machines are able to see each
> other on the network?  Can you ping from one to the other (and vice
> versa)?  What is the output of the 'route' command on each of the  
> hosts?
>
> -Dave
>
> On Feb 14, 2008, at 10:30 PM, Krishna Chaitanya wrote:
>
> > Hi,
> >           Turns out that the settings in the /etc/hosts file on the
> > AMD machine were incorrect. So, mpdcheck -v -f mpd.hosts gives this:
> >
> > AMD machine : ( outwit )
> > kc at outwit:~$ mpdcheck -v -f mpd.hosts
> > obtaining hostname via gethostname and getfqdn
> > gethostname gives  outwit
> > getfqdn gives  outwit.nitk.ac.in
> > checking out unqualified hostname; make sure is not "localhost", etc.
> > checking out qualified hostname; make sure is not "localhost", etc.
> > obtain IP addrs via qualified and unqualified hostnames;  make sure
> > other than 127.0.0.1
> > gethostbyname_ex:  ('outwit.nitk.ac.in', ['outwit'], ['172.16.54.54'])
> > gethostbyname_ex:  ('outwit.nitk.ac.in', ['outwit'], ['172.16.54.54'])
> > checking that IP addrs resolve to same host
> > now do some gethostbyaddr and gethostbyname_ex for machines in
> > hosts file
> > checking gethostbyXXX for unqualified zeus
> > gethostbyname_ex:  ('zeus', [], ['172.16.54.71'])
> > checking gethostbyXXX for qualified zeus
> > gethostbyname_ex:  ('zeus', [], ['172.16.54.71'])
> >
> >
> > INTEL machine ( zeus )
> > [kris.c1986 at zeus ~]$ mpdcheck -v -f mpd.hosts
> > obtaining hostname via gethostname and getfqdn
> > gethostname gives  zeus
> > getfqdn gives  zeus.nitk.ac.in
> > checking out unqualified hostname; make sure is not "localhost", etc.
> > checking out qualified hostname; make sure is not "localhost", etc.
> > obtain IP addrs via qualified and unqualified hostnames;  make sure
> > other than 127.0.0.1
> > gethostbyname_ex:  ('zeus.nitk.ac.in', ['zeus'], ['172.16.54.71'])
> > gethostbyname_ex:  ('zeus.nitk.ac.in', ['zeus'], ['172.16.54.71'])
> > checking that IP addrs resolve to same host
> > now do some gethostbyaddr and gethostbyname_ex for machines in
> > hosts file
> > checking gethostbyXXX for unqualified outwit
> > gethostbyname_ex:  ('outwit', [], ['172.16.54.54'])
> > checking gethostbyXXX for qualified outwit
> > gethostbyname_ex:  ('outwit', [], ['172.16.54.54'])
> >
> >                Seems to be ok. But I still get this error when I
> > try mpdcheck -c on the AMD comp :
> > kc at outwit:~$ mpdcheck -c zeus 33737
> > Traceback (most recent call last):
> >   File "/home/kc/mpich-install/bin/mpdcheck", line 103, in <module>
> >     sock.connect((argv[argidx+1],int(argv[argidx+2])))  # note double parens
> >   File "<string>", line 1, in connect
> > socket.error: (113, 'No route to host')
> >
> >
> >            The two machines are able to see each other on the
> > network. I can't explain why it complains that there is "No route to
> > host".
> >
> > Krishna Chaitanya K
> >
> >
> > On Thu, Feb 14, 2008 at 2:50 PM, Rajeev Thakur <thakur at mcs.anl.gov>
> > wrote:
> > The fact that the second test times out suggests that there might be a
> > firewall on the AMD machine. See section A.3 of the
> > installation guide.
> >
> > Rajeev
> >
> > From: Krishna Chaitanya [mailto:kris.c1986 at gmail.com]
> > Sent: Thursday, February 14, 2008 11:41 AM
> > To: Rajeev Thakur
> > Cc: mpich-discuss at mcs.anl.gov
> > Subject: Re: [MPICH] Problem setting up MPICH between a 32 bit
> > INTEL and a 32 bit AMD machine
> >
> > So, what is the error trying to convey? Googling for it gave this.
> > I have flushed the iptables rules on both machines and the firewalls
> > are deactivated. Could you please elaborate on what kind of
> > settings I need to look into?
> >
> > Thanks,
> > Krishna Chaitanya K
> >
> > On Thu, Feb 14, 2008 at 10:58 PM, Rajeev Thakur
> > <thakur at mcs.anl.gov> wrote:
> > It should be possible. mpdcheck is a tool to diagnose whether the
> > network configuration settings on the machines are ok or not, and
> > whether a process on one machine can talk to a process on the
> > other. It looks like the settings need to be fixed in some way.
> >
> > Rajeev
> >
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Krishna Chaitanya
> > Sent: Thursday, February 14, 2008 10:26 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] Problem setting up MPICH between a 32 bit INTEL
> > and a 32 bit AMD machine
> >
> > Hi,
> >         In one of the previous posts, you had replied back saying
> > MPICH cannot be put to use between a 32 bit INTEL machine and a 64
> > bit AMD machine. Is it possible to do so between an INTEL and an
> > AMD machine, both of them being 32 bit processors?
> >         Anyway, on trying mpdcheck -f mpd.hosts on the 32 bit AMD,
> > I am getting the following error :
> >    ipaddr via uqn (208.67.216.130) does not match via fqn
> > (208.69.32.130)
> >         And if I run mpdcheck -s on the AMD node and mpdcheck -c
> > on the INTEL node, the client times out. The test message gets
> > delivered with the client and server swapped.
> >
> > Thanks,
> > Krishna Chaitanya K
> >
> >
> >
> >
> >
> > --
> > In the middle of difficulty, lies opportunity
>
>
>
>
> -- 
> In the middle of difficulty, lies opportunity



