Hi,<br> Just out of curiosity: I am not trying to do anything with the control signals exchanged by the progress engine, but I would like to know what exactly LSEND, LRECV, and the like are.<br><br>Thanks,<br>
Krishna Chaitanya K <br><br><div class="gmail_quote">On Feb 19, 2008 12:32 PM, Krishna Chaitanya <<a href="mailto:kris.c1986@gmail.com">kris.c1986@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi Dave,<br> Thanks for that. I was pretty much lost over the last couple of<br>days. I will give it a fresh try.<br> As for the AMD machine, I should have access to it in<br>about 7-8 hours.<br><br>
Thanks,<br><font color="#888888">Krishna Chaitanya K<br></font><div><div></div><div class="Wj3C7c"><br>On 2/19/08, Dave Goodell <<a href="mailto:goodell@mcs.anl.gov">goodell@mcs.anl.gov</a>> wrote:<br>> responses inline<br>
><br>> On Feb 18, 2008, at 10:35 PM, Krishna Chaitanya wrote:<br>> > Sorry for the delay.<br>> > >Can you ping from one to the other<br>> > Yes, I was able to ssh into the other machine and try<br>
> > mpdcheck and the rest. Will try to figure out what the problem is.<br>><br>> Be sure that you actually perform a ping between the two hosts in<br>> question. If you ssh'd in from a third host to both of them, then<br>
> you don't have proof of proper routing between the two compute nodes.<br>><br>> > In the mean-time, I have been trying to understand the progress<br>> > engine by tracing a standard blocking mode send/recv program, on<br>
> > one machine (by using mpdboot -n 1). What exactly are the .i<br>> > files in the directory /mpid/common/sock/poll for?<br>> > I noticed that a function like "MPIDU_Sock_post_readv" is at:<br>
> > 1) src/mpid/common/sock/iocp/sock.c, which includes functions like<br>> > "WSARecv", which is a function to receive data from a socket on<br>> > Windows. (I am working on a Linux platform.)<br>
> > 2) /mpich-src/src/mpid/common/sock/poll/sock_post.i.<br>> > Interestingly, I am not able to navigate through the<br>> > macros and functions in this file by using tags (why?). So, I can<br>
> > only see that we are playing around with pointers to update the<br>> > pollinfo structure. Where is this structure defined? The .i file<br>> > does not include any .h file. I tried "grep" on the main dir to<br>
> > locate the definition, but it didn't return anything useful.<br>> > Can someone point me to a wiki article or any<br>> > documentation that gives some info on the .i files?<br>><br>> There are two implementations of the sock code: "iocp" is the Windows<br>
> implementation and "poll" is the unix-style implementation. Only one<br>> of the two directories will be used in any particular build. In your<br>> case, the "poll" directory will be chosen.<br>
><br>> As for the *.i files, they confused me the first time that I saw<br>> them. If you look at src/mpid/common/sock/poll/sock.c:215-222 you'll<br>> see that they are included via the C preprocessor. I don't know the<br>
> rationale for this approach as the code was written before I joined<br>> the project. It is likely that your ctags program is not indexing<br>> these *.i files because they don't end in *.h or *.c. You can<br>
> probably convince it to index the *.i files as well with a<br>> configuration file or some command-line switches (which will vary<br>> among various ctags implementations).<br>><br>> "struct pollinfo" is also defined in that same sock.c file.<br>
><br>> Hope that helps,<br>> -Dave<br>><br>> > Thanks,<br>> > Krishna Chaitanya K<br>> ><br>> > On Feb 15, 2008 3:22 PM, Dave Goodell <<a href="mailto:goodell@mcs.anl.gov">goodell@mcs.anl.gov</a>> wrote:<br>
> > What evidence do you have that the two machines are able to see each<br>> > other on the network? Can you ping from one to the other (and vice<br>> > versa)? What is the output of the 'route' command on each of the<br>
> > hosts?<br>> ><br>> > -Dave<br>> ><br>> > On Feb 14, 2008, at 10:30 PM, Krishna Chaitanya wrote:<br>> ><br>> > > Hi,<br>> > > Turns out that the settings in the /etc/hosts file on the<br>
> > > AMD machine were incorrect. So, mpdcheck -v -f mpd.hosts gives this:<br>> > ><br>> > > AMD machine : ( outwit )<br>> > > kc@outwit:~$ mpdcheck -v -f mpd.hosts<br>> > > obtaining hostname via gethostname and getfqdn<br>
> > > gethostname gives outwit<br>> > > getfqdn gives <a href="http://outwit.nitk.ac.in" target="_blank">outwit.nitk.ac.in</a><br>> > > checking out unqualified hostname; make sure is not "localhost",<br>
> > etc.<br>> > > checking out qualified hostname; make sure is not "localhost", etc.<br>> > > obtain IP addrs via qualified and unqualified hostnames; make sure<br>> > > other than <a href="http://127.0.0.1" target="_blank">127.0.0.1</a><br>
> > > gethostbyname_ex: ('<a href="http://outwit.nitk.ac.in" target="_blank">outwit.nitk.ac.in</a>', ['outwit'],<br>> > ['<a href="http://172.16.54.54" target="_blank">172.16.54.54</a>'])<br>
> > > gethostbyname_ex: ('<a href="http://outwit.nitk.ac.in" target="_blank">outwit.nitk.ac.in</a>', ['outwit'],<br>> > ['<a href="http://172.16.54.54" target="_blank">172.16.54.54</a>'])<br>
> > > checking that IP addrs resolve to same host<br>> > > now do some gethostbyaddr and gethostbyname_ex for machines in<br>> > > hosts file<br>> > > checking gethostbyXXX for unqualified zeus<br>
> > > gethostbyname_ex: ('zeus', [], ['<a href="http://172.16.54.71" target="_blank">172.16.54.71</a>'])<br>> > > checking gethostbyXXX for qualified zeus<br>> > > gethostbyname_ex: ('zeus', [], ['<a href="http://172.16.54.71" target="_blank">172.16.54.71</a>'])<br>
> > ><br>> > ><br>> > > INTEL machine ( zeus )<br>> > > kris.c1986@zeus ~]$ mpdcheck -v -f mpd.hosts<br>> > > obtaining hostname via gethostname and getfqdn<br>> > > gethostname gives zeus<br>
> > > getfqdn gives <a href="http://zeus.nitk.ac.in" target="_blank">zeus.nitk.ac.in</a><br>> > > checking out unqualified hostname; make sure is not "localhost",<br>> > etc.<br>> > > checking out qualified hostname; make sure is not "localhost", etc.<br>
> > > obtain IP addrs via qualified and unqualified hostnames; make sure<br>> > > other than <a href="http://127.0.0.1" target="_blank">127.0.0.1</a><br>> > > gethostbyname_ex: ('<a href="http://zeus.nitk.ac.in" target="_blank">zeus.nitk.ac.in</a>', ['zeus'], ['<a href="http://172.16.54.71" target="_blank">172.16.54.71</a>'])<br>
> > > gethostbyname_ex: ('<a href="http://zeus.nitk.ac.in" target="_blank">zeus.nitk.ac.in</a>', ['zeus'], ['<a href="http://172.16.54.71" target="_blank">172.16.54.71</a>'])<br>> > > checking that IP addrs resolve to same host<br>
> > > now do some gethostbyaddr and gethostbyname_ex for machines in<br>> > > hosts file<br>> > > checking gethostbyXXX for unqualified outwit<br>> > > gethostbyname_ex: ('outwit', [], ['<a href="http://172.16.54.54" target="_blank">172.16.54.54</a>'])<br>
> > > checking gethostbyXXX for qualified outwit<br>> > > gethostbyname_ex: ('outwit', [], ['<a href="http://172.16.54.54" target="_blank">172.16.54.54</a>'])<br>> > ><br>> > > Seems to be ok. But I still get this error when I<br>
> > > try mpdcheck -c on the AMD comp :<br>> > > kc@outwit:~$ mpdcheck -c zeus 33737<br>> > > Traceback (most recent call last):<br>> > > File "/home/kc/mpich-install/bin/mpdcheck", line 103, in <module><br>
> > > sock.connect((argv[argidx+1],int(argv[argidx+2]))) # note<br>> > > double parens<br>> > > File "<string>", line 1, in connect<br>> > > socket.error: (113, 'No route to host')<br>
> > ><br>> > ><br>> > > The two machines are able to see each other on the<br>> > > network. I can't explain why it complains that there is "No route to<br>> > > host".<br>
> > ><br>> > > Krishna Chaitanya K<br>> > ><br>> > ><br>> > > On Thu, Feb 14, 2008 at 2:50 PM, Rajeev Thakur <<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>><br>
> > > wrote:<br>> > > The fact that the second test times out suggests that there might be a<br>> > > firewall on the AMD machine. See section A.3 of the<br>> > > installation guide.<br>
> > ><br>> > > Rajeev<br>> > ><br>> > > From: Krishna Chaitanya [mailto:<a href="mailto:kris.c1986@gmail.com">kris.c1986@gmail.com</a>]<br>> > > Sent: Thursday, February 14, 2008 11:41 AM<br>
> > > To: Rajeev Thakur<br>> > > Cc: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>> > > Subject: Re: [MPICH] Problem setting up MPICH between a 32 bit<br>> > > INTEL and a 32 bit AMD machine<br>
> > ><br>> > > So, what is the error trying to convey? Googling for it gave this.<br>> > > I have flushed iptables on both machines, and the firewalls<br>> > > are deactivated. Could you please elaborate on what kind of<br>
> > > settings I need to look into?<br>> > ><br>> > > Thanks,<br>> > > Krishna Chaitanya K<br>> > ><br>> > > On Thu, Feb 14, 2008 at 10:58 PM, Rajeev Thakur<br>> > > <<a href="mailto:thakur@mcs.anl.gov">thakur@mcs.anl.gov</a>> wrote:<br>
> > > It should be possible. mpdcheck is a tool to diagnose whether the<br>> > > network configuration settings on the machines are ok or not, and<br>> > > whether a process on one machine can talk to a process on the<br>
> > > other. It looks like the settings need to be fixed in some way.<br>> > ><br>> > > Rajeev<br>> > ><br>> > > From: <a href="mailto:owner-mpich-discuss@mcs.anl.gov">owner-mpich-discuss@mcs.anl.gov</a> [mailto:<a href="mailto:owner-mpich-">owner-mpich-</a><br>
> > > <a href="mailto:discuss@mcs.anl.gov">discuss@mcs.anl.gov</a>] On Behalf Of Krishna Chaitanya<br>> > > Sent: Thursday, February 14, 2008 10:26 AM<br>> > > To: <a href="mailto:mpich-discuss@mcs.anl.gov">mpich-discuss@mcs.anl.gov</a><br>
> > > Subject: [MPICH] Problem setting up MPICH between a 32 bit INTEL<br>> > > and a 32 bit AMD machine<br>> > ><br>> > > Hi,<br>> > > In one of the previous posts, you had replied back saying<br>
> > > MPICH cannot be put to use between a 32 bit INTEL machine and a 64<br>> > > bit AMD machine. Is it possible to do so between an INTEL and an<br>> > > AMD machine, both of them being 32 bit processors?<br>
> > > Anyway, on trying mpdcheck -f mpd.hosts on the 32 bit AMD,<br>> > > I am getting the following error :<br>> > > ipaddr via uqn (<a href="http://208.67.216.130" target="_blank">208.67.216.130</a>) does not match via fqn<br>
> > > (<a href="http://208.69.32.130" target="_blank">208.69.32.130</a>)<br>> > > And if I try mpdcheck -s on the AMD node and<br>> > > mpdcheck -c on the INTEL node, the client times out. The test message gets<br>
> > > delivered with the client and server swapped.<br>> > ><br>> > > Thanks,<br>> > > Krishna Chaitanya K<br>> > ><br>> > ><br>> > ><br>> > ><br>> > ><br>
> > > --<br>> > > In the middle of difficulty, lies opportunity<br>
><br>><br><br><br></div></div>--<br><div><div></div><div class="Wj3C7c">In the middle of difficulty, lies opportunity<br></div></div></blockquote></div><br><br clear="all"><br>-- <br>In the middle of difficulty, lies opportunity