<br><font size=2 face="sans-serif">We removed the "primary DNS suffix"
from 10.0.0.6, and everything seems to work now. The "helloworld"
app no longer crashes, and the ordering of hosts no longer matters.</font>
<br><font size=2 face="sans-serif">Thank you for your guidance,</font>
<br><font size=2 face="sans-serif">David</font>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td width=40%><font size=1 face="sans-serif"><b>jayesh@mcs.anl.gov</b>
</font>
<p><font size=1 face="sans-serif">07/23/2010 03:42 PM</font>
<br><font size=1 face="sans-serif">Expire Date:</font><font size=2 face="sans-serif">
</font><font size=1 face="sans-serif">07/25/2012</font>
<br>
<td width=59%>
<table width=100%>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">To</font></div>
<td><font size=1 face="sans-serif">mpich-discuss@mcs.anl.gov</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">cc</font></div>
<td><font size=1 face="sans-serif">David_Lowinger@ea.epson.com</font>
<tr valign=top>
<td>
<div align=right><font size=1 face="sans-serif">Subject</font></div>
<td><font size=1 face="sans-serif">Re: [mpich-discuss] command line ordering
of hosts matters?</font></table>
<br>
<br></table>
<br>
<br>
<br><tt><font size=2>Hi,<br>
From the error message, gethostbyname() fails for IP address 10.0.0.6.
It looks like the DNS entries for the host are incomplete. Can you (or a sysadmin)
verify that the DNS entries for the host are actually valid and complete?<br>
Can you also try the newer nemesis channel (which will be the default
channel, replacing the sock channel, in the 1.3 series) and see if it works
(mpiexec -channel nemesis -hosts 2 10.0.0.6 1 10.0.0.1 1 helloworld.exe)?<br>
<br>
Regards,<br>
Jayesh<br>
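<br>
A minimal sketch for checking, outside of MPI, whether a host name or address such as 10.0.0.6 resolves. It is not taken from MPICH2: the helper names check_forward() and reverse_lookup() are made up for illustration, and it uses the POSIX getaddrinfo()/getnameinfo() calls rather than gethostbyname(). On Windows the same calls should be available through winsock2.h and ws2tcpip.h once WSAStartup() has been called, which is not shown here.<br>

```c
/* Sketch only: helper names are hypothetical, not part of MPICH2. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose getaddrinfo() under strict -std modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Forward lookup: returns 0 if `host` (a name or a dotted address) resolves,
 * otherwise a getaddrinfo() error code (printable with gai_strerror()). */
int check_forward(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Reverse lookup: `ip` must be a numeric address. On success returns 0 and
 * writes the host name into `name`. NI_NAMEREQD turns a missing name into
 * an error instead of silently echoing the address back. */
int reverse_lookup(const char *ip, char *name, size_t len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_NUMERICHOST;  /* only parse the address, no name query */

    rc = getaddrinfo(ip, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     name, (socklen_t)len, NULL, 0, NI_NAMEREQD);
    freeaddrinfo(res);
    return rc;
}
```

Calling check_forward() with each machine's host name and reverse_lookup() with each machine's address, on both machines, and comparing the return codes would show whether one side is missing an entry for the other.<br>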
----- Original Message -----<br>
From: "David Lowinger" <David_Lowinger@ea.epson.com><br>
To: mpich-discuss@mcs.anl.gov<br>
Sent: Friday, July 23, 2010 2:24:02 PM GMT -06:00 US/Canada Central<br>
Subject: Re: [mpich-discuss] command line ordering of hosts matters?<br>
<br>
<br>
<br>
<br>
Hi Jayesh, <br>
I do indeed see 1 instance of my MPI program ("myapp.exe") on
each machine's Process Manager whenever I run any test, whether the MPI_Bcast
fails or not. So, it looks like both machines are successfully launching
processes. <br>
<br>
Yes, I am running the 64-bit version of MPICH2 on both machines. I
just tried recompiling the code separately on each machine, and I
see the exact same result as before (i.e., the MPI_Bcast never completes
if I put 10.0.0.101 before 10.0.0.6). <br>
<br>
I tried both the simple hello world program and the compute pi example
that you sent in the links. The simple "hello world" program
actually crashes... this is very odd, because I've successfully run much
more complex apps than this with no problems. Please see the attached file
("helloworld_output.jpg") for the error I get when I run the
https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/examples/hellow.c example.
I see this crash whenever I use 2 hosts, listed in either order on the
command line. If I only use 1 host (i.e., "mpiexec -np 5 helloworld.exe"),
it runs fine. <br>
<br>
The PI example seems to have the same "host ordering" issue I
initially described to you. If I place 10.0.0.6 first in the command line,
there's no problem (apart from me being unable to input the value for 'n'
using scanf... instead, I just commented out the scanf section and hardcoded
'n' to be 5). If 10.0.0.6 is not listed first, the PI calculation app never
gets past the initial call to 'MPI_Bcast()'. <br>
<br>
David <br>
<br>
<br>
<br>
Jayesh Krishna <jayesh@mcs.anl.gov> <br>
<br>
07/13/2010 08:56 PM <br>
Expire Date: 07/22/2012 <br>
<br>
To mpich-discuss@mcs.anl.gov
<br>
<br>
cc David_Lowinger@ea.epson.com
<br>
<br>
Subject
Re: [mpich-discuss] command line ordering of hosts matters?
<br>
<br>
<br>
<br>
<br>
Hi, <br>
Sorry for the delayed response. <br>
Since you are able to run "hostname" on the two machines, the
process manager should be able to launch processes on both of them
(in the process manager tab you should see 1 instance of your MPI program,
MYMPIPGM.exe, on each machine). <br>
Are you running a 64-bit version of MPICH2 on both machines? Have
you recompiled your code on both machines? Did you try a simple hello
world program to see if it works (https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/examples/hellow.c)?
Also try compiling the compute pi example to see if it works (https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/examples/icpi.c).
Make sure that you compile the code separately on both machines.
<br>
Let us know the results. <br>
<br>
Regards, <br>
Jayesh <br>
<br>
----- Original Message ----- <br>
From: "David Lowinger" <David_Lowinger@ea.epson.com> <br>
To: mpich-discuss@mcs.anl.gov <br>
Sent: Monday, July 5, 2010 11:11:16 AM GMT -06:00 US/Canada Central <br>
Subject: Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
Hi Jayesh, <br>
<br>
Yes, I can run a simple non-MPI program (i.e., the program "hostname")
successfully with "mpiexec -hosts 2 10.0.0.101 1 10.0.0.6 1 hostname".
I see both hostnames get printed out, and then the program exits.
The ordering of hosts does not matter in this case. <br>
I see the exact same behavior as before when I provide the complete path
to the executable; i.e., the MPI_Bcast never completes if I put 10.0.0.101
before 10.0.0.6. <br>
I see the exact same behavior as before when I use the machinefile option.
<br>
<br>
With regards to checking the control panel, I assume you mean the "Processes"
tab of the Windows Task Manager? If so, which processes should I be looking
for as the program runs? I changed the MPI "hello world" app
to MPI_Bcast a very large buffer over and over, so that I can look for
some kind of process in the Windows Task Manager. What is the name of the
process I should be looking for? Thanks, <br>
<br>
David <br>
<br>
<br>
<br>
Jayesh Krishna <jayesh@mcs.anl.gov> <br>
<br>
07/01/2010 11:54 AM <br>
Expire Date: 07/04/2012 <br>
<br>
To mpich-discuss@mcs.anl.gov <br>
<br>
cc David_Lowinger@ea.epson.com <br>
<br>
Subject Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
<br>
Hi, <br>
Along with checking whether the processes are launched, please provide
us with the information below, <br>
<br>
# Are you able to run a simple non-MPI program (mpiexec -hosts 2 10.0.0.101
1 10.0.0.6 1 hostname)? <br>
# Try providing the complete path to the executable (mpiexec -hosts 2 10.0.0.101
1 10.0.0.6 1 c:\temp\helloworld.exe). Without the complete path you might
actually be executing two different MPI programs on the hosts. <br>
# Does the machinefile option work (mpiexec -n 2 -machinefile mf.txt c:\temp\helloworld.exe,
where the file mf.txt contains the host IP addresses)? <br>
<br>
Regards, <br>
Jayesh <br>
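<br>
For the machinefile test mentioned above, mf.txt could look like the sketch below. This assumes the usual one-host-per-line layout and reuses the two addresses from this thread; it is an illustration, not a file taken from either machine.<br>

```text
10.0.0.101
10.0.0.6
```

Swapping the two lines would be one way to repeat the host-ordering experiment without changing the command line.<br>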
----- Original Message ----- <br>
From: "David Lowinger" <David_Lowinger@ea.epson.com> <br>
To: mpich-discuss@mcs.anl.gov <br>
Sent: Wednesday, June 30, 2010 4:26:16 PM GMT -06:00 US/Canada Central
<br>
Subject: Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
Hi Jayesh, <br>
On both machines, when I type, "smpd -version", both display
"1.2.1p1". <br>
When I call "MPI_Get_version()" on both machines, they both show
version 2, subversion 2. <br>
<br>
Both machines are running 64-bit Windows Vista. <br>
<br>
I'll check the control panel to see if the MPI processes are being launched.
<br>
<br>
David <br>
<br>
<br>
<br>
Jayesh Krishna <jayesh@mcs.anl.gov> <br>
<br>
06/30/2010 04:53 PM <br>
Expire Date: 06/29/2012 <br>
<br>
To mpich-discuss@mcs.anl.gov <br>
<br>
cc David_Lowinger@ea.epson.com <br>
<br>
Subject Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
<br>
Hi, <br>
Which version of MPICH2 are you using? <br>
Do the two machines have the same underlying architecture? MPICH2 currently
does not support heterogeneous systems, so you cannot run your job across
32-bit and 64-bit machines or MPICH2 libraries. <br>
Also try checking the control panel to see if the MPI processes are being
launched on the machines. <br>
<br>
Regards, <br>
Jayesh <br>
<br>
----- Original Message ----- <br>
From: "David Lowinger" <David_Lowinger@ea.epson.com> <br>
To: mpich-discuss@mcs.anl.gov <br>
Sent: Wednesday, June 30, 2010 3:42:22 PM GMT -06:00 US/Canada Central
<br>
Subject: Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
Firewall is turned off on both machines. <br>
There is no error message... the MPI_Bcast simply never completes. I've
left it running for 10 minutes, and the second printf ("completed
MPI_Bcast") never appears when I use the second host ordering below
("10.0.0.101 1 10.0.0.6 1"). <br>
When I run "smpd -status 10.0.0.6" from 10.0.0.101, I see the
message "smpd running on 10.0.0.6". When I run "smpd -status
10.0.0.101" from 10.0.0.6, I see the message "smpd running on
10.0.0.101". <br>
<br>
<br>
<br>
jayesh@mcs.anl.gov <br>
<br>
06/30/2010 10:50 AM <br>
Expire Date: 06/29/2012 <br>
<br>
To mpich-discuss@mcs.anl.gov <br>
<br>
cc David_Lowinger@ea.epson.com <br>
<br>
Subject Re: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
<br>
Hi, <br>
Do you have a firewall running on any of these machines (If so, can you
try running your job after turning off the firewall)? <br>
What is the error message that you get when you run your job ? <br>
Can you try running "smpd -status REMOTE_MACHINE" from each of
the machines and let us know the results ("smpd -status 10.0.0.6"
from 10.0.0.101 & "smpd -status 10.0.0.101" from 10.0.0.6)?
<br>
<br>
Regards, <br>
Jayesh <br>
----- Original Message ----- <br>
From: "David Lowinger" <David_Lowinger@ea.epson.com> <br>
To: mpich-discuss@mcs.anl.gov <br>
Sent: Tuesday, June 29, 2010 5:53:07 PM GMT -06:00 US/Canada Central <br>
Subject: [mpich-discuss] command line ordering of hosts matters? <br>
<br>
<br>
<br>
Hi, <br>
When running a very basic "hello world" app, I've found that
the app's behavior depends on the order I use for hosts in the command
line. For example, if I use: <br>
<br>
mpiexec -hosts 2 10.0.0.6 1 10.0.0.101 1 helloworld.exe <br>
<br>
The program executes flawlessly. But, if I use: <br>
<br>
mpiexec -hosts 2 10.0.0.101 1 10.0.0.6 1 helloworld.exe <br>
<br>
then the program never gets past the call to "MPI_Bcast()". Here
is my code: <br>
<br>
------------------- <br>
<br>
#include "mpi.h" <br>
#include &lt;stdio.h&gt; <br>
<br>
#define MPI_FLUSH() fflush(stdout) <br>
<br>
int main( int argc, char* argv[] ) <br>
{ <br>
int g_Thread_ID, g_Num_Threads; <br>
int test = 0; <br>
<br>
/**************************************************\ <br>
* MPI Initialization * <br>
\**************************************************/ <br>
MPI_Init(&argc, &argv); <br>
MPI_Comm_rank(MPI_COMM_WORLD, &g_Thread_ID); <br>
MPI_Comm_size(MPI_COMM_WORLD, &g_Num_Threads); <br>
<br>
printf("thread %d: main: About to execute MPI_Bcast\n", g_Thread_ID);
<br>
MPI_FLUSH(); <br>
<br>
// Broadcast integer <br>
int err = MPI_Bcast(&test, 1, MPI_INT, 0, MPI_COMM_WORLD); <br>
<br>
printf("thread %d: completed MPI_Bcast\n", g_Thread_ID); <br>
MPI_FLUSH(); <br>
<br>
MPI_Finalize(); <br>
} <br>
<br>
------------------ <br>
<br>
I am running Windows Vista on both machines. Has anyone seen this before?
Thanks, <br>
David <br>
<br>
_______________________________________________ <br>
mpich-discuss mailing list <br>
mpich-discuss@mcs.anl.gov <br>
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss <br>
<br>
<br>
</font></tt>
<br>