[mpich-discuss] RE: Questions about connecting other computers (using other CPUs)

Jayesh Krishna jayesh at mcs.anl.gov
Tue Sep 2 11:14:12 CDT 2008


Hi,
 Try registering just the username (OEM instead of HTHWANG\OEM), without
the domain name, since the usernames are local, and see if it works.
Alternatively, you could create a domain username, make all the computers
part of the domain, and register the domain username. I would suggest
trying the local usernames first, though.
 
>>... mpiexec -n 4 -machinefile machine.txt blackbox hello.exe ...
    As I mentioned in my previous email, why do you have "blackbox" in the
mpiexec command? Shouldn't the command be "mpiexec -n 4 -machinefile
machine.txt hello.exe"?
 
 Also, did you check the status of the Windows firewall and smpd, as
mentioned in my previous email?
 
Regards,
Jayesh

  _____  

From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca] 
Sent: Tuesday, September 02, 2008 11:07 AM
To: 'Jayesh Krishna'
Subject: RE: Questions about connecting other computers (using other CPUs)



Hello, 

 

Thanks for your help. 

I'm still struggling to connect to another computer, and it does not work yet.


 

My username on my computer is OEM User, and I am trying to connect to another
machine (hostname is blackbox).

Do I have to have the same username (OEM User) on every computer I want
to use?

Ex) my computer ==> HTHWANG\OEM User ..        another computer ==>
BLACKBOX\OEM User

 

If I have to do that, how can I create a new account named OEM User on the
other computers?

I'm also wondering whether I should reinstall MPICH2 under the new account
(OEM User) after creating it on the other computers.

 

 

I tried the following command:

 

"mpiexec -n 4 -machinefile machine.txt blackbox hello.exe"

 

This time I got an error message that is slightly different from the one I
got last time (the differences are in red, underlined letters):

 

Unable to connect to 'BLACKBOX:8676'

Sock error: generic socket failure, error stack:

MPIDU_Sock_post_connect (1228): unable to connect to
129.97.81.173/blackbox on port 8676, 

Exhausted all endpoints (errno -1)

MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is
valid and was found in the database,

but does not have the correct associated data being resolved for. <errno
11004>

Thank you in advance, 

HT

 

 

**********************************************************************

Hyoun-Tae Hwang 

 

#2051 EIT, Department of Earth and Environmental Sci

University of Waterloo

office : (+1) 519-888-4567 (EXT. 37343)              

home : (+1) 519-880-9794 

**********************************************************************

 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: September 2, 2008 10:49 AM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about connecting other computers (using other CPUs)

 

Hi,

 

>>>,,, mpiexec -n 2 -machinfile machine.txt BLACKBOX...

 

 You should run "mpiexec -n 2 -machinefile machine.txt hostname" (the last
parameter should be the executable name)

 

>>> ...Credintals for HTHWANG\OEM User rejected connecting to BLACKBOX...

 

 You should make sure that you have the same username (with the same
password) on both machines. The error message says that the OEM user is
not present on BLACKBOX.

 

>>> ... Unable to connect to...

 

 Make sure that the firewalls (Windows firewall and third-party firewalls)
are not running on the machines. Also make sure that smpd (the MPICH2
process manager) is running on all the machines (type "smpd -status" at
the command prompt to get the status of smpd).
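
 The two "Unable to connect" errors quoted in this thread fail at
different stages: a gethostbyname failure means the name never resolved,
while a refused or timed-out connection points at the firewall or at smpd
not running. A minimal Python sketch for telling these apart follows. It is
not part of MPICH2. The port 8676 is taken from the error message; the
function name check_host is a hypothetical helper.

```python
import socket

# Port taken from the "Unable to connect to 'BLACKBOX:8676'" error message.
SMPD_PORT = 8676


def check_host(host, port=SMPD_PORT, timeout=3.0):
    """Return (ip, port_open); ip is None when the name does not resolve."""
    try:
        # Name lookup; a failure here matches the gethostbyname errors.
        ip = socket.gethostbyname(host)
    except OSError:
        return None, False
    try:
        # A plain TCP connect; failure suggests a firewall or no listener.
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        return ip, False


# Example (not run here): print(check_host("BLACKBOX"))
```

 A result of (None, False) suggests checking the machine file entry or
name resolution first; an IP address together with False suggests checking
the firewall and "smpd -status" on the remote machine.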

 

Regards,

Jayesh

  _____  

From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca] 
Sent: Tuesday, September 02, 2008 9:32 AM
To: jayesh at mcs.anl.gov
Subject: FW: Questions about connecting other computers (using other CPUs)

Hi, 

 

Thank you for your help. 

 

I tried several times to connect to other computers based on your
instructions, but I couldn't connect.

 

What I have done is:

 

1.      Installed MPICH2 on the other computer (hostname is BLACKBOX)

2.      Made a machine file. In the machine file (machine.txt), I wrote:


               

127.97.81.172/BLACKBOX

        

3.      Tried to run mpiexec on my computer (hostname is HTHWANG) using
the following command:

   

      mpiexec -n 2 -machinfile machine.txt BLACKBOX

 

After doing this, I always got error messages like these:

 

Error message 1:

 

Credintals for HTHWANG\OEM User rejected connecting to BLACKBOX

Aborting: unable to connect to BLACKBOX

 

 

Error message 2:

 

Unable to connect to 'BLACKBOX:8676'

Sock error: generic socket failure, error stack:

MPIDU_Sock_post_connect (1228): unable to connect to BLACKBOX on port
8676, 

Exhausted all endpoints (errno -1)

MPIDU_Sock_post_connect(1244): gethostbyname failed, No such host is
known. 

(error no 11001)

 

 

Could you tell me how I can solve these problems?

Thank you, 

 

HT

 

 

 

 


 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: August 27, 2008 12:52 PM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about connecting other computers (using other CPUs)

 

Hi,

 To use multiple nodes/hosts/machines to run your MPI application,

 

=========== SETUP =================

# Install MPICH2 on all the machines/hosts/nodes involved.

 

=========== TESTING YOUR SETUP ================

 

# Try running a non-MPI program using the "-machinefile" option of mpiexec
(list the IP addresses/hostnames of the machines in the machinefile; try IP
addresses first). See the Windows developer's guide for details on using
the "-machinefile" option.

 

    --- Try running, 

 

            mpiexec -n 2 -machinefile my_machine_file.txt hostname

 

# If you run hostname using mpiexec as mentioned in the previous step, you
should see the hostnames of the individual machines listed.

# Now try running cpi.exe (MPICH2\examples\cpi.exe) using the
"-machinefile" option of mpiexec.
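
 The test steps above can be sketched in Python as follows. This is only
an illustration: it assumes the machine file holds one IP address or
hostname per line, the helper names write_machinefile and mpiexec_command
are hypothetical, and the addresses in the usage note are placeholders.

```python
from pathlib import Path


def write_machinefile(path, hosts):
    """Write one host entry (IP address or hostname) per line."""
    Path(path).write_text("".join(f"{host}\n" for host in hosts))


def mpiexec_command(nprocs, machinefile, program):
    """Build the mpiexec argument list; the executable name comes last."""
    return ["mpiexec", "-n", str(nprocs),
            "-machinefile", str(machinefile), program]


# Show the command line for the "hostname" test without running it.
print(" ".join(mpiexec_command(2, "my_machine_file.txt", "hostname")))
```

 For example, write_machinefile("my_machine_file.txt", ["192.0.2.10",
"192.0.2.11"]) creates the file, and the list returned by mpiexec_command
can be passed to subprocess.run once MPICH2 is installed on every machine.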

 

=========== RUNNING YOUR PROGRAM =============

 

# Make your application available on all the machines

 

   1) You can copy your application to the same location on all the
machines (Try this first)

        OR

 

   2) You can copy your application to different locations on each machine and
use the "-path" option in mpiexec to specify the path on each machine (see
the Windows developer's guide for details)

        OR

 

    3) Use the "-map" or "-mapall" option of mpiexec to map the directory
containing the executable on all the machines (see the windows developer's
guide for details)

 

# Run your MPI program using the "-machinefile" option of mpiexec (list
the IP addresses/hostnames of the machines in the machinefile; try IP
addresses first).

 

  Let us know if you need further help.

 

(PS: By the way, if you have 2 CPUs in your machine and you run your MPI
application with more than 1 process, the OS schedules the processes to
use both processors.)

 

Regards,

Jayesh

 

  _____  

From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca] 
Sent: Wednesday, August 27, 2008 11:33 AM
To: 'Jayesh Krishna'
Subject: RE: Questions about connecting other computers (using other CPUs)

HI Jayesh, 

 

My computer has only 2 CPUs, so I do not see much advantage from running my
MPI application on it alone.

Now I'm trying to connect to other computers to use their CPUs for my MPI code.

Could you tell me how I can connect to other computers and run my MPI
code on them?

Regards, 

 

Hyoun-Tae

 

 


 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: August 18, 2008 1:58 PM
To: 'Hyoun-Tae'
Subject: RE: Questions about MPI application

 

Hi,

 Great! Let us know if you need any further help.

 

Regards,

Jayesh

 

  _____  

From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca] 
Sent: Monday, August 18, 2008 12:57 PM
To: 'Jayesh Krishna'
Subject: RE: Questions about MPI application

Hello, 

 

Thanks for your help. As you said, I debugged my code. 

I finally got the right results. It really works!

I appreciate your help. 

 

HT

 


 

-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov] 
Sent: August 14, 2008 12:37 PM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about MPI application

 

Hi,

 The error "Error creating mpiexec process" indicates that you tried to run
a singleton MPI process without mpiexec in the PATH. A singleton process is
an MPI process launched without mpiexec, as happens when you run your MPI
program in the VS command window. When running an MPI process as a
singleton, make sure that the path to mpiexec (typically c:\program
files\mpich2\bin) is in your PATH environment variable. To get rid of the
error, I would recommend that you either run your MPI application from a
command window using mpiexec, or run it from a command window without
mpiexec after making sure that the path to mpiexec is in the PATH.
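
 One quick way to confirm whether mpiexec is visible through PATH is
sketched below in Python. It uses only the standard library; the helper
name find_mpiexec and its extra_dir parameter are hypothetical and not
part of MPICH2.

```python
import os
import shutil


def find_mpiexec(extra_dir=None):
    """Return the full path to mpiexec if it is found, otherwise None."""
    search_path = os.environ.get("PATH", "")
    if extra_dir:
        # Search the extra directory first, as if it were prepended to PATH.
        search_path = extra_dir + os.pathsep + search_path
    return shutil.which("mpiexec", path=search_path)


print(find_mpiexec() or "mpiexec not found on PATH")
```

 If the plain call finds nothing but find_mpiexec(r"c:\program
files\mpich2\bin") returns a path, adding that directory to the PATH
environment variable is the likely fix.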

 The error "process 0 exited without calling finalize" can happen for many
reasons (basically, the error just says that the MPI process exited before
calling MPI_Finalize()). Most likely it is due to an error in the
application resulting in a segfault, aborting the job. To start debugging,
I would recommend that you run your application using mpiexec from a
command prompt and add some debug statements.
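
 The exit code -1073741819 reported for both ranks in the original job
output is a signed 32-bit value. Viewed as unsigned hex it is 0xC0000005,
the Windows access violation status, which is consistent with a segfault
in the application. A short Python sketch of the conversion (the function
name as_ntstatus is a hypothetical helper):

```python
def as_ntstatus(exit_code):
    """Format a signed 32-bit exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF reinterprets the value as unsigned 32-bit.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(as_ntstatus(-1073741819))  # prints 0xC0000005
```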

 Let us know if you need further assistance

 

Regards,

Jayesh

 

 

  _____  

From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca] 
Sent: Thursday, August 14, 2008 10:47 AM
To: jayesh at mcs.anl.gov
Subject: Questions about MPI application

Hello Jayesh, 

 

I have just written an MPI application that solves non-linear PDEs.

In the code, I parallelized only the matrix solver, which uses the Thomas
algorithm.

The structure of my program is as follows:

 

1.      read input data 

2.      make a matrix 

3.      solve the matrix by using parallel method   

4.      print output 

 

As mentioned above, I parallelized only the matrix solver.

I compiled the application with MS Visual Studio 2005

and ran it using the MPIEXEC wrapper.

However, I always get an error message like this (I'm using 2 CPUs):

 

job aborted:

rank: node: exit code[: error message]

0: hthwang: -1073741819: process 0 exited without calling finalize

1: hthwang: -1073741819: process 1 exited without calling finalize

 

Could you tell me why I always get the error message and how I can solve
this problem?

 

When I compile the code with MS Visual Studio 2005,

I also get a message in a command window such as:

 

[0] Error creating mpiexec process .2

[0] launchMpiexecProcess failed

Fatal error in MPI_Init: Other MPI error, error stack:

MPIR_Init_thread(294): Initialization failed

MPID_Init(82)....: channel initialization failed

MPID_Init(383).....: PMI_Get_id returned 1

 

 

Is this error after compilation related to the MPI application error?

Could you tell me how I can solve these problems?

 

I can send the MPI application code if you need it.

Thank you, 

 

HT


 
