[mpich-discuss] RE: Questions about low CPU usage and increased computing time
Jayesh Krishna
jayesh at mcs.anl.gov
Thu Sep 4 09:28:34 CDT 2008
Hi,
Great to know it's working for you now!
>> ..the format of machine file is slightly different from what you
mentioned. ...
I don't remember mentioning any particular format for the machinefile in
my previous emails (the format specified in the Windows developer's guide
should work).
>> ...The computing time was three times larger than that of serial
code...
This depends on the problem at hand and the extent of parallelization
utilized by the developer. You must also take into account the cost of
communication required when running between two machines/hosts/nodes.
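As a rough illustration of this point, here is an Amdahl-style estimate in Python. It is a sketch under stated assumptions, not a measurement of the application discussed here: the function name and the comm_overhead parameter are hypothetical, and the parallel fraction and overhead values are made up to show how communication cost can turn a 4-process run into a slowdown.

```python
def estimated_speedup(parallel_fraction, n_procs, comm_overhead=0.0):
    """Amdahl-style estimate of speedup relative to the serial run.

    parallel_fraction: share of the serial run time that was parallelized (0..1).
    n_procs:           number of MPI processes.
    comm_overhead:     communication/waiting time expressed as a fraction of the
                       serial run time (hypothetical parameter; it has to be
                       measured for a real application).
    """
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / n_procs
    return 1.0 / (serial_part + parallel_part + comm_overhead)


# Fully parallel code with free communication: 4 processes give 4x.
print(estimated_speedup(1.0, 4))
# Half of the run time parallelized, free communication: 1.6x at best.
print(estimated_speedup(0.5, 4))
# Same code, but communication costs 2.375x the serial run time:
# the speedup drops to about 0.33, i.e. three times slower than serial.
print(estimated_speedup(0.5, 4, comm_overhead=2.375))
```

A value below 1.0 means the parallel run takes longer than the serial one, which is the situation described in the question (10 min serial versus 30 min on 4 CPUs).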
>> ...I checked Windows Task Manager and found that the CPU usage was less
than 10% for four CPUs...
As I mentioned before, speedup depends on the problem at hand and the
extent of parallelization utilized by the developer. Ideally, you want to
partition the problem so that all the processes do almost equal work, at
the same time, to reduce the computation time. (If a process has to wait
for a result from another process before it can proceed, you might not get
enough speedup.) In other words, you will have to look into your MPI
application to understand the computation time you are seeing.
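One common way to give every process almost equal work is a contiguous block partition. The following Python sketch is only an illustration (the function name block_ranges is hypothetical, and it assumes every item costs about the same to process); it is not taken from the application discussed here.

```python
def block_ranges(n_items, n_procs):
    """Split n_items into n_procs contiguous [start, end) ranges.

    The range sizes differ by at most one item, so no process gets
    noticeably more work than the others (assuming equal cost per item).
    """
    base, extra = divmod(n_items, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional item each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(block_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each rank would then work only on its own range. Equal ranges help only if the ranks can actually work at the same time; they do not help if one rank must wait for another rank's result.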
Let us know if you need further help.
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Thursday, September 04, 2008 9:13 AM
To: 'Jayesh Krishna'
Subject: RE: Questions about low CPU usage and increased computing time
Hello,
Thank you for your help. It works now!
Actually, the format of machine file is slightly different from what you
mentioned.
The format of the machine file I used was
Blackbox:2 129.12.13.134
Hthwang:2 129.12.0.154
However, there is another problem, this time with computing time.
The computing time was three times larger than that of serial code.
For example,
For serial code, the computing time was 10 min.
For the case of using 4 CPUs, the computing time was 30 min.
I checked Windows Task Manager and found that the CPU usage was less than
10% for four CPUs.
I think the low CPU usage causes the increased computing time. Is that
right?
If so, how can I increase the CPU usage to 100%?
Thank you very much for your time.
HT
**********************************************************************
Hyoun-Tae Hwang
#2051 EIT, Department of Earth and Environmental Sci
University of Waterloo
office : (+1) 519-888-4567 (EXT. 37343)
home : (+1) 519-880-9794
**********************************************************************
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: September 2, 2008 12:14 PM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about connecting other computers (using other CPUs)
Hi,
Try registering just the username (OEM instead of HTHWANG\OEM), without the
domain name, since the usernames are local, and see if it works.
Alternatively, you could create a domain username, make all the computers
part of the domain and register the domain username. I would suggest
trying the local usernames first, though.
>>... mpiexec -n 4 -machinefile machine.txt blackbox hello.exe ...
As I mentioned in my previous email, why do you have "blackbox" in the
mpiexec command? Shouldn't the command be "mpiexec -n 4 -machinefile
machine.txt hello.exe"?
Also, did you try checking the status of Windows Firewall and smpd, as
mentioned in my previous email?
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Tuesday, September 02, 2008 11:07 AM
To: 'Jayesh Krishna'
Subject: RE: Questions about connecting other computers (using other CPUs)
Hello,
Thanks for your help.
I'm still struggling to connect to another computer, but it does not work
yet.
I'm using OEM User as my computer's username and trying to connect to
another machine (hostname is blackbox).
Do I have to have the same username (OEM User) on every computer I want
to use?
Ex) my computer ==> HTHWANG\OEM User .. another computer ==>
BLACKBOX\OEM User
If I have to do that, how can I create a new OEM User account on the other
computers?
I'm wondering if I should re-install MPICH2 under the new account
(OEM User) after creating it on the other computers.
I tried the following command:
"mpiexec -n 4 -machinefile machine.txt blackbox hello.exe"
This time I got an error message that is slightly different from the one I
got last time
(in red, underlined letters):
Unable to connect to 'BLACKBOX:8676'
Sock error: generic socket failure, error stack:
MPIDU_Sock_post_connect (1228): unable to connect to
129.97.81.173/blackbox on port 8676,
Exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, The requested name is
valid and was found in the database,
but does not have the correct associated data being resolved for. <errno
11004>
Thank you in advance,
HT
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: September 2, 2008 10:49 AM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about connecting other computers (using other CPUs)
Hi,
>>>,,, mpiexec -n 2 -machinfile machine.txt BLACKBOX...
You should run "mpiexec -n 2 -machinefile machine.txt hostname" (the last
parameter should be the executable name)
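To make the argument order explicit, here is a small Python sketch that builds the command line with the executable as the last parameter. The helper names (mpiexec_command, hosts_reached) are hypothetical; only the "-n" and "-machinefile" options and the hostname test come from this thread, and the second helper assumes mpiexec is on the PATH.

```python
import subprocess


def mpiexec_command(n_procs, machinefile, program, *program_args):
    """Build an mpiexec argument list; the program to run comes last."""
    return ["mpiexec", "-n", str(n_procs), "-machinefile", machinefile,
            program, *program_args]


def hosts_reached(n_procs, machinefile):
    """Run `hostname` through mpiexec and return the set of names printed.

    Assumes mpiexec is on the PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(mpiexec_command(n_procs, machinefile, "hostname"),
                            capture_output=True, text=True, check=True)
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


print(" ".join(mpiexec_command(2, "machine.txt", "hostname")))
# mpiexec -n 2 -machinefile machine.txt hostname
```

If hosts_reached(2, "machine.txt") returns only one name, the processes are all running on a single machine.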
>>> ...Credintals for HTHWANG\OEM User rejected connecting to BLACKBOX...
You should make sure that you have the same username (with the same
password) on both machines. The error message indicates that the OEM User
account is not present on BLACKBOX.
>>> ... Unable to connect to...
Make sure that firewalls (Windows Firewall and third-party firewalls)
are not running on the machines. Also make sure that smpd (the MPICH2
process manager) is running on all the machines (type "smpd -status" at
the command prompt to get the status of smpd).
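Before looking at MPICH2 itself, the two failures in the error stack (name lookup and TCP connection) can be checked separately. The Python sketch below is an assumption-laden illustration, not an MPICH2 tool: check_host is a hypothetical helper, and port 8676 is simply the port shown in the error message above.

```python
import socket


def check_host(host, port=8676, timeout=3.0):
    """Return (ip, reachable) for a host.

    ip is None if the name cannot be resolved. reachable is True only if
    a TCP connection to (ip, port) succeeds within the timeout.
    """
    try:
        ip = socket.gethostbyname(host)
    except OSError:
        # Name lookup failed (the situation behind "gethostbyname failed").
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Name resolved, but nothing accepted the connection: smpd may not
        # be running, or a firewall may be blocking the port.
        return ip, False
```

For example, check_host("BLACKBOX") returning (None, False) points to a name resolution problem, in which case listing IP addresses in the machinefile is worth trying first. A result with an IP address but False points to the firewall or to smpd.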
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Tuesday, September 02, 2008 9:32 AM
To: jayesh at mcs.anl.gov
Subject: FW: Questions about connecting other computers (using other CPUs)
Hi,
Thank you for your help.
I tried several times to connect to other computers based on your
instructions, but I couldn't.
What I have done is:
1. Installed MPICH2 on the other computer (hostname is BLACKBOX).
2. Made a machine file. In the machine file (machine.txt), I wrote
127.97.81.172/BLACKBOX
3. Tried to run mpiexec on my computer (hostname is HTHWANG) using
the following command:
mpiexec -n 2 -machinfile machine.txt BLACKBOX
After doing this, I always got error messages like
Error message 1:
Credintals for HTHWANG\OEM User rejected connecting to BLACKBOX
Aborting: unable to connect to BLACKBOX
Error message 2:
Unable to connect to 'BLACKBOX:8676'
Sock error: generic socket failure, error stack:
MPIDU_Sock_post_connect (1228): unable to connect to BLACKBOX on port
8676,
Exhausted all endpoints (errno -1)
MPIDU_Sock_post_connect(1244): gethostbyname failed, No such host is
known.
(error no 11001)
Could you tell me how I can solve these problems?
Thank you,
HT
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: August 27, 2008 12:52 PM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about connecting other computers (using other CPUs)
Hi,
To use multiple nodes/hosts/machines to run your MPI application,
=========== SETUP =================
# Install MPICH2 on all the machines/hosts/nodes involved.
=========== TESTING YOUR SETUP ================
# Try running a non-MPI program using the "-machinefile" option of mpiexec
(list the IP addresses/hostnames of the machines in the machinefile; try
IP addresses first). See the Windows developer's guide for details on using
the "-machinefile" option.
--- Try running,
mpiexec -n 2 -machinefile my_machine_file.txt hostname
# If you run hostname through mpiexec as in the previous step, you
should see the hostnames of the individual machines listed.
# Now try running cpi.exe (MPICH2\examples\cpi.exe) using the
"-machinefile" option of mpiexec.
=========== RUNNING YOUR PROGRAM =============
# Make your application available on all the machines
1) You can copy your application to the same location on all the
machines (try this first)
OR
2) You can copy your application to different locations on each machine and
use the "-path" option of mpiexec to specify the path on each machine (see
the Windows developer's guide for details)
OR
3) Use the "-map" or "-mapall" option of mpiexec to map the directory
containing the executable on all the machines (see the Windows developer's
guide for details)
# Run your MPI program using the "-machinefile" option of mpiexec (list
the IP addresses/hostnames of the machines in the machinefile; try IP
addresses first)
Let us know if you need further help.
(PS: By the way, if your machine has 2 CPUs and you run your MPI
application with more than 1 process, the OS schedules the processes to
use both processors.)
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Wednesday, August 27, 2008 11:33 AM
To: 'Jayesh Krishna'
Subject: RE: Questions about connecting other computers (using other CPUs)
Hi Jayesh,
My computer has only 2 CPUs, so I do not see much advantage from running my
MPI application on it.
Now I'm trying to connect to other computers to use their CPUs for my MPI
code.
Could you tell me how I can connect to other computers and run my MPI
code on them?
Regards,
Hyoun-Tae
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: August 18, 2008 1:58 PM
To: 'Hyoun-Tae'
Subject: RE: Questions about MPI application
Hi,
Great! Let us know if you need any further help.
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Monday, August 18, 2008 12:57 PM
To: 'Jayesh Krishna'
Subject: RE: Questions about MPI application
Hello,
Thanks for your help. As you said, I debugged my code.
I finally got right results. It really works!
I appreciate your help.
HT
-----Original Message-----
From: Jayesh Krishna [mailto:jayesh at mcs.anl.gov]
Sent: August 14, 2008 12:37 PM
To: 'Hyoun-Tae'
Cc: mpich-discuss at mcs.anl.gov
Subject: RE: Questions about MPI application
Hi,
The error "Error creating mpiexec process" indicates that you have tried
to run a singleton MPI process without mpiexec in the PATH. (A singleton
process is a process launched without mpiexec, as happens when you run
your MPI process in the VS command window.) When running an MPI process as
a singleton, make sure that the path to mpiexec is in your PATH environment
variable. To get rid of the error, I would recommend that you either run
your MPI application from a command window using mpiexec, or run it from a
command window without mpiexec while making sure that the path to mpiexec
(typically c:\program files\mpich2\bin) is in the PATH.
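A quick way to see whether mpiexec can be found through the PATH is sketched below in Python. find_mpiexec is a hypothetical helper built on the standard shutil.which function; it only checks the PATH and says nothing about whether MPICH2 is otherwise set up correctly.

```python
import shutil


def find_mpiexec(path=None):
    """Return the full path of mpiexec found via PATH, or None.

    path: optional PATH-style string to search instead of the PATH
          environment variable of the current process.
    """
    return shutil.which("mpiexec", path=path)


location = find_mpiexec()
if location is None:
    print("mpiexec is not on the PATH")
else:
    print("mpiexec found at", location)
```

Run it from the same command window in which the MPI application is started, since each window can have its own PATH.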
The error "process 0 exited without calling finalize" can happen for
many reasons (basically, the error just says that the MPI process exited
before calling MPI_Finalize()). Most likely it is due to an error in the
application that results in a segfault and aborts the job. To start
debugging, I would recommend that you run your application using mpiexec
from a command prompt and add some debug statements to the application.
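The exit code in the report, -1073741819, supports the segfault explanation. Windows prints the status as a signed 32-bit number; viewed as unsigned hex it is 0xC0000005, the Windows access violation status. The conversion can be sketched in Python (to_ntstatus is a hypothetical helper name):

```python
def to_ntstatus(exit_code):
    """Format a signed 32-bit Windows exit code as unsigned hexadecimal."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


print(to_ntstatus(-1073741819))  # 0xC0000005
```

An access violation usually means the program read or wrote memory it does not own, for example by indexing past the end of an array.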
Let us know if you need further assistance.
Regards,
Jayesh
_____
From: Hyoun-Tae [mailto:hthwang at uwaterloo.ca]
Sent: Thursday, August 14, 2008 10:47 AM
To: jayesh at mcs.anl.gov
Subject: Questions about MPI application
Hello Jayesh,
I just wrote an MPI application that solves nonlinear PDEs.
In this code, I applied parallel computing only to the matrix solver,
which uses the Thomas algorithm.
The program works as follows:
1. read input data
2. make a matrix
3. solve the matrix by using parallel method
4. print output
As mentioned above, I parallelized only the matrix solver.
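For reference, a serial Thomas algorithm for a tridiagonal system can be sketched in Python as below. This is a textbook version, not the code from this application: it does no pivoting, and it assumes a well-conditioned (for example diagonally dominant) matrix. The function and argument names are hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower, diag, upper, rhs are lists of length n; lower[0] and upper[-1]
    are not part of the matrix and are ignored. No pivoting is done.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: step i needs the results of step i - 1.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: step i needs the result of step i + 1.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 3x3 example whose exact solution is x = [1, 1, 1].
print(thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                   [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0]))
```

Both loops carry a dependency from one step to the next, so splitting them across processes without restructuring the algorithm tends to leave processes waiting on each other.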
I compiled the application code using MS Visual Studio 2005
and executed it using the MPIEXEC wrapper.
However, I always got an error message like the following (I'm using 2 CPUs):
job aborted:
rank: node: exit code[: error message]
0: hthwang: -1073741819: process 0 exited without calling finalize
1: hthwang: -1073741819: process 1 exited without calling finalize
Could you tell me why I always get the error message and how I can solve
this problem?
When I compile the code using MS Visual Studio 2005,
I also get a message in a command window such as:
[0] Error creating mpiexec process .2
[0] launchMpiexecProcess failed
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(294): Initialization failed
MPID_Init(82)....: channel initialization failed
MPID_Init(383).....: PMI_Get_id returned 1
Is this error after compilation related to the MPI application error?
Could you tell me how I can solve these problems?
I can send the MPI application code if you need.
Thank you,
HT