[mpich-discuss] Problem in running a program with MPICH2

tetsuro_kikuchi at jesc.or.jp
Thu Jun 23 19:10:46 CDT 2011


Thank you for your advice, Rajeev.

Following your advice, I changed the hosts file as follows:


localhost
jesc-HP-Z800-Workstation
#127.0.0.1  #localhost
#127.0.1.1  #jesc-HP-Z800-Workstation

# The following lines are desirable for IPv6 capable hosts
#::1    #ip6-localhost ip6-loopback
#fe00:0 #ip6-localnet
#ff00:0 #ip6-mcastprefix
#ff02:1 #ip6-allnodes
#ff02:2 #ip6-allrouters
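
(To confirm that both names resolve after this change, a quick check,
assuming the standard Linux tools are available, is:

$ getent hosts localhost
$ getent hosts jesc-HP-Z800-Workstation

Each command should print the address that the name resolves to.)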


In addition, I changed the values of several variables in one of the
source files of the program, recompiled the executable, and then ran the
program. It finally ran successfully!

Again, thank you very much for your support.

Kind regards,

Tetsuro





You don't need to put IP addresses in the hosts file, just the host names.
See the MPICH2 user's guide or
http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
for how to run a program across machines with MPICH2. Try to get the cpi
example running first before running your application.
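
For example (a minimal sketch, assuming both names resolve on your
system), a machinefile with just the host names can be tested with the
cpi example:

$ cat hosts
localhost
jesc-HP-Z800-Workstation
$ mpiexec -f hosts -n 8 ./cpi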

Rajeev


On Jun 21, 2011, at 11:53 PM, tetsuro_kikuchi at jesc.or.jp wrote:

>
> Thank you very much for your instructions, Rajeev.
>
> The machinefile "machines8" that I had tried to use was originally
> included in the package of this program, and it does not match the
> configuration of my computing system. So, I decided to use the file
> "hosts" as the machinefile instead. (I apologize for confusing you.) I
> have copied the original "hosts" file (located in the /etc directory)
> to the directory where the run script of the program is located, and
> then modified it to activate the two IP addresses in the first two
> lines. (Please see "hosts_modified110622.txt" attached below.)
>
> My workstation has two processors, each of which has 4 cores. So I
> modified a part of the run script to set the number of processors to 2
> as follows:
>
> #> horizontal domain decomposition
> setenv NPCOL_NPROW "2 1"; set NPROCS = 2
>
> (originally: setenv NPCOL_NPROW "4 2"; set NPROCS = 8)
>
> where "NPROCS" determines the number of processors, which is equal to the
> product of "NPCOL" and "NPROW". The whole run script I used this time is
> attached below ("run.cctm_script110622.txt").
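>
> (For example, the original setting NPCOL_NPROW "4 2" gives 4 × 2 = 8
> processes, matching NPROCS = 8, while the new setting "2 1" gives
> 2 × 1 = 2.)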
>
> After saving all the modifications mentioned above, I ran the program.
> However, the run had not completed after more than 25 minutes. (When I
> ran the program in single-processor mode, it took only about 10 minutes
> to complete.) So I pressed Ctrl-C to stop the process. The result was
> the same even when the program was rerun after resetting the number of
> processors to the original value (8) in the script. Please refer to the
> run log ("run.cctm.log110622.txt") attached below.
> In the run log, I found the following part:
>
> Proxy information:
>    *********************
>      [1] proxy: 127.0.0.1 (1 cores)
>      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64 (1 processes);
>
>      [2] proxy: 127.0.1.1 (1 cores)
>      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64 (1 processes);
>
> (when the number of processors was set to 2).
>
> When the number of processors was set to 8, the phrase
> "/home/jesc/.../CCTM_e1a_Linux2_x86_64 (1 processes);" was repeated
> four times under both [1] and [2]. Does this mean MPI recognizes only
> one core in each processor, although each actually has 4 cores as
> mentioned above? For reference, I attach the output of the command
> /sbin/ifconfig ("sbin&ifconfig.txt") below.
>
> Could you again give me some ideas on how to deal with this situation?
>
> Kind regards,
>
> Tetsuro
>
> (See attached file: hosts_modified110622.txt) (See attached file:
> run.cctm_script110622.txt) (See attached file: run.cctm.log110622.txt)
> (See attached file: sbin&ifconfig.txt)
>
>
>
>
>
> Just do "ssh rain4"
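>
> For example (assuming your account exists on rain4):
>
> $ ssh rain4 hostname    # should log in and print "rain4"
>
> If a password is requested every time, passwordless login can be set up
> with ssh-keygen followed by ssh-copy-id rain4.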
>
> On Jun 21, 2011, at 12:46 AM, tetsuro_kikuchi at jesc.or.jp wrote:
>
>>
>> Thank you very much for your suggestion, Rajeev.
>>
>> However, I have not been able to understand how to do ssh between the
>> host and rain4, although I have read several web documents and manuals
>> about ssh. Could you teach me how to do it? The ssh program has
>> already been installed on my OS. I'm sorry for taking your time.
>>
>> Kind regards,
>>
>> Tetsuro
>>
>>
>>
>>
>>
>> There is probably some issue with the networking settings on the
>> machines. Is there a firewall? Try doing ssh between the host and
>> rain4.
>>
>> Rajeev
>>
>> On Jun 20, 2011, at 9:40 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>>
>>>
>>> Thank you for your quick reply, Rajeev.
>>>
>>> When I ran cpi using the same machinefile used by my application, it
>>> returned the following error messages:
>>>
>>> $ mpiexec -f $M3HOME/scripts/cctm/machines8 -n 8 ./cpi
>>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>>> unable to check if rain4 is local
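>>>
>>> (The message "unable to get host address for rain4" suggests that the
>>> name "rain4" does not resolve on this machine. Assuming standard
>>> Linux tools, this can be checked with:
>>>
>>> $ getent hosts rain4
>>>
>>> If nothing is printed, rain4 is unknown here and would have to be
>>> added to /etc/hosts or removed from the machinefile.)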
>>>
>>> Tetsuro
>>>
>>>
>>>
>>>
>>>
>>> Try running it across machines using the same machines file used by
>>> your application.
>>>
>>> Rajeev
>>>
>>> On Jun 20, 2011, at 8:10 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>>>
>>>>
>>>> Thank you for your consideration, Rajeev.
>>>>
>>>> When I ran the cpi example in the examples directory ($ mpiexec -n 4
>>>> ./cpi), it calculated pi. The run log file is attached below.
>>>>
>>>> Kind regards,
>>>>
>>>> Tetsuro
>>>>
>>>> (See attached file: cpi_log1_110621.txt)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Can you first try running the cpi example from the examples directory?
>>>>
>>>> Rajeev
>>>>
>>>> On Jun 20, 2011, at 3:14 AM, tetsuro_kikuchi at jesc.or.jp wrote:
>>>>
>>>>> Hello. I'm a new MPICH2 user.
>>>>>
>>>>> I would like to run a program on my Linux workstation using MPICH2
>>>>> (version 1.4). The processors, Linux OS, and compiler that my
>>>>> machine uses are as follows:
>>>>>
>>>>> Processors: Intel(R) Xeon(R) E5507 (2.26 GHz, 4 cores) × 2
>>>>> Linux OS: Ubuntu 11.04 (natty)
>>>>> Compiler: Intel(R) Fortran Composer XE 2011 for Linux (Update 4)
>>>>>
>>>>> When I executed the run script of the program, it failed with the
>>>>> following error message:
>>>>>
>>>>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>>>>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>>>>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>>>>> unable to check if rain4 is local
>>>>>
>>>>> Could anyone tell me what is wrong and how to solve this problem? I
>>>>> attach below the run script of the program ("run.cctm_script.txt")
>>>>> and the whole error log generated this time
>>>>> ("run.cctm_errorlog110620.txt"). In addition, the machinefile that
>>>>> mpiexec uses in this program ("machines8.txt") is also attached.
>>>>> (Please refer to the bottom part of the attached run script.)
>>>>>
>>>>> The hostname of my machine is "jesc-HP-Z800-Workstation". The file
>>>>> "hosts" located in the /etc directory is also attached below.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Tetsuro
>>>>>
>>>>> (See attached file: run.cctm_script.txt) (See attached file:
>>>>> run.cctm_errorlog110620.txt) (See attached file: machines8.txt)
>>>>> (See attached file: hosts.txt)
>>>>>
>>>>
>>>
>>
>
<run.cctm_script.txt><run.cctm_errorlog110620.txt><machines8.txt><hosts.txt>

<hosts_modified110622.txt><run.cctm_script110622.txt><run.cctm.log110622.txt><sbin&ifconfig.txt>

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss






