[mpich-discuss] Problem in running a program with MPICH2

Rajeev Thakur thakur at mcs.anl.gov
Wed Jun 22 12:15:02 CDT 2011


You don't need to put IP addresses in the hosts file, just the host names. See the MPICH2 users guide or http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager for how to run a program across machines with MPICH2. Try to get the cpi example running first before running your application.
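For example, a minimal Hydra machinefile for the two machines in this thread might look like the following. This is only a sketch: the host names and core counts are assumptions taken from the thread, and each name must resolve (and be reachable via ssh) on every machine.

```shell
# Hypothetical machinefile: one resolvable host name per line, with an
# optional ":N" suffix declaring N available cores (Hydra machinefile syntax).
cat > machines8 <<'EOF'
jesc-HP-Z800-Workstation:4
rain4:4
EOF

cat machines8

# On the real cluster, check name resolution first, then run cpi:
#   getent hosts rain4
#   mpiexec -f machines8 -n 8 ./cpi
```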

Rajeev
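Since the root error in this thread ("unable to get host address for rain4") is name resolution, the usual prerequisites are worth spelling out. A sketch, assuming OpenSSH; the IP addresses below are placeholders (the real ones come from /sbin/ifconfig on each machine):

```shell
# Hypothetical /etc/hosts entries (placeholder IPs): every host name used
# in the machinefile must resolve on every machine. Written to a scratch
# file here rather than /etc/hosts itself.
cat > hosts_example.txt <<'EOF'
192.168.1.10    jesc-HP-Z800-Workstation
192.168.1.14    rain4
EOF

cat hosts_example.txt

# One-time passwordless ssh setup from the head node (OpenSSH):
#   ssh-keygen -t rsa        # accept defaults, empty passphrase
#   ssh-copy-id rain4        # install the public key on rain4
#   ssh rain4 hostname       # should print "rain4" with no password prompt
```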


On Jun 21, 2011, at 11:53 PM, tetsuro_kikuchi at jesc.or.jp wrote:

> 
> Thank you very much for your instructions, Rajeev.
> 
> The machinefile "machines8" that I had tried to use is originally included
> in the package of this program, and it does not match the configuration of
> my computing system. So I decided to use the file "hosts" as the
> machinefile instead. (I apologize for the confusion.) I have copied the
> original "hosts" file (located in the /etc directory) to the directory
> where the run script of the program is located, and then modified it to
> activate the two IP addresses in the first two lines. (Please see
> "hosts_modified110622.txt" attached below.)
> 
> My workstation has two processors, each of which has 4 cores. So I modified
> a part of the run script to set the number of processors to 2 as follows:
> 
> #> horizontal domain decomposition
> setenv NPCOL_NPROW "2 1"; set NPROCS = 2
> 
> (originally: setenv NPCOL_NPROW "4 2"; set NPROCS = 8)
> 
> where "NPROCS" determines the number of processors, which is equal to the
> product of "NPCOL" and "NPROW". The whole run script I used this time is
> attached below ("run.cctm_script110622.txt").
> 
> After saving all the modifications mentioned above, I ran the program.
> However, the run had not completed after more than 25 minutes. (When I
> ran the program in single-processor mode, it took only about 10 minutes
> to complete all the processes.) So I pressed Ctrl-C to stop the process.
> The result was the same even when the program was rerun after resetting
> the number of processors to the original value (8) in the script. Please
> refer to the run log ("run.cctm.log110622.txt") attached below.
> In the run log, I've found the following part:
> 
> Proxy information:
>    *********************
>      [1] proxy: 127.0.0.1 (1 cores)
>      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64
> (1 processes);
> 
>      [2] proxy: 127.0.1.1 (1 cores)
>      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64
> (1 processes);
> 
> (when the number of processors was set to 2).
> 
> When the number of processors was set to 8, the phrase
> "/home/jesc/.../CCTM_e1a_Linux2_x86_64 (1 processes);" was repeated 4
> times in both [1] and [2]. Does this mean MPI recognizes only one core in
> each processor, although each actually has 4 cores as mentioned above?
> For reference, I attach the output of the command /sbin/ifconfig
> ("sbin&ifconfig.txt") below.
> 
> Could you again provide me with some ideas on how to deal with this
> situation?
> 
> Kind regards,
> 
> Tetsuro
> 
> (See attached file: hosts_modified110622.txt)(See attached file:
> run.cctm_script110622.txt) (See attached file: run.cctm.log110622.txt) (See
> attached file: sbin&ifconfig.txt)
> 
> 
> 
> 
> 
> Just do "ssh rain4"
> 
> On Jun 21, 2011, at 12:46 AM, tetsuro_kikuchi at jesc.or.jp wrote:
> 
>> 
>> Thank you very much for your suggestion, Rajeev.
>> 
>> However, I have not been able to work out how to set up ssh between the
>> host and rain4, although I have read several web documents and manuals
>> about ssh. Could you teach me how to do it? The ssh program is already
>> installed on my OS. I'm sorry for taking your time.
>> 
>> Kind regards,
>> 
>> Tetsuro
>> 
>> 
>> 
>> 
>> 
>> There is probably some issue with the networking settings on the
>> machines.
>> Is there a firewall? Try doing ssh between the host and rain4.
>> 
>> Rajeev
>> 
>> On Jun 20, 2011, at 9:40 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>> 
>>> 
>>> Thank you for your quick reply, Rajeev.
>>> 
>>> When I ran cpi using the same machine file used by my application, it
>>> returned the following error messages:
>>> 
>>> $ mpiexec -f $M3HOME/scripts/cctm/machines8 -n 8 ./cpi
>>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>>> unable to check if rain4 is local
>>> 
>>> Tetsuro
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Try running it across machines using the same machines file used by your
>>> application.
>>> 
>>> Rajeev
>>> 
>>> On Jun 20, 2011, at 8:10 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>>> 
>>>> 
>>>> Thank you for your consideration, Rajeev.
>>>> 
>>>> When I ran the cpi example in the examples directory ($ mpiexec -n 4
>>>> ./cpi), it calculated pi. The run log file is attached below.
>>>> 
>>>> Kind regards,
>>>> 
>>>> Tetsuro
>>>> 
>>>> (See attached file: cpi_log1_110621.txt)
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Can you first try running the cpi example from the examples directory?
>>>> 
>>>> Rajeev
>>>> 
>>>> On Jun 20, 2011, at 3:14 AM, tetsuro_kikuchi at jesc.or.jp wrote:
>>>> 
>>>>> Hello. I'm a new MPICH2 user.
>>>>> 
>>>>> I would like to run a program on my Linux workstation using MPICH2
>>>>> (version 1.4). The processors, Linux OS and compiler that my machine
>>>>> uses are as follows:
>>>>> 
>>>>> Processors: Intel(R) Xeon(R) E5507 (2.26 GHz, 4 cores) × 2
>>>>> Linux OS: Ubuntu 11.04 (natty)
>>>>> Compiler: Intel(R) Fortran Composer XE 2011 for Linux (Update 4)
>>>>> 
>>>>> When I executed the run script of the program, it failed with the
>>>>> following error message:
>>>>> 
>>>>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>>>>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>>>>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>>>>> unable to check if rain4 is local
>>>>> 
>>>>> Could anyone tell me what is wrong and how to solve this problem? I
>>>>> attach below the run script of the program ("run.cctm_script.txt")
>>>>> and the whole error log generated this time
>>>>> ("run.cctm_errorlog110620.txt"). In addition, the machinefile that
>>>>> mpiexec uses in this program ("machines8.txt") is also attached.
>>>>> (Please refer to the bottom part of the attached run script.)
>>>>> 
>>>>> The hostname of my machine is "jesc-HP-Z800-Workstation". The file
>>>>> "hosts" located in the /etc directory is also attached below.
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Tetsuro
>>>>> 
>>>>> (See attached file: run.cctm_script.txt) (See attached file:
>>>>> run.cctm_errorlog110620.txt) (See attached file: machines8.txt)
>>>>> (See attached file: hosts.txt)
>>>>> 
>>>> 
>>> 
>> 
>>>>> _______________________________________________
>>>>> mpich-discuss mailing list
>>>>> mpich-discuss at mcs.anl.gov
>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss



More information about the mpich-discuss mailing list