[mpich-discuss] Problem in running a program with MPICH2

tetsuro_kikuchi at jesc.or.jp tetsuro_kikuchi at jesc.or.jp
Tue Jun 21 23:53:25 CDT 2011


Thank you very much for your instructions, Rajeev.

The machinefile "machine8" that I had tried to use is originally included
in the package of this program, and it does not match the configuration of
my computing system. So, I decided to use the file "hosts" as the
machinefile instead. (I apologize for confusing you.) I have copied the
original "hosts" file (located in /etc directory) to the directory where
the run script of the program is located, and then modified it to activate
the two IP addresses in the first two lines. (Please see
"hosts_modified110622.txt" attached below.)

My workstation has two processors, each of which has 4 cores. So I modified
a part of the run script to set the number of processors to 2 as follows:

#> horizontal domain decomposition
setenv NPCOL_NPROW "2 1"; set NPROCS = 2

(Originally: setenv NPCOL_NPROW "4 2"; set NPROCS = 8.)

where "NPROCS" determines the number of processors, which is equal to the
product of "NPCOL" and "NPROW". The whole run script I used this time is
attached below ("run.cctm_script110622.txt").
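As a sanity check on that product rule, here is a minimal sketch (variable names taken from the script, but written in POSIX sh for brevity; the run script itself is csh):

```shell
# NPROCS must equal NPCOL * NPROW; with a "2 1" decomposition:
NPCOL=2
NPROW=1
NPROCS=$((NPCOL * NPROW))
echo $NPROCS   # prints 2
```

With the original "4 2" decomposition the same rule gives 8.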

After saving all the modifications mentioned above, I ran the program.
However, one process had not completed after more than 25 minutes. (When I
ran the program in single-processor mode, it took only about 10 minutes to
complete all the processes.) So I pressed Ctrl-C to stop the process. The
result was the same even when the program was rerun with the number of
processors reset to the original value (8) in the script. Please refer to
the run log ("run.cctm.log110622.txt") attached below.
In the run log, I found the following part:

Proxy information:
    *********************
      [1] proxy: 127.0.0.1 (1 cores)
      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64
(1 processes);

      [2] proxy: 127.0.1.1 (1 cores)
      Exec list: /home/jesc/cmaqv4.7.1/scripts/cctm/CCTM_e1a_Linux2_x86_64
(1 processes);

(when the number of processors was set to 2).

When the number of processors was set to 8, the line
"/home/jesc/.../CCTM_e1a_Linux2_x86_64 (1 processes);" was repeated 4
times under both [1] and [2]. Does this mean MPI recognizes only one core
in each processor, even though each actually has 4 cores as mentioned
above? For reference, I attach the output of the command /sbin/ifconfig
("sbin&ifconfig.txt") below.

Could you provide me with some further ideas on how to deal with this
situation?

Kind regards,

Tetsuro

(See attached file: hosts_modified110622.txt)(See attached file:
run.cctm_script110622.txt) (See attached file: run.cctm.log110622.txt) (See
attached file: sbin&ifconfig.txt)





Just do "ssh rain4"

On Jun 21, 2011, at 12:46 AM, tetsuro_kikuchi at jesc.or.jp wrote:

>
> Thank you very much for your suggestion, Rajeev.
>
> However, I have not been able to understand how to do ssh between the
> host and rain4, although I have read several web documents and manuals
> about ssh. Could you show me how to do it? The ssh program is already
> installed in my OS. I'm sorry for taking your time.
>
> Kind regards,
>
> Tetsuro
>
>
>
>
>
> There is probably some issue with the networking settings on the
> machines.
> Is there a firewall? Try doing ssh between the host and rain4.
>
> Rajeev
>
> On Jun 20, 2011, at 9:40 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>
>>
>> Thank you for your quick reply, Rajeev.
>>
>> When I run cpi using the same machine file used by my application, it
>> returned error messages as follows:
>>
>> $ mpiexec -f $M3HOME/scripts/cctm/machines8 -n 8 ./cpi
>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>> unable to check if rain4 is local
>>
>> Tetsuro
>>
>>
>>
>>
>>
>> Try running it across machines using the same machines file used by your
>> application.
>>
>> Rajeev
>>
>> On Jun 20, 2011, at 8:10 PM, tetsuro_kikuchi at jesc.or.jp wrote:
>>
>>>
>>> Thank you for your consideration, Rajeev.
>>>
>>> When I run the cpi example in the examples directory ($ mpiexec -n 4
>>> ./cpi), it calculated pi. The run log file is attached below.
>>>
>>> Kind regards,
>>>
>>> Tetsuro
>>>
>>> (See attached file: cpi_log1_110621.txt)
>>>
>>>
>>>
>>>
>>>
>>> Can you first try running the cpi example from the examples directory?
>>>
>>> Rajeev
>>>
>>> On Jun 20, 2011, at 3:14 AM, tetsuro_kikuchi at jesc.or.jp wrote:
>>>
>>>> Hello. I'm a new MPICH2 user.
>>>>
>>>> I would like to run a program on my Linux workstation using MPICH2
>>>> (version 2-1.4). The processors, Linux OS and compiler that my
>>>> machine uses are as follows:
>>>>
>>>> Processors: Intel(R) Xeon(R) E5507 processor (2.26 GHz, 4 cores) × 2
>>>> Linux OS: Ubuntu 11.04 (natty)
>>>> Compiler: Intel(R) Fortran Composer XE 2011 for Linux (Update 4)
>>>>
>>>> When I execute the run script of the program, it fails with the
>>>> following error message:
>>>>
>>>> [mpiexec at jesc-HP-Z800-Workstation] HYDU_sock_is_local
>>>> (./utils/sock/sock.c:536): unable to get host address for rain4 (1)
>>>> [mpiexec at jesc-HP-Z800-Workstation] main (./ui/mpich/mpiexec.c:356):
>>>> unable to check if rain4 is local
>>>>
>>>> Could anyone tell me what is wrong and how to solve this problem? I
>>>> attach below the run script of the program ("run.cctm_script.txt")
>>>> and the whole error log generated this time
>>>> ("run.cctm_errorlog110620.txt"). In addition, the machinefile that
>>>> mpiexec uses in this program ("machines8.txt") is also attached.
>>>> (Please refer to the bottom part of the attached run script.)
>>>>
>>>> The hostname of my machine is "jesc-HP-Z800-Workstation". The file
>>>> "hosts" located in the /etc directory is also attached below.
>>>>
>>>> Kind regards,
>>>>
>>>> Tetsuro
>>>>
>>>> (See attached file: run.cctm_script.txt)(See attached file:
>>>> run.cctm_errorlog110620.txt)(See attached file: machines8.txt)(See
>>>> attached file: hosts.txt)
>>>>

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss



-------------- next part --------------
A non-text attachment was scrubbed...
Name: hosts_modified110622.txt
Type: application/octet-stream
Size: 260 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20110622/3f222b6b/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.cctm_script110622.txt
Type: application/octet-stream
Size: 6773 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20110622/3f222b6b/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run.cctm.log110622.txt
Type: application/octet-stream
Size: 53876 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20110622/3f222b6b/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sbin&ifconfig.txt
Type: application/octet-stream
Size: 1142 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20110622/3f222b6b/attachment-0007.obj>

