[mpich-discuss] how to make the hosts files

Mandar Gurav mandarwce at gmail.com
Thu May 12 22:34:01 CDT 2011


Thanks, Reuti, for the nice article.
It's really helpful for me.

-- Mandar Gurav

On 5/13/11, hyunduk kim <fororigin at gmail.com> wrote:
> Thank you for your response.
>
> Best Regards
> H.D., Kim
>
> 2011/5/12 Mandar Gurav <mandarwce at gmail.com>
>
>> Hi hyunduk!
>>
>> This is not about the total number of processors; it is about the
>> number of processes to create. You can create as many processes as
>> you like. At any given moment, only as many processes as you have
>> cores (in your case 2 x 6 = 12) will actually be executing on the
>> processors. The others will be waiting for a processor time slice
>> (this is an operating system concept; you can refer to any operating
>> system book). As you can see on your computer, many processes
>> (programs) are running simultaneously. Only a few of them are
>> running on the processors at any instant, and the others are waiting
>> for their turn. You do not notice this because the operating system
>> switches among the processes tens or hundreds of times per second.
>>
>> You can run your program with 20, 25, 30, ... processes, but only a
>> few (12 in your case) will be executing at any one time.
>>
>> -- Mandar Gurav
>>
>> On Thu, May 12, 2011 at 6:41 PM, hyunduk kim <fororigin at gmail.com> wrote:
>> > Dear Pavan,
>> > The hostname of my Linux machine is francium.ac.kr.
>> > I removed the machinefile option, as you suggested,
>> > and received the following output:
>> > [root@francium machine]# mpiexec -n 11 /usr/local/mpich2-1.3.2p1/examples/cpi
>> > Process 0 of 11 is on francium
>> > Process 2 of 11 is on francium
>> > Process 3 of 11 is on francium
>> > Process 4 of 11 is on francium
>> > Process 5 of 11 is on francium
>> > Process 7 of 11 is on francium
>> > Process 8 of 11 is on francium
>> > Process 9 of 11 is on francium
>> > Process 10 of 11 is on francium
>> > Process 6 of 11 is on francium
>> > Process 1 of 11 is on francium
>> > pi is approximately 3.1415926544231247, Error is 0.0000000008333316
>> > wall clock time = 0.000453
>> > In the above command, I take the option "-n 11" to mean that the
>> > program is going to use 11 machines.
>> > Then I modified my run command as follows:
>> > [root@francium machine]# mpiexec -n 16 /usr/local/mpich2-1.3.2p1/examples/cpi
>> > Process 0 of 16 is on francium
>> > Process 1 of 16 is on francium
>> > Process 2 of 16 is on francium
>> > Process 3 of 16 is on francium
>> > Process 4 of 16 is on francium
>> > Process 6 of 16 is on francium
>> > Process 7 of 16 is on francium
>> > Process 8 of 16 is on francium
>> > Process 9 of 16 is on francium
>> > Process 12 of 16 is on francium
>> > Process 14 of 16 is on francium
>> > Process 10 of 16 is on francium
>> > Process 15 of 16 is on francium
>> > Process 11 of 16 is on francium
>> > Process 13 of 16 is on francium
>> > Process 5 of 16 is on francium
>> > pi is approximately 3.1415926544231274, Error is 0.0000000008333343
>> > wall clock time = 0.000500
>> > For this command, I expected an error message, because my Linux
>> > machine has 2 CPUs with 6 cores each (so only 12 cores are
>> > available to mpich2).
>> > My question is: what does the option "-n" mean in the run command?
>> > Thank you for your kindness.
>> >
>> > H.D., Kim
>> >
>> >
>> >
>> >
>> >
>> > 2011/5/12 Pavan Balaji <balaji at mcs.anl.gov>
>> >>
>> >> Is there an actual machine with the name "host1" or "host2" in your
>> >> setup?
>> >>
>> >> If you are just running it on the local node, you should not give the
>> >> -machinefile or -f option.
>> >>
>> >>  -- Pavan
>> >>
>> >> On 05/12/2011 03:28 AM, hyunduk kim wrote:
>> >>>
>> >>> Thanks for your response.
>> >>> However, my setup is not working.
>> >>>
>> >>> Here is what I have checked:
>> >>> 1) Installed mpich2 on an Intel multi-core machine with 2 CPUs
>> >>> 2) Checked the /etc/hosts file:
>> >>>     127.0.0.1               localhost.localdomain localhost
>> >>>     ::1                        localhost6.localdomain6 localhost6
>> >>>
>> >>> 3) Made the machinefile for mpiexec,
>> >>> /usr/local/mpich2/machine/machinefile:
>> >>>
>> >>>  host1:6
>> >>>  host2:6
>> >>>
>> >>> 4) Ran:
>> >>>    [root@francium machine]# mpiexec -n 10 -machinefile ./machinefile /usr/local/mpich2-1.3.2p1/examples/cpi
>> >>>    ==> I received the messages below:
>> >>>          ssh: connect to host host1 port 22: Connection timed out
>> >>>          ssh: connect to host host2 port 22: Connection timed out
>> >>>
>> >>>   My questions are:
>> >>> 1) Why do I need to set up passwordless login between the two hosts?
>> >>> 2) Mpich2 was installed on just this one multi-core, 2-CPU machine.
>> >>> Why does mpiexec try to connect to host1 and host2 on port 22?
>> >>> 3) Is there another way to define the machinefile on a multi-core,
>> >>> 2-CPU machine?
>> >>>
>> >>>  I will attach my log files.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> _______________________________________________
>> >>> mpich-discuss mailing list
>> >>> mpich-discuss at mcs.anl.gov
>> >>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> >>
>> >> --
>> >> Pavan Balaji
>> >> http://www.mcs.anl.gov/~balaji
>> >
>> >
>>
>>
>>
>> --
>> Mandar Gurav
>> http://www.mandargurav.org
>>
>
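
The point made in the quoted reply, that "-n" sets the number of
processes rather than the number of processors, can be sketched in a few
lines of Python. This is only an illustration under assumptions: the
helper name split_processes is made up for this note, and the model
ignores everything else running on the machine. It simply shows how many
of the requested processes can execute at the same instant on a given
number of cores, and how many must wait for a time slice.

```python
import os


def split_processes(requested, cores=None):
    """Return (running, waiting) for `requested` processes on `cores` cores.

    Hypothetical helper for illustration only: it models the idea that at
    most one process executes per core at any instant, and it ignores all
    other load on the machine.
    """
    if cores is None:
        # os.cpu_count() reports logical CPUs and may return None.
        cores = os.cpu_count() or 1
    running = min(requested, cores)
    waiting = max(0, requested - cores)
    return running, waiting


if __name__ == "__main__":
    # The machine in this thread: 2 CPUs x 6 cores = 12 cores.
    for n in (11, 16):
        running, waiting = split_processes(n, cores=12)
        print(f"-n {n}: {running} executing at once, {waiting} waiting")
```

On 12 cores, "-n 16" leaves 4 processes waiting at any instant, which is
consistent with mpiexec running all 16 processes without an error.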


-- 
Mandar Gurav
http://www.mandargurav.org
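
For the machinefile question in the oldest message of this thread, a
small Python sketch may help show why mpiexec tried to reach host1 and
host2 over ssh. It assumes the "host:count" line format used in that
machinefile, treats a missing count as 1, and skips blank lines and "#"
comments. The function names are invented for this note and are not part
of mpich2.

```python
import socket


def parse_machinefile(text):
    """Parse "host:count" lines into a list of (host, count) tuples.

    Assumptions for this sketch: a missing count means 1, and anything
    after "#" on a line is a comment.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, count = line.partition(":")
        entries.append((host.strip(), int(count) if count.strip() else 1))
    return entries


def remote_hosts(entries, local_names):
    """Return the hosts in `entries` that are not names of this machine."""
    return [host for host, _ in entries if host not in local_names]


if __name__ == "__main__":
    # The machinefile contents quoted in this thread.
    machinefile = "host1:6\nhost2:6\n"
    entries = parse_machinefile(machinefile)
    local = {"localhost", socket.gethostname()}
    print("total slots:", sum(n for _, n in entries))
    print("hosts needing a remote login:", remote_hosts(entries, local))
```

Neither host1 nor host2 is a name of the local machine (francium), so
both entries count as remote, which matches the ssh timeouts reported in
the thread.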

