[MPICH] -nolocal for mpiexec ?

Ralph Butler rbutler at mtsu.edu
Tue Aug 1 17:10:35 CDT 2006


Note that mpdboot always puts an mpd on the local host.  Thus, if you
start a ring with mpdboot using "-n 4", you are getting one local and 3
remote.  So, in the problem below, c4 is actually not in the ring and thus
is not found when trying to start processes via the machinefile that
mentions it by name.  mpdtrace should verify that c4 is not in the ring.
--ralph
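
A minimal shell sketch of the point above, assuming the hosts file has one
host per non-empty line and does not list the local host. The helper names
(ring_size_for, hosts_missing_from_ring) and the file ring.txt are
hypothetical, not part of MPICH2; only mpdboot and mpdtrace are the real
tools mentioned in this thread.

```shell
# Sketch only: helper names are hypothetical, not MPICH2 commands.

# Print the total number of mpds to request so that the local host plus
# every host in the hosts file ends up in the ring.
# Assumes one host per non-empty line, local host not listed.
ring_size_for() {
    listed=$(grep -c . "$1")
    echo $((listed + 1))
}

# Print machinefile hosts that do not appear in the ring.
#   $1: file with one ring host per line (e.g. saved mpdtrace output)
#   $2: machinefile
hosts_missing_from_ring() {
    sort -u "$2" | grep -v -x -F -f "$1"
}

# On a cluster like the one in this thread, usage might look like:
#   mpdboot -n "$(ring_size_for mpd.hosts)" -f mpd.hosts   # 5 for c1..c4
#   mpdtrace > ring.txt
#   hosts_missing_from_ring ring.txt machines
```

The second helper compares names literally, so it assumes mpdtrace and the
machinefile spell host names the same way (short name versus fully
qualified name).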

On Aug 1, 2006, at 4:44 PM, Wei-keng Liao wrote:

>
> I just tested mpdcheck on each of the compute nodes with the host
> machine. They are all fine. The strange thing is that if I do not use
> the -machinefile option, everything runs OK without such error messages.
>
> Wei-keng
>
>
> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>
>> There is something wrong with the networking setup on c4 then. It says
>> invalid machine name. Can you ssh to it? If you can, and cannot detect
>> any other problem, then try running the mpdcheck utility as described
>> in the install guide.
>>
>> Rajeev
>>
>>
>>> -----Original Message-----
>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>> Sent: Tuesday, August 01, 2006 4:20 PM
>>> To: Rajeev Thakur
>>> Cc: mpich-discuss at mcs.anl.gov
>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>
>>> Rajeev,
>>>
>>> I tried that but I don't know why it is not working on my machine.
>>> Here is my mpd.hosts file used in mpdboot
>>> % cat mpd.hosts
>>> c1
>>> c2
>>> c3
>>> c4
>>>
>>> My host machine is not in mpd.hosts.
>>> % mpdboot -n 4 -f mpd.hosts
>>> % cat machines
>>> c1
>>> c2
>>> c3
>>> c4
>>> c1
>>> c2
>>> c3
>>> c4
>>> % mpiexec -machinefile machines -n 4 hello
>>> mpiexec: unable to start all procs; may have invalid machine names
>>>      remaining specified hosts:
>>>          192.168.1.14 (c4)
>>>
>>> Using 3 or fewer nodes works fine. I ran all of these on the host
>>> machine.
>>>
>>> Wei-keng
>>>
>>>
>>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>>>
>>>> Wei-keng,
>>>>         Let's say you have 4 machines: host, node1, node2, node3.
>>>> You run mpiexec from host and want the jobs to run only on the 3
>>>> nodes. Here's what you do:
>>>>
>>>> * From host, start an MPD ring on all 4 machines using mpdboot.
>>>>
>>>> * Create a machine file containing
>>>> node1
>>>> node2
>>>> node3
>>>> node1
>>>> node2
>>>> node3
>>>> (repeated as many times as needed to cover the maximum number of
>>>> processes you want to run).
>>>>
>>>> * Then run the job from host as
>>>> mpiexec -machinefile FILE -n NPROCS a.out
>>>>
>>>> NPROCS has to be <= the number of machines listed in the machinefile.
>>>>
>>>> Rajeev
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>>> Sent: Monday, July 31, 2006 10:58 PM
>>>>> To: Rajeev Thakur
>>>>> Cc: mpich-discuss at mcs.anl.gov
>>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>>>
>>>>> I tried -1; it does not work.
>>>>> Based on the mpiexec help page, it just tries not to run the first
>>>>> process locally. So, the local machine eventually appears as one of
>>>>> the MPI nodes with a higher rank.
>>>>>
>>>>> Wei-keng
>>>>>
>>>>>
>>>>> On Mon, 31 Jul 2006, Rajeev Thakur wrote:
>>>>>
>>>>>> Try the -1 option to mpiexec.
>>>>>>
>>>>>> Rajeev
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>>>>> Sent: Monday, July 31, 2006 8:17 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: [MPICH] -nolocal for mpiexec ?
>>>>>>>
>>>>>>>
>>>>>>> How do I run mpdboot and mpiexec so I can run MPI jobs on
>>>>>>> non-local machines? In mpich1, mpirun has an option -nolocal for
>>>>>>> not running jobs on the local machine. How do I achieve the same
>>>>>>> effect as -nolocal on mpich2?
>>>>>>>
>>>>>>> Wei-keng
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
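
The machinefile recipe quoted above (node names repeated until there are
enough entries for NPROCS) can be sketched in shell. The helper name
make_machinefile is hypothetical, and the node names are the placeholder
names from the quoted example, not real hosts.

```shell
# Sketch only: make_machinefile is a hypothetical helper, not an MPICH2 tool.
# Usage: make_machinefile NPROCS HOST...
# Prints NPROCS lines, cycling through the given hosts in order, so the
# resulting file has at least as many entries as processes to be started.
make_machinefile() {
    nprocs=$1
    shift
    [ "$#" -gt 0 ] || return 1      # need at least one host
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        for h in "$@"; do
            [ "$i" -lt "$nprocs" ] || break
            echo "$h"
            i=$((i + 1))
        done
    done
}

# On the example cluster, usage might look like:
#   make_machinefile 6 node1 node2 node3 > machines
#   mpiexec -machinefile machines -n 6 a.out
```

Leaving the local host out of the argument list is what keeps processes
off it; the hosts named must still be part of the mpd ring.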



