[MPICH] -nolocal for mpiexec ?

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Aug 1 17:20:35 CDT 2006


mpdtrace did tell me c4 is not included. After rerunning mpdboot with
% mpdboot -n 5 -f mpd.hosts
mpiexec -machinefile works fine now. Thanks.
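
For anyone hitting the same error, the counting behind the fix can be sketched as below. This is only an illustration, not part of MPD: the helper names `mpdboot_count` and `missing_from_ring` are made up here, and the sketch assumes plain host names (one per line) in both files, that mpdtrace prints one host name per line, and that the launching host is not listed in mpd.hosts.

```python
def parse_hosts(text):
    """Return the non-blank lines of a hosts-style file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def mpdboot_count(mpd_hosts_text):
    """Value to pass to `mpdboot -n`: every distinct remote host plus the local one.

    Assumption: the local host is NOT listed in mpd.hosts, as in this thread.
    """
    return len(set(parse_hosts(mpd_hosts_text))) + 1


def missing_from_ring(machinefile_text, mpdtrace_text):
    """Hosts named in the machinefile that do not show up in the mpdtrace output.

    Assumption: mpdtrace output is one plain host name per line.
    """
    ring = set(parse_hosts(mpdtrace_text))
    missing = []
    for host in parse_hosts(machinefile_text):
        if host not in ring and host not in missing:
            missing.append(host)
    return missing


if __name__ == "__main__":
    mpd_hosts = "c1\nc2\nc3\nc4\n"
    machines = "c1\nc2\nc3\nc4\nc1\nc2\nc3\nc4\n"
    # Four remote hosts plus the local host.
    print(mpdboot_count(mpd_hosts))
    # A ring started with "-n 4" holds the local host and three remotes only.
    print(missing_from_ring(machines, "host\nc1\nc2\nc3\n"))
```

With the files from this thread, the first call gives 5 (the value that made things work) and the second reports c4 as the host missing from a ring booted with "-n 4".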

Is there a way to set up this -nolocal behavior when starting mpdboot, so that I
don't need to specify -machinefile each time I run mpiexec?
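
Failing a built-in option, one possible workaround is a small wrapper that always adds the -machinefile argument. The sketch below is an assumption, not an MPD feature: the function names and the default file name "machines" are hypothetical, and only the mpiexec flags already used in this thread (-machinefile and -n) appear in it.

```python
import subprocess


def mpiexec_command(nprocs, program, machinefile="machines", program_args=()):
    """Build the mpiexec argument list with -machinefile always included."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return ["mpiexec", "-machinefile", machinefile,
            "-n", str(nprocs), program, *program_args]


def run_nolocal(nprocs, program, machinefile="machines", program_args=()):
    """Run the job and return mpiexec's exit status (requires a running MPD ring)."""
    return subprocess.call(
        mpiexec_command(nprocs, program, machinefile, program_args))


if __name__ == "__main__":
    # Only print the command here; run_nolocal() needs mpiexec on the PATH.
    print(" ".join(mpiexec_command(4, "hello")))
```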

Wei-keng



On Tue, 1 Aug 2006, Ralph Butler wrote:

> Note that mpdboot always puts an mpd on the local host.  Thus, if you start a
> ring with mpdboot using "-n 4", you are getting one local and 3 remote.  So, in
> the problem below, c4 is actually not in the ring and thus is not found when
> trying to start processes via the machinefile that mentions it by name.
> mpdtrace should verify that c4 is not in the ring.
> --ralph
>
> On Tue Aug 1, at 4:44PM, Wei-keng Liao wrote:
>
>> 
>> I just tested mpdcheck on each of the compute nodes with the host machine.
>> They are all fine. The strange thing is that if I do not use the -machinefile
>> option, everything runs OK without such error messages.
>> 
>> Wei-keng
>> 
>> 
>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>> 
>>> There is something wrong with the networking setup on c4 then. It says
>>> invalid machine name. Can you ssh to it? If you can, and cannot detect any
>>> other problem, then try running the mpdcheck utility as described in the
>>> install guide.
>>> 
>>> Rajeev
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>> Sent: Tuesday, August 01, 2006 4:20 PM
>>>> To: Rajeev Thakur
>>>> Cc: mpich-discuss at mcs.anl.gov
>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>> 
>>>> Rajeev,
>>>> 
>>>> I tried that but I don't know why it is not working on my machine.
>>>> Here is my mpd.hosts file used in mpdboot
>>>> % cat mpd.hosts
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>> 
>>>> My host machine is not in mpd.hosts.
>>>> % mpdboot -n 4 -f mpd.hosts
>>>> % cat machines
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>> % mpiexec -machinefile machines -n 4 hello
>>>> mpiexec: unable to start all procs; may have invalid machine names
>>>>      remaining specified hosts:
>>>>          192.168.1.14 (c4)
>>>> 
>>>> Using 3 or fewer nodes works fine. I ran all of these on the host machine.
>>>> 
>>>> Wei-keng
>>>> 
>>>> 
>>>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>>>> 
>>>>> Wei-keng,
>>>>>         Let's say you have 4 machines: host, node1, node2, node3. You run
>>>>> mpiexec from host and want the jobs to run only on the 3 nodes. Here's what
>>>>> you do:
>>>>> 
>>>>> * From host, start an MPD ring on all 4 machines using mpdboot.
>>>>> 
>>>>> * Create a machine file containing
>>>>> node1
>>>>> node2
>>>>> node3
>>>>> node1
>>>>> node2
>>>>> node3
>>>>> (repeated as many times as needed to cover the maximum number of processes
>>>>> you want to run).
>>>>> 
>>>>> * Then run the job from host as
>>>>> mpiexec -machinefile FILE -n NPROCS a.out
>>>>> 
>>>>> NPROCS has to be <= the number of entries listed in the machinefile.
>>>>> 
>>>>> Rajeev
>>>>> 
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>>>> Sent: Monday, July 31, 2006 10:58 PM
>>>>>> To: Rajeev Thakur
>>>>>> Cc: mpich-discuss at mcs.anl.gov
>>>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>>>> 
>>>>>> I tried -1; it does not work.
>>>>>> Based on the mpiexec help page, it just tries not to run the 1st
>>>>>> proc locally.
>>>>>> So, the local machine eventually appears as one of the MPI nodes with a
>>>>>> higher rank.
>>>>>> 
>>>>>> Wei-keng
>>>>>> 
>>>>>> 
>>>>>> On Mon, 31 Jul 2006, Rajeev Thakur wrote:
>>>>>> 
>>>>>>> Try the -1 option to mpiexec.
>>>>>>> 
>>>>>>> Rajeev
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>>>>>> Sent: Monday, July 31, 2006 8:17 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: [MPICH] -nolocal for mpiexec ?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> How do I run mpdboot and mpiexec so I can run MPI jobs on non-local
>>>>>>>> machines? In mpich1, mpirun has an option -nolocal for not running jobs
>>>>>>>> on the local machine. How do I achieve the same effect as -nolocal
>>>>>>>> in mpich2?
>>>>>>>> 
>>>>>>>> Wei-keng
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>
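
The machinefile recipe quoted above (compute node names repeated until there are enough entries for the job) can be generated rather than typed by hand. The following is a rough sketch under stated assumptions: `build_machinefile` is a hypothetical helper, not an MPICH tool, and it writes plain host names only, one per line, exactly as in the quoted example.

```python
def build_machinefile(nodes, nprocs):
    """Return machinefile text with exactly `nprocs` lines, cycling through `nodes`.

    Exactly nprocs entries satisfies the rule that NPROCS must not exceed
    the number of entries in the machinefile.
    """
    if not nodes:
        raise ValueError("need at least one compute node")
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    lines = [nodes[i % len(nodes)] for i in range(nprocs)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The local "host" machine is deliberately left out of the node list.
    text = build_machinefile(["node1", "node2", "node3"], 6)
    with open("machines", "w") as f:
        f.write(text)
    print(text, end="")
```

Leaving the launching host out of the node list is what gives the -nolocal effect; the host still has to be part of the MPD ring.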



