[MPICH] -nolocal for mpiexec ?
Wei-keng Liao
wkliao at ece.northwestern.edu
Tue Aug 1 17:20:35 CDT 2006
mpdtrace did tell me that c4 was not included. After rerunning mpdboot with
% mpdboot -n 5 -f mpd.hosts
mpiexec -machinefile works fine now. Thanks.
Is there a way to set up this -nolocal behavior when mpdboot starts, so that I
don't need to specify -machinefile each time I run mpiexec?
Wei-keng
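
One possible workaround for the question above, not confirmed anywhere in this thread, is a small shell wrapper that always passes -machinefile. This is only a sketch: the function name mpiexec_nolocal and the MPIEXEC and MACHINEFILE variables are hypothetical names chosen for illustration, not MPICH2 settings.

```shell
# Hypothetical wrapper: always run mpiexec with a fixed machinefile so the
# option does not have to be retyped. MPIEXEC and MACHINEFILE are
# illustrative variable names, not MPICH2 configuration.
mpiexec_nolocal() {
    "${MPIEXEC:-mpiexec}" -machinefile "${MACHINEFILE:-machines}" "$@"
}
```

With the defaults, "mpiexec_nolocal -n 4 hello" runs "mpiexec -machinefile machines -n 4 hello".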
On Tue, 1 Aug 2006, Ralph Butler wrote:
> Note that mpdboot always puts an mpd on the local host. Thus, if you start a
> ring with mpdboot using "-n 4", you are getting one local and 3 remote. So,
> in the problem below, c4 is actually not in the ring and thus is not found
> when trying to start processes via the machinefile that mentions it by name.
> mpdtrace should verify that c4 is not in the ring.
> --ralph
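
The counting rule described above (the local mpd plus one per remote host) can be sketched as a small helper. This assumes the hosts file lists one remote host per non-empty line, has no comment lines, and does not include the local host; ring_size is a hypothetical name, not an MPD command.

```shell
# Hypothetical helper: count the non-empty lines of an mpd.hosts-style file
# and add one for the mpd that mpdboot always starts on the local host.
# Assumes one host per line, no comments, local host not listed.
ring_size() {
    hosts=$(grep -c '[^[:space:]]' "$1" || true)
    echo $((hosts + 1))
}
```

For the four-host mpd.hosts quoted below (c1 to c4), ring_size prints 5, which matches the "-n 5" that made the ring include c4.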
>
> On Tue Aug 1, at 4:44PM, Wei-keng Liao wrote:
>
>>
>> I just tested mpdcheck on each of the compute nodes with the host machine.
>> They are all fine. The strange thing is that if I do not use the
>> -machinefile option, everything runs OK without such error messages.
>>
>> Wei-keng
>>
>>
>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>>
>>> There is something wrong with the networking setup on c4 then. It says
>>> invalid machine name. Can you ssh to it? If you can, and cannot detect any
>>> other problem, then try running the mpdcheck utility as described in the
>>> install guide.
>>>
>>> Rajeev
>>>
>>>
>>>> -----Original Message-----
>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>> Sent: Tuesday, August 01, 2006 4:20 PM
>>>> To: Rajeev Thakur
>>>> Cc: mpich-discuss at mcs.anl.gov
>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>>
>>>> Rajeev,
>>>>
>>>> I tried that but I don't know why it is not working on my machine.
>>>> Here is my mpd.hosts file used in mpdboot
>>>> % cat mpd.hosts
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>>
>>>> My host machine is not in mpd.hosts.
>>>> % mpdboot -n 4 -f mpd.hosts
>>>> % cat machines
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>> c1
>>>> c2
>>>> c3
>>>> c4
>>>> % mpiexec -machinefile machines -n 4 hello
>>>> mpiexec: unable to start all procs; may have invalid machine names
>>>> remaining specified hosts:
>>>> 192.168.1.14 (c4)
>>>>
>>>> Using 3 or fewer nodes works fine. I ran all of these on the host machine.
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>>>>
>>>>> Wei-keng,
>>>>> Let's say you have 4 machines: host, node1, node2, node3. You run
>>>>> mpiexec from host and want the jobs to run only on the 3 nodes. Here's
>>>>> what you do:
>>>>>
>>>>> * From host, start an MPD ring on all 4 machines using mpdboot.
>>>>>
>>>>> * Create a machine file containing
>>>>> node1
>>>>> node2
>>>>> node3
>>>>> node1
>>>>> node2
>>>>> node3
>>>>> (repeated as many times as needed to cover the maximum number of
>>>>> processes you want to run).
>>>>>
>>>>> * Then run the job from host as
>>>>> mpiexec -machinefile FILE -n NPROCS a.out
>>>>>
>>>>> NPROCS has to be <= the number of machines listed in the machinefile.
>>>>>
>>>>> Rajeev
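
The machinefile step above can be generated rather than typed by hand. A minimal sketch, assuming a file with one compute node per line (local host left out); make_machinefile is a hypothetical helper name, not part of MPICH2.

```shell
# Hypothetical helper: print NPROCS machinefile entries by cycling through
# the node list round-robin, so the file always has enough lines for the job.
make_machinefile() {
    nodes_file=$1
    nprocs=$2
    awk -v n="$nprocs" '
        NF { node[count++] = $1 }          # remember each non-empty line
        END {
            if (count == 0) exit 1          # no nodes to repeat
            for (i = 0; i < n; i++) print node[i % count]
        }' "$nodes_file"
}
```

For example, "make_machinefile nodes 6 > machines" with node1, node2, node3 in the file "nodes" writes the six-line list shown above, which can then be passed to mpiexec -machinefile.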
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>>>> Sent: Monday, July 31, 2006 10:58 PM
>>>>>> To: Rajeev Thakur
>>>>>> Cc: mpich-discuss at mcs.anl.gov
>>>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>>>>
>>>>>> I tried -1, but it is not working.
>>>>>> Based on the mpiexec help page, it just tries not to run the 1st proc
>>>>>> locally. So, the local machine eventually appears as one of the MPI
>>>>>> nodes with a higher rank.
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>>
>>>>>> On Mon, 31 Jul 2006, Rajeev Thakur wrote:
>>>>>>
>>>>>>> Try the -1 option to mpiexec.
>>>>>>>
>>>>>>> Rajeev
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>>>>>> Sent: Monday, July 31, 2006 8:17 PM
>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>> Subject: [MPICH] -nolocal for mpiexec ?
>>>>>>>>
>>>>>>>>
>>>>>>>> How do I run mpdboot and mpiexec so I can run MPI jobs on non-local
>>>>>>>> machines? In mpich1, mpirun has an option -nolocal for not running
>>>>>>>> jobs on the local machine. How do I achieve the same effect as
>>>>>>>> -nolocal on mpich2?
>>>>>>>>
>>>>>>>> Wei-keng
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>