[MPICH] -nolocal for mpiexec ?

Wei-keng Liao wkliao at ece.northwestern.edu
Tue Aug 1 16:44:58 CDT 2006


I just ran mpdcheck between the host machine and each of the compute
nodes, and they all check out fine. The strange thing is that if I do not
use the -machinefile option, everything runs OK without such error messages.

Wei-keng


On Tue, 1 Aug 2006, Rajeev Thakur wrote:

> There is something wrong with the networking setup on c4 then. It says
> invalid machine name. Can you ssh to it? If you can, and cannot detect any
> other problem, then try running the mpdcheck utility as described in the
> install guide.
>
> Rajeev
>
>
>> -----Original Message-----
>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>> Sent: Tuesday, August 01, 2006 4:20 PM
>> To: Rajeev Thakur
>> Cc: mpich-discuss at mcs.anl.gov
>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>
>> Rajeev,
>>
>> I tried that but I don't know why it is not working on my machine.
>> Here is my mpd.hosts file used in mpdboot
>> % cat mpd.hosts
>> c1
>> c2
>> c3
>> c4
>>
>> My host machine is not in mpd.hosts.
>> % mpdboot -n 4 -f mpd.hosts
>> % cat machines
>> c1
>> c2
>> c3
>> c4
>> c1
>> c2
>> c3
>> c4
>> % mpiexec -machinefile machines -n 4 hello
>> mpiexec: unable to start all procs; may have invalid machine names
>>      remaining specified hosts:
>>          192.168.1.14 (c4)
>>
>> Using 3 or fewer nodes works fine. I ran all of these on the host machine.
>>
>> Wei-keng
>>
>>
>> On Tue, 1 Aug 2006, Rajeev Thakur wrote:
>>
>>> Wei-keng,
>>>         Let's say you have 4 machines: host, node1, node2, node3. You run
>>> mpiexec from host and want the jobs to run only on the 3 nodes. Here's
>>> what you do:
>>>
>>> * From host, start an MPD ring on all 4 machines using mpdboot.
>>>
>>> * Create a machine file containing
>>> node1
>>> node2
>>> node3
>>> node1
>>> node2
>>> node3
>>> (repeated as many times as needed to cover the maximum number of
>>> processes you want to run).
>>>
>>> * Then run the job from host as
>>> mpiexec -machinefile FILE -n NPROCS a.out
>>>
>>> NPROCS has to be <= the number of machines listed in the machinefile.
>>>
>>> Rajeev
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>>>> Sent: Monday, July 31, 2006 10:58 PM
>>>> To: Rajeev Thakur
>>>> Cc: mpich-discuss at mcs.anl.gov
>>>> Subject: RE: [MPICH] -nolocal for mpiexec ?
>>>>
>>>> I tried -1; it does not work.
>>>> Based on the mpiexec help page, it only tries not to run the 1st
>>>> proc locally.
>>>> So the local machine eventually appears as one of the MPI nodes with
>>>> a higher rank.
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>> On Mon, 31 Jul 2006, Rajeev Thakur wrote:
>>>>
>>>>> Try the -1 option to mpiexec.
>>>>>
>>>>> Rajeev
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>>>> Sent: Monday, July 31, 2006 8:17 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: [MPICH] -nolocal for mpiexec ?
>>>>>>
>>>>>>
>>>>>> How do I run mpdboot and mpiexec so that MPI jobs run only on non-local
>>>>>> machines? In mpich1, mpirun has an option -nolocal for not running
>>>>>> the job on the local machine. How do I achieve the same effect of
>>>>>> -nolocal on mpich2?
>>>>>>
>>>>>> Wei-keng
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
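
The machine-file recipe quoted in this thread (list only the compute nodes, repeated until there are enough entries for NPROCS) can be sketched as a small shell helper. This is only an illustration: make_machinefile is a hypothetical function, not part of MPICH2, and the node names and the file name "machines" are taken from the messages in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not an MPICH2 tool): print a machine file that
# cycles through the given compute nodes until NPROCS lines are written.
# Usage: make_machinefile NPROCS node1 [node2 ...]
make_machinefile() {
    nprocs=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "make_machinefile: no nodes given" >&2
        return 1
    fi
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        for node in "$@"; do
            # Stop as soon as NPROCS entries have been printed.
            [ "$i" -lt "$nprocs" ] || break
            printf '%s\n' "$node"
            i=$((i + 1))
        done
    done
}

# Example use (left commented out so the sketch runs without an MPD ring):
# make_machinefile 6 node1 node2 node3 > machines
# mpiexec -machinefile machines -n 6 a.out
```

Because the host is never passed to the helper, it never appears in the machine file, which is the effect -nolocal had in mpich1. One thing that may be worth checking against the MPICH2 installer's guide: mpdboot is documented as always starting one mpd on the machine where it is run and counting it toward -n, so a ring meant to cover the host plus four compute nodes would presumably need -n 5 rather than -n 4.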



