[mpich-discuss] mpd and login node

Stephen Siegel siegel at cis.udel.edu
Sat Feb 14 09:33:08 CST 2009


Rajeev: if I use the -1 option with mpiexec, and my cluster has 8
nodes, then when I run mpiexec with 9 processes, the 9th process (I
THINK) gets run on the local node, which I don't want to happen.
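To make that concrete, the invocation I have in mind is something like
the one below (a sketch only; I haven't double-checked where the 9th
process actually ends up):

siegel at porsche ~ $ mpiexec -1 -n 9 ./hello2   /* -1: don't start rank 0
on the local node; but with 9 ranks and only 8 remote nodes, one rank
still seems to come back to porsche */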

In any case, I found another way to handle the situation, which also
takes into account that each node has two processors: I want ranks 0
and 1 to map to node1, ranks 2 and 3 to map to node2, etc. (hence the
":2" after each host name in the file below).   What I did was
this... (porsche is the local node; the others have names
node1=honda, ...):

siegel at porsche ~ $ cat ~/mpd.hosts    /* I will also use this file as the machinefile for mpiexec... */
node1:2
node2:2
node3:2
node4:2
node5:2
node6:2
node7:2
node8:2
siegel at porsche ~ $ mpdboot -n 9    /* NOTE: 9 = 8 + 1 because the local host counts as 1 in the mpd ring */
siegel at porsche ~ $ mpdtrace     /* all 9 machines appear...*/
porsche
nissan
honda
subaru
toyota
suzuki
acura
mazda
hyundai
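
Side note for anyone reproducing this: mpdboot reads its host list from
the file mpd.hosts in the current directory by default, which is why
the plain "mpdboot -n 9" above finds the eight compute nodes.  If I
remember right, you can also point it at a file explicitly, along the
lines of:

siegel at porsche ~ $ mpdboot -n 9 -f ~/mpd.hosts
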
siegel at porsche ~ $ cd code/mpi/hello/
siegel at porsche ~/code/mpi/hello $ cat hello2.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
   int rank;
   char *name = (char*)malloc(200*sizeof(char));

   gethostname(name, 200);   /* record which host this process is running on */
   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   printf("Hello from process %d running on %s.\n", rank, name);
   fflush(stdout);
   MPI_Finalize();
   free(name);
   return 0;
}
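
The compile step isn't shown in this transcript; the program was built
beforehand with the usual MPICH2 compiler wrapper, something along the
lines of:

siegel at porsche ~/code/mpi/hello $ mpicc -o hello2 hello2.c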

siegel at porsche ~/code/mpi/hello $ mpiexec -machinefile ~/mpd.hosts -n 16 ./hello2
Hello from process 0 running on honda.cis.udel.edu.
Hello from process 4 running on subaru.cis.udel.edu.
Hello from process 2 running on toyota.cis.udel.edu.
Hello from process 1 running on honda.cis.udel.edu.
Hello from process 5 running on subaru.cis.udel.edu.
Hello from process 14 running on suzuki.cis.udel.edu.
Hello from process 3 running on toyota.cis.udel.edu.
Hello from process 10 running on acura.cis.udel.edu.
Hello from process 15 running on suzuki.cis.udel.edu.
Hello from process 12 running on mazda.cis.udel.edu.
Hello from process 13 running on mazda.cis.udel.edu.
Hello from process 11 running on acura.cis.udel.edu.
Hello from process 8 running on hyundai.cis.udel.edu.
Hello from process 6 running on nissan.cis.udel.edu.
Hello from process 9 running on hyundai.cis.udel.edu.
Hello from process 7 running on nissan.cis.udel.edu.

siegel at porsche ~/code/mpi/hello $

Looks like exactly what I want: 2 procs on each node, and no proc
running on the local node.

Does this seem like a reasonable way to set things up?

-Steve

On Feb 13, 2009, at 10:24 PM, Rajeev Thakur wrote:

> Use the "-1" option to mpiexec to prevent the first process from being
> run on the local node.
>
> Rajeev
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Stephen  
>> Siegel
>> Sent: Friday, February 13, 2009 7:21 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] mpd and login node
>>
>> No queuing system.   Each user just runs mpd and has a machine file
>> that lists the nodes in their home directory.   They are supposed to
>> terminate the mpd ring when they are done.   We are not too concerned
>> with overloading---this is just being used by an introductory
>> parallel programming class.  A queuing system would probably
>> be a good idea though.  All the machines are running Linux.
>>
>> On Feb 13, 2009, at 8:14 PM, Reuti wrote:
>>
>>> On 14.02.2009, at 02:06, Stephen Siegel wrote:
>>>
>>>> I am running mpich2 with mpd on a small cluster.   We have a
>>>> separate logon node which users will log on to, compile, and
>>>> execute programs using mpiexec.   We want to configure mpd so that
>>>> it will distribute the processes over the nodes of the cluster,
>>>> excluding the logon node.   For some reason we could not figure out
>>>> how to do this----the logon node itself ends up getting some of the
>>>> processes.   Does anyone know how to do this?  Thanks.
>>>
>>> How are you distributing other jobs right now and avoiding
>>> overloading of certain machines - do you have a queuing system like
>>> SUN Grid Engine?
>>>
>>> -- Reuti

