[MPICH] MPICH2 startup w/ PBS

Adam Hock ahock at ittc.ku.edu
Tue Apr 4 11:49:32 CDT 2006


I have spent some time developing the following scripts. They are not 
perfect, but they do get the job done. I use them as the prologue and 
epilogue scripts. They need to be put in /var/spool/PBS/mom_priv on each 
node with permissions of 755.
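
As a rough sketch of that install step (not part of the scripts themselves), 
something like the following could be run as root on each node. The function 
name install_hooks and its arguments are made up for this example, and the 
default destination is simply the mom_priv path mentioned above.

```shell
# Sketch: copy prologue and epilogue into mom_priv with mode 755.
# install_hooks is a hypothetical helper, not something PBS provides.
install_hooks() {
    src_dir=$1
    dest_dir=${2:-/var/spool/PBS/mom_priv}
    for name in prologue epilogue; do
        # install copies the file and sets its mode in one step
        install -m 755 "$src_dir/$name" "$dest_dir/$name" || return 1
    done
}

# Example (as root): install_hooks /path/to/my/scripts
```

Run as root, the copies end up owned by root, which is how the scripts are 
executed anyway.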

Here is a list of variables you will need to change to make the scripts 
work on your system.

$mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$HOME = "/bio/users";
$log = "/bio/tools/admin/report/log/epilogue.$date";

The log only collects information about jobs; the scripts work without it.

Feel free to contact me if you have any questions.

Here is the prologue
-------------------------------------------------------------------------
#!/usr/bin/perl
## prologue script
## for PBSpro to play nice and work with mpich2 daemon system
## Adam Hock - University of Kansas
## 02-07-2006
## Version 1.6

#NOTES#
#
# This script is run as root and needs to have permissions of 755
# It also needs to be called prologue
# Put this file in ~PBS_INSTALL/mom_priv of all nodes

# The 10 arguments PBS passes in.
  $arg1 = shift;  # the job id
  $arg2 = shift;  # the user name under which the job executes
  $arg3 = shift;  # the group name under which the job executes
  $arg4 = shift;  # the job name / not given in prologue
  $arg5 = shift;  # the session id / not given in prologue
  $arg6 = shift;  # the requested resource limits (list) / not given in prologue
  $arg7 = shift;  # the list of resources used / not given in prologue
  $arg8 = shift;  # the name of the queue in which the job resides / not given in prologue
  $arg9 = shift;  # the account string, if one exists / not given in prologue
  $arg10 = shift;  # the exit status of the job.

#set to pbs_pro or torque
#pbs_pro = 1
#torque = 0
$mode = 1;

#set divider
$set = 0;
$divide = 0;

#path variables
$mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$pbs = "/usr/pbs/bin";
$qstat_cm = "qstat -n -1 -u";

#files to make this all work nice and neat
  $output = "> /tmp/qstat.$arg1";
  $input  = "/tmp/qstat.$arg1";
  $node_list = "> /tmp/node_list.$arg1";
  $list = "/tmp/node_list.$arg1";
  $session_list = ">>/tmp/$arg2.session_list";
# $temp_debug =">/tmp/debug";
# open(debug,$temp_debug);

#needed commands
# have to use sudo so that the command is run as the user and not root
  $mpdboot = "/usr/bin/sudo -u $arg2 $mpich/mpdboot --remcons";

#Environment variable that needs to be set, so that mpdboot can find .mpd.conf
# in the user's home directory
  $HOME = "/bio/users";


#get a queue status, this way we can grab the nodes the queue is going to use
# write them out to a tmp file
   system "$pbs/$qstat_cm $arg2 $output";

   open(F,"$input");
    @qstat=<F>;
   close(F);

#get the job id, so we start daemons for this job only.
   @id = split(/\./,$arg1);

#get the list of nodes after you find the right job
######################################################
if ( $mode == 1) {
   foreach $line (@qstat) {
    $line =~ m/(\d+)\./;
     if ("$1" eq "$id[0]") {
        @nodes = split(/\+/,$line);
        $size = @nodes;
        #get first node don't need it for node list
        @first = split(/ /,$nodes[0]);
        $size_first = @first;
     }
   }

#checking for multiples of same node.
    foreach $elem (@nodes) {
     chomp($elem);
     if("$first[$size_first-1]" eq "$elem") {
      #$size=$size/2; # need to add divider because diff node types
      $divide = $divide + 1;
     }   }
      $size=$size/($divide+1);

}

if ( $mode == 0) {
  $c = 0;
  foreach $line (@qstat) {
   $line =~ m/(\d+)\./;
    if ("$1" eq "$id[0]") {
      if ( $line =~ /compute\-\d+\-\d+/ ) {
        @temp_nodes = split(/\+/,$line);
        $temp_size = @temp_nodes;
        for ($i=0;$i<$temp_size;$i++) {
          if( $temp_nodes[$i] =~ /compute\-\d+\-\d+/) {
           if($i == $temp_size-1) {
             chop($temp_nodes[$i]);
           }
           $nodes[$c] = $temp_nodes[$i];
           $c = $c + 1;
          } #end if ( $temp_nodes[$i] =~ /compute\-\d+\-\d+/)
        }#end for ($i=0;$i<$temp_size;$i++)
      } #end if ( $line =~ /compute\-\d+\-\d+/ )
    } #end if for if ("$1" eq "$id[0]")
     $size = @nodes;
   } #end foreach $line (@qstat)

#checking for multiples of same node.
    foreach $elem (@nodes) {
     chomp($elem);
     if("$first[$size_first-1]" eq "$elem" ) {
      #$size=$size/2;
      $divide = $divide + 1;
     }
    }
      $size=$size/($divide+1);


} #end if ( $mode == 0), i.e. torque
########################################################################

#create a nodes list for mpdboot to use to start the nodes
  open(A,"$node_list");
# some fancyness here to get the first node right -- NOT NEEDED
   #print A "$first[$size_first-1]\n" if ($mode == 1);
if($size > 1) {
  for($count=$mode;$count<$size;$count++) {
   print A "$nodes[$count]\n";
  }
}
  close(A);

#need to change the permissions so user submitting the job can read the file just created
    chmod 0644, "$list";

#set HOME to users home directory so mpdboot can find .mpd.conf
    $ENV{HOME} = "$HOME/$arg2";

#boot the nodes
   `$mpdboot -n $size -f $list`;
   open(B,"$session_list");
    print B "$id[0]\n";
   close(B);
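
For comparison, here is a minimal shell sketch of the same idea done from 
inside a job script instead of the prologue: collapse the repeated host 
entries (ppn > 1) into one line per node and count them, in the spirit of 
the sort -u suggestion in the thread quoted below. The helper name 
write_mpd_hosts and the temporary file name are assumptions made for this 
example, not part of my scripts.

```shell
# Sketch: build a de-duplicated hosts file and report how many nodes it has.
# write_mpd_hosts is a hypothetical helper name.
write_mpd_hosts() {
    nodefile=$1
    hostsfile=$2
    # sort -u keeps one line per distinct host name
    sort -u "$nodefile" > "$hostsfile"
    # print the number of distinct hosts; tr strips the padding some wc versions add
    wc -l < "$hostsfile" | tr -d ' '
}

# Example use in a job script (assumes PBS has set PBS_NODEFILE):
#   NUM_NODES=$(write_mpd_hosts "$PBS_NODEFILE" "/tmp/mpd.hosts.$$")
#   mpdboot -n "$NUM_NODES" -f "/tmp/mpd.hosts.$$"
```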




Here is the epilogue
--------------------------------------------------------------------------------------------------------
#!/usr/bin/perl
## epilogue script
## for PBSpro to play nice and work with mpich2 daemon system
## Adam Hock
## 02-07-2006
## Version 1.6

#NOTES#
#
# This script is run as root and needs to have permissions of 755
# It also needs to be called epilogue
# Put this file in ~PBS_INSTALL/mom_priv of all nodes


# The 10 arguments PBS passes to the script
  $arg1 = shift;  # the job id
  $arg2 = shift;  # the user name under which the job executes
  $arg3 = shift;  # the group name under which the job executes
  $arg4 = shift;  # the job name
  $arg5 = shift;  # the session id
  $arg6 = shift;  # the requested resource limits (list)
  $arg7 = shift;  # the list of resources used
  $arg8 = shift;  # the name of the queue in which the job resides
  $arg9 = shift;  # the account string, if one exists
  $arg10 = shift;  # the exit status of the job.

#time stamp
$date = time();

#path variables
  $mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$log = "/bio/tools/admin/report/log/epilogue.$date";

# Files needed to make everything work, created by the prologue script
  $input  = "/tmp/qstat.$arg1";
  $list = "/tmp/node_list.$arg1";
  $session_list = "/tmp/$arg2.session_list";
  $session_write = ">/tmp/$arg2.session_list";

# Commands that need to be run at the end of the job
  $mpdexit = "/usr/bin/sudo -u $arg2 $mpich/mpdallexit";
  $rm = "/bin/rm -f";

# Where home directories are found
  $HOME = "/bio/users";

#Make sure the environment variables are the user's, especially HOME
    $ENV{HOME} = "$HOME/$arg2";

#Shut the daemons off for that user
  @id = split(/\./,$arg1);

#determine how many sessions are on this node
  open(A,"$session_list");
  @sessions=<A>;
  close(A);

  $num_sessions=@sessions;

#if there is more than one, we don't want to kill the daemons
# remove my session and update sessions file
  if($num_sessions > 1) {
   open(B,"$session_write");
    foreach $line (@sessions) {
     chomp($line);
     if("$line" ne "$id[0]") {
      print B "$line\n";  # keep one job id per line
     }
    }
   close(B);

#Remove all those files used to start the daemons. Clean up
   `$rm -f $input $list`;

#If there is only one session shut down daemons
  } elsif ($num_sessions == 1) {

#Shutdown daemons
    `$mpdexit`;

#Remove files
   `$rm -f $input $list $session_list`;
}

#update usage database
  if ( -e $log ) {
   sleep(1);
   $date = time();
   $log = "/bio/tools/admin/report/log/epilogue.$date";
  }
   `echo $arg1 >> $log`;
   `echo $arg2 >> $log`;
   `echo $arg3 >> $log`;
   `echo $arg4 >> $log`;
   `echo $arg5 >> $log`;
   `echo $arg6 >> $log`;
   `echo $arg7 >> $log`;
   `echo $arg8 >> $log`;
   `echo $arg9 >> $log`;
   `echo $arg10 >> $log`;
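
The session bookkeeping in the epilogue can also be sketched in shell. This 
is only an illustration under my own assumptions: drop_session is a made-up 
helper, and unlike the Perl above it counts the sessions left after removing 
the finished job rather than before.

```shell
# drop_session FILE JOBID (hypothetical helper)
# Removes JOBID from FILE and prints how many sessions are left.
drop_session() {
    file=$1
    jobid=$2
    tmp="$file.tmp.$$"
    # -v inverts the match, -x matches whole lines, -F treats the id as a fixed string;
    # grep exits non-zero when nothing is left, which is fine here
    grep -vxF "$jobid" "$file" > "$tmp" || true
    mv "$tmp" "$file"
    wc -l < "$file" | tr -d ' '
}

# Example: shut the daemons down only when no other job of this user remains
#   left=$(drop_session "/tmp/$USER.session_list" "$jobnum")
#   [ "$left" -eq 0 ] && mpdallexit
```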




Jeffrey B. Layton wrote:
> Darius Buntinas wrote:
>>
>> What's "screaming"?  mpdboot or mpiexec?
> 
> I'm pretty sure it's mpdboot.
> 
> 
> I'll try the method below to see what happens. I'm also going
> to try Pete's mpiexec based on some recommendations to see
> if that reduces the pain.
> 
> Thanks!
> 
> Jeff
> 
>>
>> Try:
>>   mpdboot -n ${NP} -f ${PBS_NODEFILE}
>>   mpiexec -n ${NP} ./${EXE}
>>
>> You don't need a machinefile with mpiexec unless you want to execute 
>> on a subset of the nodes in your mpd ring, or you want control of the 
>> process-to-node mapping.
>>
>> I think that mpdboot should only start one mpd per node, even if the 
>> node is specified more than one time in the file (you really only ever 
>> need one mpd per node).  If mpdboot is having trouble because you're 
>> asking for ${NP} mpds but there are only ${NP}/2 unique nodes in the 
>> file, you can try something like:
>>
>>   NUM_NODES=`sort -u ${PBS_NODEFILE} | wc -l | awk '{print $1}'`
>>   mpdboot -n ${NUM_NODES} -f ${PBS_NODEFILE}
>>   mpiexec -n ${NP} ./${EXE}
>>
>> I'm not a PBS expert, so there might be an easier way to do that, but 
>> give it a try.
>>
>> If you are concerned about your process-to-node mapping and want to 
>> check what it is try:
>>   mpiexec -l -n ${NP} hostname
>>
>> -d
>>
>> On Tue, 4 Apr 2006, Jeffrey B. Layton wrote:
>>
>>> No joy. It always screams about not having enough hosts:
>>>
>>> totalnum=16  numhosts=8
>>> there are not enough hosts on which to start all processes
>>>
>>> I think this is because we have two processors per node (ppn=2).
>>> Consequently PBS_NODEFILE has the hosts repeated. I've
>>> tried using --totalnum=${NP} --ncpus=2 and this didn't work
>>> either (same error message).
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>>>
>>>> How about the following 3 lines in your script:
>>>>
>>>> mpdboot -n ${NP} -f ${PBS_NODEFILE}
>>>> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
>>>> mpdallexit
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>> On Tue, 4 Apr 2006, Jeffrey B. Layton wrote:
>>>>
>>>>> Good morning,
>>>>>
>>>>>  I hate to bother everyone early in the morning, but I'm
>>>>> looking for some advice on MPICH2 startup. I've been starting
>>>>> an mpd on each node in the cluster via,
>>>>>
>>>>> mpdboot -n 25 -f /home/jlayton/mpd.hosts
>>>>>
>>>>> where the file mpd.hosts contains a list of all possible hosts.
>>>>> So I'm basically starting mpd on every node. Then I run the
>>>>> code using mpiexec
>>>>>
>>>>> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
>>>>>
>>>>> and run mpdallexit after the code is finished to stop all of the
>>>>> mpds. Notice that I'm using PBS for queuing/scheduling.
>>>>>  This is something of a pain, because we lose nodes for
>>>>> various projects or training so I'm constantly having to go into
>>>>> the list of hosts and edit it. I also have to change the count on
>>>>> the mpdboot command.
>>>>>  Is there a better way to start up MPICH2 codes using PBS?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jeff
>>>>>
>>>>
>>>
>>>
>>

-- 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Adam Hock		Senior Network Systems Administrator +
+                       for Bioinformatics		     +
+							     +
+ Office		241 Nichols Hall		     +
+ Office Telephone	(785) 864-7728        		     +
+ Email			ahock at ittc.ku.edu		     +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



