[MPICH] MPICH2 startup w/ PBS
Adam Hock
ahock at ittc.ku.edu
Tue Apr 4 11:49:32 CDT 2006
I have spent some time developing the following scripts. They are not
perfect, but they do get the job done. I use them as prologue and epilogue
scripts. They need to be placed in /var/spool/PBS/mom_priv on each
node with permissions of 755.
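As a rough illustration of that install step, here is a minimal shell sketch. The function name install_hook and the source file names are made up for this example, and the mom_priv path is the one from my setup, so adjust it to your install. PBS generally also expects these files to be owned by root, which the sketch leaves to you.

```shell
#!/bin/sh
# Hypothetical helper: copy a hook script into a mom_priv directory
# under the name PBS looks for and make it executable (mode 755).
# usage: install_hook <source-file> <mom_priv-dir> <prologue|epilogue>
install_hook() {
    src=$1
    dest_dir=$2
    name=$3
    cp "$src" "$dest_dir/$name" || return 1
    chmod 755 "$dest_dir/$name"
}

# Example (run as root on each node; the file names are assumptions):
#   install_hook ./prologue.pl /var/spool/PBS/mom_priv prologue
#   install_hook ./epilogue.pl /var/spool/PBS/mom_priv epilogue
```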
Here is a list of variables you will need to change to make the scripts
work on your system.
$mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$HOME = "/bio/users";
$log = "/bio/tools/admin/report/log/epilogue.$date";
The log only records information about jobs; it is optional.
Feel free to contact me if you have any questions.
Here is the prologue
-------------------------------------------------------------------------
#!/usr/bin/perl
## prologue script
## for PBSpro to play nice and work with mpich2 daemon system
## Adam Hock - University of Kansas
## 02-07-2006
## Version 1.6
#NOTES#
#
# This script is run as root and needs to have permissions of 755
# It also needs to be called prologue
# Put this file in ~PBS_INSTALL/mom_priv of all nodes
# The 10 arguments PBS passes in.
$arg1 = shift; # the job id
$arg2 = shift; # the user name under which the job executes
$arg3 = shift; # the group name under which the job executes
$arg4 = shift; # the job name / not given in prologue
$arg5 = shift; # the session id / not given in prologue
$arg6 = shift; # the requested resource limits (list) / not given in prologue
$arg7 = shift; # the list of resources used / not given in prologue
$arg8 = shift; # the name of the queue in which the job resides / not given in prologue
$arg9 = shift; # the account string, if one exists / not given in prologue
$arg10 = shift; # the exit status of the job.
#set to pbs_pro or torque
#pbs_pro = 1
#torque = 0
$mode = 1;
#set divider
$set = 0;
$divide = 0;
#path variables
$mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$pbs = "/usr/pbs/bin";
$qstat_cm = "qstat -n -1 -u";
#files to make this all work nice and neat
$output = "> /tmp/qstat.$arg1";
$input = "/tmp/qstat.$arg1";
$node_list = "> /tmp/node_list.$arg1";
$list = "/tmp/node_list.$arg1";
$session_list = ">>/tmp/$arg2.session_list";
# $temp_debug =">/tmp/debug";
# open(debug,$temp_debug);
#needed commands
# have to use sudo so that the command is run as the user and not root
$mpdboot = "/usr/bin/sudo -u $arg2 $mpich/mpdboot --remcons";
# Environment variable that needs to be set, so that mpdboot can find .mpd.conf
# in the user's home directory
$HOME = "/bio/users";
# get a queue status, this way we can grab the nodes the queue is going to use
# and write them out to a tmp file
system "$pbs/$qstat_cm $arg2 $output";
open(F,"$input");
@qstat=<F>;
close(F);
#get the job id, so we start daemons for this job only.
@id = split(/\./,$arg1);
#get the list of nodes after you find the right job
######################################################
if ( $mode == 1) {
foreach $line (@qstat) {
$line =~ m/(\d+)\./;
if ("$1" eq "$id[0]") {
@nodes = split(/\+/,$line);
$size = @nodes;
#get first node don't need it for node list
@first = split(/ /,$nodes[0]);
$size_first = @first;
}
}
#checking for multiples of same node.
foreach $elem (@nodes) {
chomp($elem);
if("$first[$size_first-1]" eq "$elem") {
#$size=$size/2; # need to add divider because diff node types
$divide = $divide + 1;
} }
$size=$size/($divide+1);
}
if ( $mode == 0) {
$c = 0;
foreach $line (@qstat) {
$line =~ m/(\d+)\./;
if ("$1" eq "$id[0]") {
if ( $line =~ /compute\-\d+\-\d+/ ) {
@temp_nodes = split(/\+/,$line);
$temp_size = @temp_nodes;
for ($i=0;$i<$temp_size;$i++) {
if( $temp_nodes[$i] =~ /compute\-\d+\-\d+/) {
if($i == $temp_size-1) {
chop($temp_nodes[$i]);
}
$nodes[$c] = $temp_nodes[$i];
$c = $c + 1;
} #end if ( $temp_nodes[$i] =~ /compute\-\d+\-\d+/)
}#end for ($i=0;$i<$temp_size;$i++)
} #end if ( $line =~ /compute\-\d+\-\d+/ )
} #end if for if ("$1" eq "$id[0]")
$size = @nodes;
} #end foreach $line (@qstat)
#checking for multiples of same node.
foreach $elem (@nodes) {
chomp($elem);
if("$first[$size_first-1]" eq "$elem" ) {
#$size=$size/2;
$divide = $divide + 1;
}
}
$size=$size/($divide+1);
} #end if ( $mode == "torque")
########################################################################
#create a nodes list for mpdboot to use to start the nodes
open(A,"$node_list");
# some fancyness here to get the first node right -- NOT NEEDED
#print A "$first[$size_first-1]\n" if ($mode == 1);
if($size > 1) {
for($count=$mode;$count<$size;$count++) {
print A "$nodes[$count]\n";
}
}
close(A);
# need to change the permissions so the user submitting the job can read the
# file just created
chmod 0644, "$list";
#set HOME to users home directory so mpdboot can find .mpd.conf
$ENV{HOME} = "$HOME/$arg2";
#boot the nodes
`$mpdboot -n $size -f $list`;
open(B,"$session_list");
print B "$id[0]\n";
close(B);
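The node-counting logic above boils down to collapsing repeated host names into one entry per node and handing mpdboot the number of distinct nodes. As a rough shell sketch of that idea (not a replacement for the Perl above), assuming a plain file with one host name per line such as PBS_NODEFILE; the function names unique_hosts and count_unique_hosts are made up for illustration:

```shell
#!/bin/sh
# Print each host name once, keeping first-seen order.
# Blank lines are skipped.
unique_hosts() {
    awk 'NF && !seen[$1]++ { print $1 }' "$1"
}

# Print the number of distinct host names in the file.
count_unique_hosts() {
    unique_hosts "$1" | awk 'END { print NR }'
}

# Example use inside a job script (assumes mpdboot is on PATH and
# that /tmp is writable; the temp file name is hypothetical):
#   unique_hosts "$PBS_NODEFILE" > /tmp/mpd.hosts.$$
#   mpdboot -n "$(count_unique_hosts "$PBS_NODEFILE")" -f /tmp/mpd.hosts.$$
```

Keeping first-seen order rather than sorting means the first host in the file stays first in the list.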
Here is the epilogue
--------------------------------------------------------------------------------------------------------
#!/usr/bin/perl
## epilogue script
## for PBSpro to play nice and work with mpich2 daemon system
## Adam Hock
## 02-07-2006
## Version 1.6
#NOTES#
#
# This script is run as root and needs to have permissions of 755
# It also needs to be called epilogue
# Put this file in ~PBS_INSTALL/mom_priv of all nodes
# The 10 arguments PBS passes to the script
$arg1 = shift; # the job id
$arg2 = shift; # the user name under which the job executes
$arg3 = shift; # the group name under which the job executes
$arg4 = shift; # the job name
$arg5 = shift; # the session id
$arg6 = shift; # the requested resource limits (list)
$arg7 = shift; # the list of resources used
$arg8 = shift; # the name of the queue in which the job resides
$arg9 = shift; # the account string, if one exists
$arg10 = shift; # the exit status of the job.
#time stamp
$date = time();
#path variables
$mpich = "/bio/tools/mpich/mpich2-1.0.2-PGI-6.0/bin";
$log = "/bio/tools/admin/report/log/epilogue.$date";
# Files needed to make everything work, created by the prologue script
$input = "/tmp/qstat.$arg1";
$list = "/tmp/node_list.$arg1";
$session_list = "/tmp/$arg2.session_list";
$session_write = ">/tmp/$arg2.session_list";
# Commands that need to be run when the job finishes
$mpdexit = "/usr/bin/sudo -u $arg2 $mpich/mpdallexit";
$rm = "/bin/rm -f";
# Where home directories are found
$HOME = "/bio/users";
# Make sure the environment variables are the user's, especially HOME
$ENV{HOME} = "$HOME/$arg2";
#Shut the daemons off for that user
@id = split(/\./,$arg1);
#determine how many sessions are on this node
open(A,"$session_list");
@sessions=<A>;
close(A);
$num_sessions=@sessions;
#if there is more than one, we don't want to kill the daemons
# remove my session and update the sessions file
if($num_sessions > 1) {
open(B,"$session_write");
foreach $line (@sessions) {
chomp($line);
if("$line" ne "$id[0]") {
print B "$line\n"; # restore the newline removed by chomp
}
}
close(B);
#Remove all those files used to start the daemons. Clean up
`$rm -f $input $list`;
#If there is only one session shut down daemons
} elsif ($num_sessions == 1) {
#Shutdown daemons
`$mpdexit`;
#Remove files
`$rm -f $input $list $session_list`;
}
#update usage database
if ( -e $log ) {
sleep(1);
$date = time();
$log = "/bio/tools/admin/report/log/epilogue.$date";
}
`echo $arg1 >> $log`;
`echo $arg2 >> $log`;
`echo $arg3 >> $log`;
`echo $arg4 >> $log`;
`echo $arg5 >> $log`;
`echo $arg6 >> $log`;
`echo $arg7 >> $log`;
`echo $arg8 >> $log`;
`echo $arg9 >> $log`;
`echo $arg10 >> $log`;
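The session bookkeeping in the epilogue can also be sketched in shell: drop this job's numeric id from the per-user session file, and only shut the daemons down when nothing is left. The function name remove_session is made up for illustration, and the sketch assumes the one-id-per-line file format written by the prologue.

```shell
#!/bin/sh
# Remove one job id from a session list file (one id per line) and
# print how many sessions remain.
# usage: remove_session <session-file> <job-id, e.g. 1234.server>
remove_session() {
    file=$1
    id=${2%%.*}          # keep only the part before the first dot
    tmp="$file.tmp.$$"
    # grep exits non-zero when no lines are left; that is not an error here
    grep -v -x -F -- "$id" "$file" > "$tmp" || true
    mv "$tmp" "$file"
    awk 'END { print NR }' "$file"
}

# Example ($user and $jobid are hypothetical variables; mpdallexit would
# be run as the job's user, as in the epilogue):
#   left=$(remove_session "/tmp/$user.session_list" "$jobid")
#   [ "$left" -eq 0 ] && mpdallexit
```

Matching whole lines as fixed strings (-x -F) keeps job 101 from also removing job 1010.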
Jeffrey B. Layton wrote:
> Darius Buntinas wrote:
>>
>> What's "screaming"? mpdboot or mpiexec?
>
> I'm pretty sure it's mpdboot.
>
>
> I'll try the method below to see what happens. I'm also going
> to try Pete's mpiexec based on some recommendations to see
> if that reduces the pain.
>
> Thanks!
>
> Jeff
>
>>
>> Try:
>> mpdboot -n ${NP} -f ${PBS_NODEFILE}
>> mpiexec -n ${NP} ./${EXE}
>>
>> You don't need a machinefile with mpiexec unless you want to execute
>> on a subset of the nodes in your mpd ring, or you want control of the
>> process-to-node mapping.
>>
>> I think that mpdboot should only start one mpd per node, even if the
>> node is specified more than one time in the file (you really only ever
>> need one mpd per node). If mpdboot is having trouble because you're
>> asking for ${NP} mpds but there are only ${NP}/2 unique nodes in the
>> file, you can try something like:
>>
>> NUM_NODES=`sort -u ${PBS_NODEFILE} | wc -l | awk '{print $1}'`
>> mpdboot -n ${NUM_NODES} -f ${PBS_NODEFILE}
>> mpiexec -n ${NP} ./${EXE}
>>
>> I'm not a PBS expert, so there might be an easier way to do that, but
>> give it a try.
>>
>> If you are concerned about your process-to-node mapping and want to
>> check what it is try:
>> mpiexec -l -n ${NP} hostname
>>
>> -d
>>
>> On Tue, 4 Apr 2006, Jeffrey B. Layton wrote:
>>
>>> No joy. It always screams about not having enough hosts:
>>>
>>> totalnum=16 numhosts=8
>>> there are not enough hosts on which to start all processes
>>>
>>> I think this is because we have two processors per node (ppn=2).
>>> Consequently PBS_NODEFILE has the hosts repeated. I've
>>> tried using --totalnum=${NP} --ncpus=2 and this didn't work
>>> either (same error message).
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>>>
>>>> How about the following 3 lines in your script:
>>>>
>>>> mpdboot -n ${NP} -f ${PBS_NODEFILE}
>>>> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
>>>> mpdallexit
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>> On Tue, 4 Apr 2006, Jeffrey B. Layton wrote:
>>>>
>>>>> Good morning,
>>>>>
>>>>> I hate to bother everyone early in the morning, but I'm
>>>>> looking for some advice on MPICH2 startup. I've been starting
>>>>> an mpd on each node in the cluster via,
>>>>>
>>>>> mpdboot -n 25 -f /home/jlayton/mpd.hosts
>>>>>
>>>>> where the file mpd.hosts contains a list of all possible hosts.
>>>>> So I'm basically starting mpd on every node. Then I run the
>>>>> code using mpiexec
>>>>>
>>>>> mpiexec -machinefile ${PBS_NODEFILE} -n ${NP} ./${EXE}
>>>>>
>>>>> and run mpdallexit after the code is finished to stop all of the
>>>>> mpds. Notice that I'm using PBS for queuing/scheduling.
>>>>> This is something of a pain, because we lose nodes for
>>>>> various projects or training so I'm constantly having to go into
>>>>> the list of hosts and edit it. I also have to change the count on
>>>>> the mpdboot command.
>>>>> Is there a better way to start up MPICH2 codes using PBS?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Jeff
>>>>>
>>>>
>>>
>>>
>>
--
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Adam Hock Senior Network Systems Administrator +
+ for Bioinformatics +
+ +
+ Office 241 Nichols Hall +
+ Office Telephone (785) 864-7728 +
+ Email ahock at ittc.ku.edu +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++