<html>
<body>
At 01:20 AM 9/27/2005, David Ashton wrote:<br>
<blockquote type=cite class=cite cite=""><font face="arial" size=2>Let’s
say you have 26 machines available: a-z<br>
Let’s say you have two machine files:<br>
mf1 contains:<br>
a<br>
b<br>
c<br>
d<br>
mf2 contains:<br>
m<br>
n<br>
o<br>
Let’s say you have an application that uses MPI_Comm_spawn to spawn
another 5 process job.<br>
<br>
What hosts would you expect to be used for the spawned processes given
the following commands?:<br>
<br>
A) mpiexec -n 1 spawner<br>
B) mpiexec -machinefile mf1 -n 3 spawner<br>
C) mpiexec -machinefile mf1 -n 2 spawner : -machinefile mf2 -n 1 spawner<br>
D) mpiexec -host a -n 1 spawner : -host b -n 1 spawner<br>
</font></blockquote><br>
These are good questions.<br><br>
The key question is "what is the scope of -machinefile"?
Since this isn't one of the MPI-2 standard names, there isn't any single correct
answer to this. I will use the term "appnum" to denote
one of the colon-separated clauses, since processes in each clause get
the same appnum value and that value is different from that of processes in
another clause. Possibilities for interpreting -machinefile
include<br><br>
1. -machinefile has scope within the "appnum" during mpiexec
only. <br><br>
In this case, MPI_Comm_spawn draws from the full pool of nodes in all four
cases. In many ways, this is the interpretation most consistent
with the other arguments to mpiexec. For example, the -soft option
applies only to the creation of MPI_COMM_WORLD by mpiexec and does not
set a default "soft" value for MPI_Comm_spawn. Assuming
an MPI implementation had a "machinefile" info value for
MPI_Comm_spawn, it would be best if the behavior were consistent. Of
course, this raises the question of how the default may be set, but
overloading this mpiexec option may not be the right way to do it. For
example, if an implementation wants to control the placement of the first
few processes, but allow spawns on the rest of the pool, you would
want the scope of the -machinefile option to be just the mpiexec
step.<br><br>
2. -machinefile has scope within the "appnum" for all time
(e.g., later MPI_Comm_spawn).<br><br>
This really opens up the question of the resource manager/selection
criteria, which the MPI Forum struggled with and eventually rejected
because of the complexity of the choices. As in case 1, it doesn't
explain how to achieve the other behavior.<br>
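<br>
As a rough illustration of how the two interpretations differ, here is a
minimal Python sketch that models host selection for the examples in the
question. It is not taken from any MPI implementation. The function names
and both placement policies (skip hosts already in use for interpretation 1,
continue round-robin through the machine file for interpretation 2) are
assumptions made only for this example.<br>
<pre>
```python
from itertools import cycle, islice
from string import ascii_lowercase

POOL = list(ascii_lowercase)   # the 26 machines a-z from the question
MF1 = ["a", "b", "c", "d"]     # contents of mf1
MF2 = ["m", "n", "o"]          # contents of mf2


def spawn_from_pool(n, in_use, pool=POOL):
    """Interpretation 1: -machinefile only affects mpiexec startup.

    Assumed policy: spawned processes go to pool hosts that mpiexec did
    not already use, wrapping around if there are fewer than n of them.
    """
    free = [h for h in pool if h not in in_use]
    return list(islice(cycle(free), n))


def spawn_from_machinefile(n, machinefile, already_started):
    """Interpretation 2: -machinefile stays in force for later spawns.

    Assumed policy: continue round-robin through the machine file from
    the slot after the last process that mpiexec started.
    """
    return list(islice(cycle(machinefile), already_started, already_started + n))


# Case B: mpiexec -machinefile mf1 -n 3 spawner, then a spawn of 5
case_b_pool = spawn_from_pool(5, in_use=MF1[:3])
case_b_file = spawn_from_machinefile(5, MF1, already_started=3)

# Case C: under interpretation 2 the answer depends on which clause's
# machine file is taken to apply to the spawn
case_c_mf1 = spawn_from_machinefile(5, MF1, already_started=2)
case_c_mf2 = spawn_from_machinefile(5, MF2, already_started=1)

print(case_b_pool, case_b_file, case_c_mf1, case_c_mf2)
```
</pre>
Under these assumptions, case B gives d,e,f,g,h for interpretation 1 and
d,a,b,c,d for interpretation 2. Case C shows why interpretation 2 needs an
extra rule: the result changes with the clause whose machine file is
used.<br>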
-------<br><br>
My preference would be to separate the information for mpiexec startup
from resource information available to the implementation of
MPI_Comm_spawn/spawn_multiple. A list of machines is only the first
step in this direction; a user may also want to know the
capabilities of the machines (memory, CPU speed, networks), software
environment (installed libraries, license availability), load, and use
restrictions (e.g., desktop machines may be available at night if idle
for an hour). In addition, there needs to be a mechanism (like the
environment variable handling used in the gforker and mpd mpiexec) to
apply this information to all "appnums" as well as to individual
"appnums".<br><br>
Bill<br><br>
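One way to picture a mechanism that applies to all "appnums" as well as to
individual ones is a two-level lookup, sketched below in Python. The setting
names (spawn_pool, wdir) and the function settings_for are hypothetical and
are not real mpiexec options. The sketch only shows the layering idea: a
value set for one appnum overrides the value set for all of them.<br>
<pre>
```python
from collections import ChainMap

# Hypothetical settings; none of these names are real mpiexec options.
GLOBAL_SETTINGS = {"spawn_pool": "all", "wdir": "/tmp"}
PER_APPNUM = {
    0: {"spawn_pool": "mf1"},   # first colon-separated clause
    1: {},                      # second clause: no overrides
}


def settings_for(appnum, per_appnum=PER_APPNUM, global_settings=GLOBAL_SETTINGS):
    """Return the effective settings for one appnum.

    Values given for the individual appnum win; anything it leaves
    unset falls back to the value that applies to all appnums.
    """
    return dict(ChainMap(per_appnum.get(appnum, {}), global_settings))


print(settings_for(0))
print(settings_for(1))
```
</pre>
With this layout, a later MPI_Comm_spawn issued from appnum 0 could be
restricted to one set of machines while appnum 1 keeps the global default,
which is the kind of separation between startup information and spawn-time
resource information described above.<br><br>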
<blockquote type=cite class=cite cite=""><font face="arial" size=2>
<br>
A)<br>
In the first example I would expect the hosts to come from the main pool
like this:<br>
1 process on host a, five spawned processes on b,c,d,e,f. The order
of the hosts is irrelevant but they would all come from the big
pool.<br>
<br>
B)<br>
In the second example I would expect one of two possibilities:<br>
The first 3 processes must come from the machinefile, a,b,c and the 5
spawned processes would either come from the main pool, d,e,f,g,h, or
from the machinefile, d,a,b,c,d.<br>
<br>
If a machinefile is specified would you expect spawned processes to come
from the machinefile or the global pool?<br>
<br>
What if the global pool was unknown but a machinefile was
specified? So in that case the global pool would be the local
machine and you could get the following:<br>
3 processes on a,b,c and 5 spawned processes on a<br>
Or<br>
3 processes on a,b,c and 5 spawned processes on d,a,b,c,d<br>
<br>
C)<br>
The first three processes must be on a,b,m and the 5 spawned processes
could come from the main pool. If the spawned processes came from
the machine files which one would be used? mf1? mf2? mf1+mf2?<br>
<br>
D)<br>
The first two processes must be on a and b. The spawned processes
could come from the pool, c,d,e,f,g. But if the pool was unknown
would you expect all the spawned processes to be on the local host
a? Or would you expect the processes to be placed on both a and
b?<br>
<br>
I know you could just run each of the examples and see what happens but
I’m interested in what you think should happen, not what the current
implementation actually does.<br>
<br>
-David Ashton<br>
</blockquote>
<x-sigsep><p></x-sigsep>
William Gropp<br>
<a href="http://www.mcs.anl.gov/~gropp" eudora="autourl">
http://www.mcs.anl.gov/~gropp</a></font></body>
</html>