<html>

<body>

At 01:20 AM 9/27/2005, David Ashton wrote:<br>

<blockquote type=cite class=cite cite=""><font face="arial" size=2>Let’s

say you have 26 machines available: a-z<br>

Let’s say you have two machine files:<br>

mf1 contains:<br>

a<br>

b<br>

c<br>

d<br>

mf2 contains:<br>

m<br>

n<br>

o<br>

Let’s say you have an application that uses MPI_Comm_spawn to spawn

another 5-process job.<br>

&nbsp;<br>

What hosts would you expect to be used for the spawned processes given

the following commands?:<br>

&nbsp;<br>

A)</font><font face="Times New Roman, Times" size=1>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font face="arial" size=2>mpiexec

-n 1 spawner<br>

B)</font><font face="Times New Roman, Times" size=1>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font face="arial" size=2>mpiexec

-machinefile mf1 -n 3 spawner<br>

C)</font><font face="Times New Roman, Times" size=1>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </font><font face="arial" size=2>mpiexec

-machinefile mf1 -n 2 spawner : -machinefile mf2 -n 1 spawner<br>

D)</font><font face="Times New Roman, Times" size=1>

&nbsp;&nbsp;&nbsp;&nbsp; </font><font face="arial" size=2>mpiexec -host a

-n 1 spawner : -host b -n 1 spawner<br>

</font></blockquote><br>

These are good questions.<br><br>

The key question is &quot;what is the scope of -machinefile&quot;?&nbsp;

Since -machinefile isn't one of the mpiexec arguments named in the MPI-2 standard, there isn't a single correct

answer to this.&nbsp; I will use the term &quot;appnum&quot; to denote

one of the colon-separated clauses, since processes in each clause get

the same appnum value and that value differs from the one given to processes in

another clause.&nbsp; Possibilities for interpreting -machinefile

include<br><br>

1. -machinefile has scope within the &quot;appnum&quot; during mpiexec

only.&nbsp; <br><br>

In this case, MPI_Comm_spawn draws from the full pool of nodes in all four

cases (A-D).&nbsp; In many ways, this is the interpretation most consistent

with the other arguments to mpiexec.&nbsp; For example, the -soft option

applies only to the creation of MPI_COMM_WORLD by mpiexec and does not

set a default &quot;soft&quot; value for MPI_Comm_spawn.&nbsp; Assuming

an MPI implementation had a &quot;machinefile&quot; info value for

MPI_Comm_spawn, it would be best if the behavior were consistent.&nbsp; Of

course, this raises the question of how the default may be set, but

overloading this mpiexec option may not be the right thing.&nbsp; For

example, if an implementation wants to control the placement of the first

few processes, but allow the spawns on the rest of the pool, you would

want the scope of the -machinefile option to be just the mpiexec

step.<br><br>

2. -machinefile has scope within the &quot;appnum&quot; for all time

(e.g., later MPI_Comm_spawn).<br><br>

This really opens up the question of the resource manager/selection

criteria, which the MPI Forum struggled with and eventually rejected

because of the complexity of the choices.&nbsp; As in case 1, it doesn't

explain how to achieve the other behavior.<br>
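<br>
The two readings can be compared with a small simulation of cases A-D. This is only a sketch under stated assumptions, not a description of what any mpiexec does: it assumes the global pool is the 26 machines a-z, that processes are placed round-robin, that under reading 1 spawned processes go to pool hosts not already used at startup, that under reading 2 the spawn continues round-robin through the host list of the spawning process's clause, and that -host is treated like a one-line machine file. The names parse_clauses, initial_hosts, spawn_hosts, and root_appnum are hypothetical.<br>

<pre>
```python
import string

POOL = list(string.ascii_lowercase)                      # assumed global pool: hosts a..z
MACHINEFILES = {"mf1": list("abcd"), "mf2": list("mno")}  # contents from the question


def parse_clauses(args):
    """Split an mpiexec-style command into colon-separated clauses (appnums)."""
    clauses = [{"hosts": None, "n": 1}]
    tokens = iter(args.split())
    for tok in tokens:
        if tok == ":":
            clauses.append({"hosts": None, "n": 1})
        elif tok == "-machinefile":
            clauses[-1]["hosts"] = MACHINEFILES[next(tokens)]
        elif tok == "-host":
            clauses[-1]["hosts"] = [next(tokens)]
        elif tok == "-n":
            clauses[-1]["n"] = int(next(tokens))
        # any other token (e.g. the program name) is ignored in this sketch
    return clauses


def initial_hosts(clauses):
    """Place each clause's startup processes round-robin on its host list."""
    placed = []
    for clause in clauses:
        hosts = clause["hosts"] or POOL
        placed.append([hosts[k % len(hosts)] for k in range(clause["n"])])
    return placed


def spawn_hosts(clauses, reading, count=5, root_appnum=0):
    """Hosts for `count` spawned processes under reading 1 or reading 2."""
    if reading == 1:
        # Reading 1: -machinefile only affects startup; spawn uses the pool.
        used = {h for group in initial_hosts(clauses) for h in group}
        free = [h for h in POOL if h not in used]
        return [free[k % len(free)] for k in range(count)]
    # Reading 2: the clause's host list stays in force for later spawns.
    clause = clauses[root_appnum]
    hosts = clause["hosts"] or POOL
    start = clause["n"]  # continue after the startup processes
    return [hosts[(start + k) % len(hosts)] for k in range(count)]


case_b = parse_clauses("mpiexec -machinefile mf1 -n 3 spawner")
print(spawn_hosts(case_b, reading=1))
print(spawn_hosts(case_b, reading=2))
```
</pre>
Under these assumptions, case B gives d,e,f,g,h under reading 1 and d,a,b,c,d under reading 2. For case C under reading 2 the result depends on root_appnum, that is, on which clause the spawning process belongs to.<br>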

-------<br><br>

My preference would be to separate the information for mpiexec startup

from resource information available to the implementation of

MPI_Comm_spawn/spawn_multiple.&nbsp; A list of machines is only the first

step in this direction; a user may also want to know the

capabilities of the machines (memory, CPU speed, networks), software

environment (installed libraries, license availability), load, and use

restrictions (e.g., desktop machines may be available at night if idle

for an hour).&nbsp; In addition, there needs to be a mechanism (like the

environment variable handling used in the gforker and mpd mpiexec) to

apply this information to all &quot;appnums&quot; as well as to individual

&quot;appnums&quot;.&nbsp; <br><br>
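One way to picture such a mechanism is a set of global defaults overlaid by per-appnum overrides, with startup placement kept separate from the hosts available to MPI_Comm_spawn. The sketch below is illustrative only: the keys startup_hosts and spawn_hosts and the function effective_settings are hypothetical, and this is not how the gforker or mpd mpiexec handle their options.<br>

<pre>
```python
import string

POOL = list(string.ascii_lowercase)  # assumed global pool: hosts a..z


def effective_settings(global_opts, per_appnum_opts):
    """Return one settings dict per appnum: global defaults, then overrides."""
    merged = []
    for local in per_appnum_opts:
        settings = dict(global_opts)   # copy so the defaults stay untouched
        settings.update(local)         # per-appnum values win over global ones
        merged.append(settings)
    return merged


# Global defaults apply to every appnum unless a clause overrides them.
global_opts = {"startup_hosts": POOL, "spawn_hosts": POOL}

# Clause 0 restricts only its startup placement; clause 1 restricts both.
per_appnum = [
    {"startup_hosts": list("abcd")},
    {"startup_hosts": list("mno"), "spawn_hosts": list("mno")},
]

settings = effective_settings(global_opts, per_appnum)
for appnum, s in enumerate(settings):
    print(appnum, s["startup_hosts"], len(s["spawn_hosts"]))
```
</pre>
With this layout, either behavior discussed above can be requested explicitly for each appnum instead of being implied by -machinefile.<br><br>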

Bill<br><br>

<blockquote type=cite class=cite cite=""><font face="arial" size=2>

&nbsp;<br>

A)<br>

In the first example I would expect the hosts to come from the main pool

like this:<br>

1 process on host a, five spawned processes on b,c,d,e,f.&nbsp; The order

of the hosts is irrelevant but they would all come from the big

pool.<br>

&nbsp;<br>

B)<br>

In the second example I would expect one of two possibilities:<br>

The first 3 processes must come from the machinefile, a,b,c and the 5

spawned processes would either come from the main pool, d,e,f,g,h, or

from the machinefile, d,a,b,c,d.<br>

&nbsp;<br>

If a machinefile is specified would you expect spawned processes to come

from the machinefile or the global pool?<br>

&nbsp;<br>

What if the global pool was unknown but a machinefile was

specified?&nbsp; So in that case the global pool would be the local

machine and you could get the following:<br>

3 processes on a,b,c and 5 spawned processes on a<br>

Or<br>

3 processes on a,b,c and 5 spawned processes on d,a,b,c,d<br>

&nbsp;<br>

C)<br>

The first three processes must be on a,b,m and the 5 spawned processes

could come from the main pool.&nbsp; If the spawned processes came from

the machine files which one would be used? mf1? mf2? mf1+mf2?<br>

&nbsp;<br>

D)<br>

The first two processes must be on a and b.&nbsp; The spawned processes

could come from the pool, c,d,e,f,g.&nbsp; But if the pool was unknown

would you expect all the spawned processes to be on the local host

a?&nbsp; Or would you expect the processes to be placed on both a and

b?<br>

&nbsp;<br>

I know you could just run each of the examples and see what happens but

I’m interested in what you think should happen, not what the current

implementation actually does.<br>

&nbsp;<br>

-David Ashton<br>

&nbsp;</blockquote>

<x-sigsep><p></x-sigsep>

William Gropp<br>

<a href="http://www.mcs.anl.gov/~gropp" eudora="autourl">

http://www.mcs.anl.gov/~gropp</a></font></body>

</html>