Mihail Spasov mspasov at iit.edu
Fri Aug 3 12:18:13 CDT 2007

Hi all:

I am running a channel-flow simulation using the Lattice Boltzmann
Method. Each process in my simulation is responsible for a small cube
of the domain. Each process has ghost layers and exchanges data only
with its 26 neighboring processes (unless it is a boundary process). I
use MPI::Compute_dims for the process decomposition,
MPI::Cartcomm::Get_cart_rank to find the neighbors, and
MPI::Comm::Sendrecv for the data exchange between the
processes. Below are some concerns that I would like to discuss with
someone who has more experience with MPI:
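
To make the neighbor setup concrete, here is a minimal sketch of how the up-to-26 neighbor ranks of one process can be enumerated. It deliberately contains no MPI calls: cart_rank and neighbor_ranks are hypothetical helper names, the rank formula assumes the row-major ordering that MPI Cartesian communicators use, and the boundaries are assumed to be non-periodic, so offsets that fall outside the process grid are simply skipped.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Row-major rank of the process at coordinates c in a
// dims[0] x dims[1] x dims[2] process grid (hypothetical helper).
int cart_rank(const std::array<int, 3>& dims, const std::array<int, 3>& c) {
    for (int d = 0; d < 3; ++d) {
        assert(c[d] >= 0 && c[d] < dims[d]);
    }
    return (c[0] * dims[1] + c[1]) * dims[2] + c[2];
}

// Ranks of all face, edge, and corner neighbors of the process at coords.
// Assumes non-periodic boundaries: neighbors outside the grid are omitted,
// so boundary processes get fewer than 26 entries.
std::vector<int> neighbor_ranks(const std::array<int, 3>& dims,
                                const std::array<int, 3>& coords) {
    std::vector<int> ranks;
    for (int dx = -1; dx <= 1; ++dx) {
        for (int dy = -1; dy <= 1; ++dy) {
            for (int dz = -1; dz <= 1; ++dz) {
                if (dx == 0 && dy == 0 && dz == 0) {
                    continue;  // skip the process itself
                }
                const std::array<int, 3> n = {coords[0] + dx,
                                              coords[1] + dy,
                                              coords[2] + dz};
                bool inside = true;
                for (int d = 0; d < 3; ++d) {
                    inside = inside && n[d] >= 0 && n[d] < dims[d];
                }
                if (inside) {
                    ranks.push_back(cart_rank(dims, n));
                }
            }
        }
    }
    return ranks;
}
```

In a real run, each returned rank would be the partner of one send/receive exchange of the matching ghost layer; a periodic direction would wrap the coordinate instead of skipping it.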

1) I am getting an average load on each node of about 1.2, whereas the
optimum, I guess, should be 2 (since I have dual-core nodes). I have
no idea whether this is normal or whether I should work on optimizing
the simulation.

2) The simulation time seems to depend very much on the kind of
process decomposition I have. For example, the simulation runs at
1.5 sec per time step both with 24 = 12x2 processes and with 16 = 4x4
processes. Any process count between 16 and 24 takes as long or
longer (1.5 to 3.6 sec). So it seems as if 8 processes are wasted.
Again, is this normal? Should I create my own process decomposition?
How would I know what the optimum is? The only thing I have noticed
so far is that I get good timings when the process grid is
three-dimensional (for example 32 = 4x4x2).
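
One rough way to compare decompositions is to count the ghost cells that the busiest process must receive per time step. The sketch below is only an illustration under stated assumptions: halo_cells is a hypothetical helper, the ghost layer is one cell thick, the boundaries are non-periodic, the global grid divides evenly among the processes, and the 96x96x96 grid in the usage note is made up rather than taken from my actual domain. It ignores message latency, memory bandwidth, and how processes are placed on the dual-core nodes, all of which may matter just as much.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Ghost cells received per time step by the process with the most
// neighbors, for a global grid n split over a process grid p.
// Assumes a one-cell-thick ghost layer and non-periodic boundaries.
long long halo_cells(const std::array<int, 3>& n,
                     const std::array<int, 3>& p) {
    long long inner = 1;  // cells owned by the process
    long long outer = 1;  // owned cells plus ghost layers
    for (int d = 0; d < 3; ++d) {
        assert(p[d] > 0 && n[d] % p[d] == 0);
        const long long local = n[d] / p[d];
        // Number of sides with a neighbor in this direction:
        // 0 if the direction is not split, 1 if split in two, else 2.
        const int sides = std::min(p[d] - 1, 2);
        inner *= local;
        outer *= local + sides;
    }
    return outer - inner;
}
```

For the hypothetical 96x96x96 grid, halo_cells({96, 96, 96}, {4, 4, 2}) gives 5476 cells, against 10176 for the 12x2 layout {12, 2, 1} and 9600 for the 4x4 layout {4, 4, 1}, which is at least consistent with the three-dimensional grid giving good timings.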

3) Right now I have a separate program that combines the
necessary data from the different processes into a single matrix or
3D array and saves it. I was wondering whether there is a neat MPI way
to do that.

Mihail Spasov
mspasov at iit.edu

