[MPICH] To: mpich-discuss-digest at mcs.anl.gov

Tue Aug 7 09:33:56 CDT 2007

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Mihail Spasov
> Sent: Friday, August 03, 2007 12:18 PM
> Subject: [MPICH] To: mpich-discuss-digest at mcs.anl.gov
> 
> Hi all:
> 
> I am running a simulation of a channel flow using the Lattice Boltzmann
> Method. Each process in my simulation is responsible for a small cube
> of the domain. Each process has ghost layers and exchanges data only
> with the 26 neighboring processes (if it's not a boundary process). I
> use MPI::Compute_dims for the process decomposition,
> MPI::Cartcomm::Get_cart_rank to find the neighbors, and
> MPI::Comm::Sendrecv for the information exchange between the
> processes. Below are some concerns that I would like to discuss with
> someone who has more experience with MPI:

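MPI_Sendrecv might be too synchronizing: each call blocks until its own
send and receive complete, so the 26 exchanges proceed one after another.
You could try posting all the Irecvs and Isends up front and then doing a
single Waitall.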
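
To post everything up front you first need the complete neighbor list. The
sketch below is one possible way to build it in plain C, without calling
MPI. It assumes a non-periodic 3D grid and the row-major rank ordering that
MPI Cartesian communicators use. The names grid_rank, find_neighbors and
NO_NEIGHBOR are made up for this illustration; in a real code MPI_PROC_NULL
would take the place of NO_NEIGHBOR.

```c
#include <assert.h>

#define NUM_NEIGHBORS 26
#define NO_NEIGHBOR   (-1)  /* hypothetical stand-in for MPI_PROC_NULL */

/* Row-major rank of (x, y, z) in a dims[0] x dims[1] x dims[2] grid. */
int grid_rank(const int dims[3], int x, int y, int z)
{
    return (x * dims[1] + y) * dims[2] + z;
}

/*
 * Fill neighbors[0..25] with the ranks of the 26 processes surrounding
 * coords in a non-periodic grid. Slots that fall outside the grid are set
 * to NO_NEIGHBOR. Offsets are visited in lexicographic order with the
 * center skipped, so the opposite direction of slot i is slot 25 - i.
 * Returns the number of neighbors that actually exist.
 */
int find_neighbors(const int dims[3], const int coords[3],
                   int neighbors[NUM_NEIGHBORS])
{
    int slot = 0;
    int count = 0;

    for (int dx = -1; dx <= 1; dx++) {
        for (int dy = -1; dy <= 1; dy++) {
            for (int dz = -1; dz <= 1; dz++) {
                if (dx == 0 && dy == 0 && dz == 0)
                    continue;  /* skip the process itself */

                int x = coords[0] + dx;
                int y = coords[1] + dy;
                int z = coords[2] + dz;
                int inside = x >= 0 && x < dims[0] &&
                             y >= 0 && y < dims[1] &&
                             z >= 0 && z < dims[2];

                neighbors[slot] = inside ? grid_rank(dims, x, y, z)
                                         : NO_NEIGHBOR;
                count += inside;
                slot++;
            }
        }
    }
    assert(slot == NUM_NEIGHBORS);
    return count;
}
```

With such a list, the exchange would be one loop calling MPI_Irecv for every
slot, a second loop calling MPI_Isend for every slot, and then a single
MPI_Waitall over all 52 requests. Operations addressed to MPI_PROC_NULL
complete immediately, so boundary processes need no special case.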

> 1) I am getting an average load on each node of about 1.2, whereas the
> optimum, I guess, should be 2 (since I have dual-core nodes). I have
> no idea whether this is normal or whether I should work on optimizing
> the simulation.
> 
> 2) It seems the simulation time depends very much on the kind of
> process decomposition I have. For example, the simulation runs at
> 1.5 sec per time step both with 24 = 12x2 processes and with
> 16 = 4x4 processes. Anything in between 16 and 24 takes more time:
> 1.5 to 3.6 sec. Thus, it seems like 8 processes are lost. Again, is
> this normal? Should I create my own process decomposition? How would
> I know what would be the optimum? The only thing I have noticed so far
> is that when the process grid is three-dimensional (for example
> 32 = 4x4x2) I get good timings.

You will need to experiment with different process decompositions and use
the one that works best. You can create your own if the one MPI gives you
does not perform the best.
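
As a starting point for those experiments, here is a rough sketch of
picking your own decomposition: it tries every factorization of the process
count into px x py x pz and keeps the one whose sub-blocks have the smallest
surface. This is only a crude proxy for communication cost (it counts all
six faces of every block and ignores edges, corners and the network), and
the function names and the domain size in the usage note are assumptions for
illustration, not taken from the original code.

```c
#include <assert.h>

/*
 * Surface area, in cells, of one sub-block when a domain of
 * n[0] x n[1] x n[2] cells is split over a p[0] x p[1] x p[2] process grid.
 * All six faces are counted, which is a simplification for boundary blocks.
 */
double halo_area(const int n[3], const int p[3])
{
    double lx = (double)n[0] / p[0];
    double ly = (double)n[1] / p[1];
    double lz = (double)n[2] / p[2];
    return 2.0 * (lx * ly + ly * lz + lx * lz);
}

/*
 * Try every ordered factorization nprocs = px * py * pz and store in best[]
 * the one with the smallest halo_area. Returns that smallest area.
 */
double best_decomposition(int nprocs, const int n[3], int best[3])
{
    double best_area = -1.0;  /* negative means "nothing found yet" */

    assert(nprocs >= 1);
    for (int px = 1; px <= nprocs; px++) {
        if (nprocs % px != 0)
            continue;
        int rest = nprocs / px;
        for (int py = 1; py <= rest; py++) {
            if (rest % py != 0)
                continue;
            int p[3] = {px, py, rest / py};
            double area = halo_area(n, p);
            if (best_area < 0.0 || area < best_area) {
                best_area = area;
                best[0] = p[0];
                best[1] = p[1];
                best[2] = p[2];
            }
        }
    }
    return best_area;
}
```

For a hypothetical 128 x 128 x 64 domain on 32 processes this picks 4x4x2,
where every sub-block is a 32-cell cube. The resulting dims array can be
handed to MPI_Cart_create in place of whatever MPI_Dims_create
(MPI::Compute_dims in the C++ bindings) would have chosen. Measured time per
step should still have the final word.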

> 3) Right now I have created a separate program which combines the
> necessary data from the different processes into a separate matrix or
> 3D array and saves it. I was wondering if there is a 'neat' MPI way
> to do that.

Yes! You can look at MPI-IO and the subarray/darray datatypes for file
views.
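
To make the subarray idea a bit more concrete: a subarray type is described
by three integer arrays (global sizes, local subsizes, and the starting
indices of the local block), and with MPI_ORDER_C it places each local
element at its row-major position in the global array. The plain-C sketch
below computes those arrays and the resulting position without calling MPI.
The helper names are invented for this illustration, and it assumes the
domain divides evenly among the processes.

```c
#include <assert.h>

/*
 * Compute the subsizes[] and starts[] that describe the block owned by the
 * process at coords[] in a dims[] process grid over an n[] cell domain.
 * Assumes every n[d] is divisible by dims[d].
 */
void block_layout(const int n[3], const int dims[3], const int coords[3],
                  int subsizes[3], int starts[3])
{
    for (int d = 0; d < 3; d++) {
        assert(n[d] % dims[d] == 0);
        subsizes[d] = n[d] / dims[d];
        starts[d] = coords[d] * subsizes[d];
    }
}

/*
 * Position, counted in elements, of local element (i, j, k) inside the
 * row-major global array of extent sizes[], for a block beginning at
 * starts[]. This is the placement a C-order subarray file view produces.
 */
long global_offset(const int sizes[3], const int starts[3],
                   int i, int j, int k)
{
    long gx = starts[0] + i;
    long gy = starts[1] + j;
    long gz = starts[2] + k;
    return (gx * sizes[1] + gy) * sizes[2] + gz;
}
```

In an actual program the same sizes, subsizes and starts arrays would go to
MPI_Type_create_subarray, the committed type would become the filetype in
MPI_File_set_view, and each process would write its block with
MPI_File_write_all. If the local arrays carry ghost layers, a second
subarray type can describe the interior region in memory so the ghosts are
not written.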

Rajeev