[MPICH] To: mpich-discuss-digest at mcs.anl.gov
Mihail Spasov
mspasov at iit.edu
Fri Aug 3 12:18:13 CDT 2007
Hi all:
I am running a channel flow simulation using the Lattice Boltzmann
Method. Each process is responsible for a small cube of the domain,
has ghost layers, and exchanges data only with its 26 neighboring
processes (fewer if it is a boundary process). I use
MPI::Compute_dims for the process decomposition,
MPI::Cartcomm::Get_cart_rank to find the neighbors, and
MPI::Comm::Sendrecv for the data exchange between the processes.
Below are some concerns that I would like to discuss with someone who
has more experience with MPI:
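(For reference, a minimal sketch of the neighbor enumeration described
above. It is an illustration only, not the actual simulation code: the
names Coord and neighborCoords are hypothetical, and a non-periodic
process grid is assumed. Each returned coordinate triple is what would
be handed to MPI::Cartcomm::Get_cart_rank to obtain the neighbor's
rank.)

```cpp
#include <array>
#include <cassert>
#include <vector>

using Coord = std::array<int, 3>;  // hypothetical helper type

// Returns the grid coordinates of all existing neighbors (faces, edges,
// corners) of the process at `me` in a non-periodic grid of size `dims`.
std::vector<Coord> neighborCoords(const Coord& dims, const Coord& me) {
    for (int d = 0; d < 3; ++d) assert(me[d] >= 0 && me[d] < dims[d]);
    std::vector<Coord> result;
    for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dz = -1; dz <= 1; ++dz) {
                if (dx == 0 && dy == 0 && dz == 0) continue;  // skip self
                const Coord c = {me[0] + dx, me[1] + dy, me[2] + dz};
                bool inside = true;
                for (int d = 0; d < 3; ++d)
                    inside = inside && c[d] >= 0 && c[d] < dims[d];
                if (inside) result.push_back(c);  // drop off-grid neighbors
            }
    return result;
}
```

An interior process of a 4x4x4 grid gets 26 neighbors from this
function; a corner process of a 4x4x2 grid gets 7.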
1) I am getting an average load of about 1.2 on each node, whereas the
optimum, I guess, should be 2 (since I have dual-core nodes). I have
no idea whether this is normal or whether I should work on optimizing
the simulation.
2) The simulation time seems to depend very much on the kind of
process decomposition I have. For example, the simulation runs at
1.5 sec per time step both with 24 = 12x2 processes and with 16 = 4x4
processes. Any process count between 16 and 24 takes more time: from
1.5 to 3.6 sec per time step. Thus, it seems like the 8 extra
processes are lost. Again, is this normal? Should I create my own
process decomposition? How would I know what the optimum would be?
The only thing I have noticed so far is that when the process grid is
three-dimensional (for example 32 = 4x4x2) I get good timings.
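One factor that might play a role here is how much ghost data each
process has to exchange for a given process grid. The following is a
rough sketch for comparing decompositions, under stated assumptions:
the function name faceHaloCells is hypothetical, the ghost layer is one
cell thick, the domain splits evenly, only faces are counted, and the
process is an interior one. It does not model latency or the network.

```cpp
#include <array>
#include <cassert>

// Cells sent per time step by one interior process for a one-cell-thick
// ghost layer, counting faces only (edges and corners are much smaller).
// n: global cells per dimension; p: processes per dimension.
long faceHaloCells(const std::array<long, 3>& n,
                   const std::array<long, 3>& p) {
    std::array<long, 3> local{};
    for (int d = 0; d < 3; ++d) {
        assert(p[d] > 0 && n[d] % p[d] == 0);  // assume an even split
        local[d] = n[d] / p[d];
    }
    long cells = 0;
    for (int d = 0; d < 3; ++d) {
        if (p[d] == 1) continue;  // dimension not split: nothing to exchange
        const long face = local[(d + 1) % 3] * local[(d + 2) % 3];
        cells += 2 * face;  // two faces per split dimension
    }
    return cells;
}
```

For a hypothetical 240x240x240 domain (not my actual domain size) this
gives 67200 cells per process for 12x2x1, 57600 for 4x4x1 and 36000
for 4x4x2.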
3) Right now I use a separate program that combines the necessary
data from the different processes into a single matrix or 3D array
and saves it. I was wondering whether there is a 'neat' MPI way to do
that.
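To make the combining step concrete, here is a minimal sketch of the
index mapping it involves. This is an illustration, not the actual
program: the names Extent, linearIndex and placeBlock are hypothetical,
and a row-major layout with ghost layers already stripped is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Extent = std::array<std::size_t, 3>;  // hypothetical helper type

// Row-major linear index of cell (i, j, k) in an array of extent `e`.
std::size_t linearIndex(const Extent& e, std::size_t i, std::size_t j,
                        std::size_t k) {
    return (i * e[1] + j) * e[2] + k;
}

// Copies one process's block (without ghost layers) into the global array.
// `offset` is the global index of the block's first cell.
void placeBlock(std::vector<double>& global, const Extent& globalExt,
                const std::vector<double>& block, const Extent& blockExt,
                const Extent& offset) {
    assert(global.size() == globalExt[0] * globalExt[1] * globalExt[2]);
    assert(block.size() == blockExt[0] * blockExt[1] * blockExt[2]);
    for (int d = 0; d < 3; ++d)
        assert(offset[d] + blockExt[d] <= globalExt[d]);  // block must fit
    for (std::size_t i = 0; i < blockExt[0]; ++i)
        for (std::size_t j = 0; j < blockExt[1]; ++j)
            for (std::size_t k = 0; k < blockExt[2]; ++k)
                global[linearIndex(globalExt, offset[0] + i, offset[1] + j,
                                   offset[2] + k)] =
                    block[linearIndex(blockExt, i, j, k)];
}
```

Calling placeBlock once per process, with that process's offset in the
global grid, fills the combined 3D array.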
Sincerely,
Mihail Spasov
mspasov at iit.edu