[mpich-discuss] core 2 quad and other multiple core processors

Gus Correa gus at ldeo.columbia.edu
Thu Jul 3 20:12:47 CDT 2008


Hello Ariovaldo and list

Ariovaldo, don't be shy.
Your questions are reasonable.
Beginners should be encouraged, not scoffed at,
regardless of whether they are students or Nobel laureates.

I have always gotten a lot of knowledgeable and generous advice
from the MPICH developers and other list subscribers,
even when I asked naive questions.

Having said that, here are a few suggestions.
Take them with a grain of salt, because
I am an MPICH user, not an expert.

1) How are you launching your mpd daemons?

The processes are distributed across the machines in different ways,
depending on whether or not you specify the number of CPUs on each
machine when you start the mpd daemons.
For details, please read section 5.1.7 in the MPICH Installer's Guide
(available in PDF format on the MPICH site).

In your case, to group processes tightly on each machine, I would say
you could use -ncpus=8, since you have four physical sockets per machine,
and each one holds a dual-core processor, right?
I am guessing a bit here, though.
I didn't fully understand your description of your machine configuration,
but since you know it well, you can figure out the right number, which
may not be 8.
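
For instance (and do check "mpdboot --help" and the Installer's Guide,
since I am writing this from memory), you could start one mpd per
machine with something like:

    mpdboot -n 6 -f mpd.hosts

where mpd.hosts lists each machine together with its number of CPUs,
one per line (node01 ... node06 are just placeholder names):

    node01:8
    node02:8
    node03:8
    node04:8
    node05:8
    node06:8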
Anyway, if your OS is Linux,
you can get the exact number of cores available on each machine by
looking at /proc/cpuinfo
("more /proc/cpuinfo").

However, beware that speedup in multi-core CPUs is not really linear, 
due to memory contention
and bus "congestion".
There are people who simply leave one or more cores out of the game 
because of this.

2) For tighter control of where the processes run you can use the 
"-configfile" feature of mpiexec.
See the manual page of mpiexec (man mpiexec) for details.
You may need to point to the exact man page path if you have several
versions of mpiexec on your computer.
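
From memory (so double-check against the man page), each line of the
config file holds one set of mpiexec arguments. For example, to place
four processes on each of two machines (node01 and node02 are
placeholders):

    -n 4 -host node01 ./cpi
    -n 4 -host node02 ./cpi

and then launch with:

    mpiexec -configfile myconfig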

3) The cpi.c program may not be a great speedup test.
I guess the program was meant just to test if MPI is working, not as a 
performance benchmark.

If you look at the code you'll see that the number of iterations
in the program's single loop is 10000, and the computations are very
modest.
With such a small number of iterations, if you divide the task among,
say, 10 processors,
each one does only 1000 simple computations, which is very fast compared
to the other tasks at the beginning and end of the code.
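
For what it is worth, here is roughly what cpi.c does, written from
memory (the real code ships in the examples directory of the MPICH2
source tree). It also answers your question below: the work is divided
among the processes, not repeated by each one.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int n = 10000, myid, numprocs, i;
        double h, sum, x, mypi, pi;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

        /* rank 0 broadcasts the number of intervals to everybody */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* each process integrates every numprocs-th slice of 4/(1+x^2) */
        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double) i - 0.5);
            sum += 4.0 / (1.0 + x * x);
        }
        mypi = h * sum;

        /* the partial sums are added up on rank 0 */
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f\n", pi);

        MPI_Finalize();
        return 0;
    }

So with "mpiexec -n 192 ./cpi" each process handles only 1/192 of the
10000 intervals; the growing wall clock times you measured come from
startup and communication overhead, not from repeated work.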

The execution time is likely to be dominated by non-parallel tasks 
(initialization, etc),
which don't scale with the number of processors.
This is an aspect of "Amdahl's Law":
http://en.wikipedia.org/wiki/Amdahl's_law
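
In a nutshell: if a fraction p of the runtime can be parallelized and
the rest cannot, the best possible speedup on N processors is

    speedup(N) = 1 / ((1 - p) + p/N)

so even with p = 0.9 the speedup can never exceed 10, no matter how
many processors you throw at the problem.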

Actually, as you increase the number of processors,
the communication overhead during initialization and finalization
may worsen the execution time when the computations are as modest as
they are in cpi.c.

I guess the cpi.c code also has an unnecessary call to MPI_Bcast,
which may be a leftover
from a previous version, and which may add to the overhead.
But only the MPICH developers can tell that for sure.

There are synthetic benchmarks out there.
However, since you want to run the molecular dynamics code,
maybe you could set up a simple speedup experiment with it, as sketched
below.
This may take less effort, and it will help you use and understand the
code you are interested in.
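
For instance, something along these lines (namd2 is the usual name of
the NAMD binary, and your_input.namd is just a placeholder for whatever
input file your teacher uses):

    for n in 1 2 4 8 16; do
        echo "=== $n processes ==="
        time mpiexec -n $n ./namd2 your_input.namd
    done

Then compare the wall clock times against the ideal 1/n scaling.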

I hope this helps.

Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
Oceanography Bldg., Rm. 103-D, ph. (845) 365-8911, fax (845) 365-8736
---------------------------------------------------------------------


Ariovaldo de Souza Junior wrote:

> Hi Chong,
>
> Yes, I'm a student. In truth, the one who will use MPI is my teacher,
> who will run NAMD for molecular dynamics. I was challenged to set up
> this cluster, and I had no knowledge of Linux before starting it.
> Now that it is running, I think maybe I could go a bit further.
> Theoretically my work is over, and the machines are working well. But
> I would still like to know how to extract the maximum performance from
> these computers, which have already proved to be good ones, since we
> have already used them to run Gaussian 03 molecular calculations.
>
> I wanted to know a bit more because I love computers, and now that I
> have been introduced to this universe of clusters, I want to learn
> more. That's all. I'm already reading the tips you gave me, even
> though it is a bit complicated to extract the information I want from
> them. Thanks a lot for your attention.
>
> Ari.
>
> 2008/7/2 chong tan <chong_guan_tan at yahoo.com>:
>
>
>     Ari,
>
>     Are you a student? Anyway, I'd like to point you to the answer to
>     your problem:
>
>     mpiexec -help
>
>     or look in your mpich2 package, under www/www1; there is a
>     mpiexec.html.
>
>     It is easier to give you the answer, but getting you to look for
>     the answer is better.
>
>     stan
>
>
>     --- On Wed, 7/2/08, Ariovaldo de Souza Junior
>     <ariovaldojunior at gmail.com> wrote:
>
>         From: Ariovaldo de Souza Junior <ariovaldojunior at gmail.com>
>         Subject: [mpich-discuss] core 2 quad and other multiple core
>         processors
>         To: mpich-discuss at mcs.anl.gov
>         Date: Wednesday, July 2, 2008, 1:15 PM
>
>
>         Hello everybody!
>
>         I'm really a newbie at clustering, so I have some, let's say,
>         stupid questions. When I start a job like "mpiexec -l -n 6
>         ./cpi" on my small cluster of (so far) 6 Core 2 Quad machines,
>         I'm sending 1 process to each node, right? Assuming I'm
>         correct, will each process use only 1 core of each node? How
>         can I make 1 process use the whole processing capacity of the
>         processor, all 4 cores? Is there a way to do this, or will I
>         always use just one core per process? And if I change the
>         submission to "mpiexec -l -n 24 ./cpi", will the program run
>         as 24 processes, 4 per node (maybe simultaneously) and one
>         process per core?
>
>         I'm asking all this because I find it a bit strange that the
>         processing time increases each time I add one more process,
>         when in my mind it should be the opposite. I'll give some
>         examples:
>
>         mpiexec -n 1 ./cpi
>         wall clock time = 0.000579
>
>         mpiexec -n 2 ./cpi
>         wall clock time = 0.002442
>
>         mpiexec -n 3 ./cpi
>         wall clock time = 0.004568
>
>         mpiexec -n 4 ./cpi
>         wall clock time = 0.005150
>
>         mpiexec -n 5 ./cpi
>         wall clock time = 0.008923
>
>         mpiexec -n 6 ./cpi
>         wall clock time = 0.009309
>
>         mpiexec -n 12 ./cpi
>         wall clock time = 0.019445
>
>         mpiexec -n 18 ./cpi
>         wall clock time = 0.032204
>
>         mpiexec -n 24 ./cpi
>         wall clock time = 0.045413
>
>         mpiexec -n 48 ./cpi
>         wall clock time = 0.089815
>
>         mpiexec -n 96 ./cpi
>         wall clock time = 0.218894
>
>         mpiexec -n 192 ./cpi
>         wall clock time = 0.492870
>
>         So, as you can all see, the more processes I add, the more
>         time it takes, which makes me think that mpi performed this
>         test 192 times in the end, and that is why the time increased.
>         Is it correct that mpi performed the same test 192 times? Or
>         did it divide the work into 192 pieces, calculate them, and
>         then gather the results and assemble the output again? I would
>         really like to understand this relationship between the number
>         of processors and the number of processes.
>
>         I have the feeling that my questions are a bit "poor" and
>         really newbie ones, but the answers will help me use other
>         programs that need mpi to run.
>
>         Thanks to all!
>
>         Ari - UFAM - Brazil
>
>
>



