[mpich-discuss] core 2 quad and other multiple core processors

Rajeev Thakur thakur at mcs.anl.gov
Fri Jul 4 22:07:48 CDT 2008


Nemesis will be the default soon, in 1.1. We should have made it the default
earlier, but it didn't support MPI-2 dynamic processes and it wasn't passing
our full set of extensive tests.
 
Rajeev


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Robert Kubrick
Sent: Friday, July 04, 2008 12:27 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] core 2 quad and other multiple core processors


I wonder why ch3:nemesis or ch3:ssm isn't the default in MPICH. Why ch3:sock?


Robert

On Jul 2, 2008, at 10:59 PM, Rajeev Thakur wrote:


For best performance, configure with --with-device=ch3:nemesis. This selects
the Nemesis device within MPICH2, which communicates using shared memory
within a node and TCP across nodes.
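
A typical build along those lines might look like the following (the install
prefix here is only an illustration; adjust the paths for your own setup):

./configure --with-device=ch3:nemesis --prefix=/usr/local/mpich2-nemesis
make
make install

Then make sure the mpiexec from this installation is the one on your PATH
when you launch jobs.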

Rajeev


  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Ariovaldo de Souza
Junior
Sent: Wednesday, July 02, 2008 3:15 PM
To: mpich-discuss at mcs.anl.gov
Subject: [mpich-discuss] core 2 quad and other multiple core processors


Hello everybody!

I'm really a newbie at clustering, so I have some, let's say, stupid
questions. When I start a job like "mpiexec -l -n 6 ./cpi" on my small
cluster of (so far) six Core 2 Quad machines, I'm sending one process to
each node, right? Assuming that is correct, will each process use only one
core of its node? And how can I make one process use the whole processing
capacity of the processor, all four cores? Is there a way to do this, or
will I always use just one core per process? If I change the command to
"mpiexec -l -n 24 ./cpi", the same program will run as 24 processes, four
per node (perhaps simultaneously) and one process per core, right?
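
For context, a typical MPD setup for a cluster like this one looks roughly
as follows (the node names are just placeholders for my machines). With six
mpd daemons running, mpiexec normally hands out ranks round-robin over the
ring, so -n 24 should end up with four ranks on each quad-core node:

# mpd.hosts -- one machine per line (placeholder hostnames)
node01
node02
node03
node04
node05
node06

mpdboot -n 6 -f mpd.hosts    # start one mpd on each of the six machines
mpiexec -l -n 24 ./cpi       # 24 ranks, distributed round-robin over the ring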

I'm asking all this because it seems a bit strange that the processing time
increases each time I add one more process, when in my mind it should be the
opposite. I'll give some examples:

mpiexec -n 1 ./cpi
wall clock time = 0.000579

mpiexec -n 2 ./cpi
wall clock time = 0.002442

mpiexec -n 3 ./cpi
wall clock time = 0.004568

mpiexec -n 4 ./cpi
wall clock time = 0.005150

mpiexec -n 5 ./cpi
wall clock time = 0.008923

mpiexec -n 6 ./cpi
wall clock time = 0.009309

mpiexec -n 12 ./cpi
wall clock time = 0.019445

mpiexec -n 18 ./cpi
wall clock time = 0.032204

mpiexec -n 24 ./cpi
wall clock time = 0.045413

mpiexec -n 48 ./cpi
wall clock time = 0.089815

mpiexec -n 96 ./cpi
wall clock time = 0.218894

mpiexec -n 192 ./cpi
wall clock time = 0.492870

So, as you can all see, the more processes I add, the more time it takes,
which makes me think that MPI ended up performing this test 192 times, and
that this is why the time increased. Is it correct that MPI performed the
same test 192 times? Or did it divide the work into 192 pieces, compute
them, and then gather the results and assemble the output again? I would
really like to understand this relationship between processor count and
process count.

I have the feeling that my questions are a bit "poor" and really those of a
newbie, but the answers will help me use other programs that need MPI to
run.

Thanks to all!

Ari - UFAM - Brazil



