[mpich-discuss] core 2 quad and other multiple core processors

Ariovaldo de Souza Junior ariovaldojunior at gmail.com
Wed Jul 9 21:42:08 CDT 2008


Hello Prof. Gustavo,

Thank you very much for your patience in answering my message. I confess
that during the process I thought about giving up, but then I watched a
documentary about a university here in Brazil that is going to build the
biggest cluster system in South America, connecting the facilities of about
5 universities. So I thought, 'if they are connecting so many computers, it
is impossible that I can't make just 9 work'. And so I went on, and now it
is finished and I'm on vacation (which is why I didn't answer you before; I
apologize for that). Your words gave me new strength to go on. By reading
you learn a lot, but as at the universities, sometimes you need someone
telling you where to go for the next step, and your guidance has cleared a
lot of this long way. Thank you, really.
Now my professor is using the cluster, after some weeks of trying to make it
work. Soon I'll have some results to discuss with you and the whole list.
I hope your work is going well, and I wish you all the best!

Ari

2008/7/3 Gus Correa <gus at ldeo.columbia.edu>:

> Hello Ariovaldo and list
>
> Ariovaldo, don't be shy.
> Your questions are reasonable.
> Beginners should be encouraged, not scoffed at,
> regardless of whether they are students or Nobel laureates.
>
> I've always got a lot of knowledgeable and generous advice
> from the MPICH developers and other list subscribers,
> even when I asked naive questions.
>
> Having said that, here are a few suggestions.
> Take them with a grain of salt, because
> I am an MPICH user, not an expert.
>
> 1) How are you launching your mpd daemon?
>
> The processes are distributed across the machines in different ways,
> depending on whether or not you specify the number of cpus on each machine
> when you start the mpd daemons.
> For details, please read section 5.1.7 in the MPICH Installer's Guide
> (available in PDF format
> on the MPICH site).
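>
> For example, something along these lines is what I have in mind
> (I'm writing from memory, so double-check the exact file format and
> options in the guide; the host names are just placeholders):
>
>   # mpd.hosts -- one machine per line; ":8" tells mpd how many cpus it has
>   node01:8
>   node02:8
>   node03:8
>
>   mpdboot -n 3 -f mpd.hosts   # start a ring of 3 mpd's in total
>   mpdtrace -l                 # check that all the daemons are up
>
> With the cpu counts known to the mpd's, consecutive MPI ranks should fill
> the cores of one machine before spilling over to the next, which is the
> "tight grouping" I mention below.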
>
> In your case, to group processes tightly on each machine, I would say
> you could use -ncpus=8, since you have four physical slots per machine,
> and each one is a dual-core processor, right?
> I am guessing a bit here, though.
> I didn't understand your description of your machine configuration,
> but since you know it well, you can figure out the right number, which may
> not be 8.
> Anyway, if your OS is Linux,
> you can get the exact number of cores available on each machine by looking
> at /proc/cpuinfo
> ("more /proc/cpuinfo").
>
> However, beware that speedup in multi-core CPUs is not really linear, due
> to memory contention
> and bus "congestion".
> There are people who simply leave one or more cores out of the game because
> of this.
>
> 2) For tighter control of where the processes run, you can use the
> "-configfile" feature of mpiexec.
> See the manual page of mpiexec (man mpiexec) for details.
> You may need to point to the exact man page path if you have several
> versions of mpiexec on your computer.
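>
> From memory, the config file holds one mpiexec argument set per line, so
> something roughly like this (the man page has the authoritative format,
> and node01..node03 are placeholder host names):
>
>   # config.txt -- one argument set per line, as on the mpiexec command line
>   -n 8 -host node01 ./cpi
>   -n 8 -host node02 ./cpi
>   -n 8 -host node03 ./cpi
>
>   mpiexec -configfile config.txt
>
> That pins 8 processes on each of the three hosts explicitly, instead of
> letting mpd decide the placement.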
>
> 3) The cpi.c program may not be a great speedup test.
> I guess the program was meant just to test if MPI is working, not as a
> performance benchmark.
>
> If you look at the code you'll see that the number of iterations
> in the program's single loop is 10000, and the computations are very modest.
> With such a small number of iterations, if you divide the task among, say,
> 10 processors,
> each one does only 1000 simple computations, which is very fast compared to
> the other tasks at the beginning and end of the code.
>
> The execution time is likely to be dominated by non-parallel tasks
> (initialization, etc),
> which don't scale with the number of processors.
> This is an aspect of "Amdahl's Law":
> http://en.wikipedia.org/wiki/Amdahl%27s_law
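>
> (In formula form: if a fraction p of the run can be parallelized, the best
> speedup on N processors is roughly 1 / ((1 - p) + p/N). With cpi.c's tiny
> loop, p is small, so adding processors buys you almost nothing.)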
>
> Actually, as you increase the number of processors,
> the communication overhead during initialization and finalization
> may worsen the execution time when the computations are as modest as they
> are in cpi.c.
>
> I guess the cpi.c code also has an unnecessary call to MPI_Bcast, which may
> be a leftover
> from a previous version, and which may add to the overhead.
> But only the MPICH developers can tell.
>
> There are synthetic benchmarks out there.
> However, since you want to run the molecular dynamics code,
> maybe you could set up a simple speedup experiment with it.
> This may take less effort, and it will help you use and understand the code
> you are interested in.
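>
> If you do want a quick synthetic check before touching NAMD, a toy timing
> program is easy to write. Here is a rough sketch (just a dummy workload
> with MPI_Wtime() around it, nothing to do with your MD code; compile with
> mpicc and run it with mpiexec at increasing -n to see where the speedup
> flattens out):
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   int main(int argc, char *argv[])
>   {
>       int rank, size;
>       long i, n = 100000000L;   /* big enough to dominate startup cost */
>       double local = 0.0, total = 0.0, t0, t1;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>       t0 = MPI_Wtime();                  /* start timing after init */
>
>       /* each rank takes a strided share of the dummy work */
>       for (i = rank; i < n; i += size)
>           local += 1.0 / (double)(i + 1);
>
>       MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
>                  MPI_COMM_WORLD);
>
>       t1 = MPI_Wtime();
>       if (rank == 0)
>           printf("procs = %d  wall clock time = %f s\n", size, t1 - t0);
>
>       MPI_Finalize();
>       return 0;
>   }
>
> Unlike cpi.c, the loop here is long enough that the parallel part should
> dominate the startup and shutdown overhead.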
>
> I hope this helps.
>
> Gus Correa
>
> --
> ---------------------------------------------------------------------
> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> Oceanography Bldg., Rm. 103-D, ph. (845) 365-8911, fax (845) 365-8736
> ---------------------------------------------------------------------
>
>
> Ariovaldo de Souza Junior wrote:
>
>  Hi Chong,
>>
>> Yes, I'm a student. In truth, the one who will use MPI is my teacher, who
>> will run NAMD for molecular dynamics. I was challenged to set up this
>> cluster and I had no knowledge of Linux before starting it. Now that it is
>> running, I think that maybe I could go a bit further. Theoretically my work
>> is over, and the machines are working well. But I would still like to know
>> how to extract the maximum performance from these computers, which have
>> already proven to be good ones, since we have already used them to run
>> Gaussian 03 molecular calculations.
>>
>> I want to know a bit more because I love computers and I have now been
>> introduced to this universe of clusters. That's all. I'm already reading
>> the tips you gave me, even though it is a bit complicated to extract the
>> information I want from them. Thanks a lot for your attention.
>>
>> Ari.
>>
>> 2008/7/2 chong tan <chong_guan_tan at yahoo.com>:
>>
>>
>>    Ari,
>>
>>    Are you a student? Anyway, I'd like to point you to the answer to
>>    your problem:
>>
>>    mpiexec -help
>>
>>
>>    or look in your mpich2 package, under www/www1; there is an
>>    mpiexec.html
>>
>>
>>    It is easier to give you the answer, but getting you to look for
>>    the answer yourself is better.
>>
>>
>>
>>    stan
>>
>>
>>    --- On Wed, 7/2/08, Ariovaldo de Souza Junior
>>    <ariovaldojunior at gmail.com> wrote:
>>
>>        From: Ariovaldo de Souza Junior <ariovaldojunior at gmail.com>
>>        Subject: [mpich-discuss] core 2 quad and other multiple core
>>        processors
>>        To: mpich-discuss at mcs.anl.gov
>>
>>        Date: Wednesday, July 2, 2008, 1:15 PM
>>
>>
>>        Hello everybody!
>>
>>        I'm really a newbie at clustering, so I have some, let's say,
>>        stupid questions. When I start a job like "mpiexec -l -n 6 ./cpi"
>>        on my small cluster of (so far) 6 core 2 quad machines, I'm
>>        sending 1 process to each node, right? Assuming that is correct,
>>        will each process use only 1 core of each node? And how do I make
>>        1 process use the whole processing capacity of the processor, all
>>        4 cores? Is there a way to do this, or will I always use just one
>>        core per process? If I change the submission to
>>        "mpiexec -l -n 24 ./cpi", then the same program will run as 24
>>        processes, 4 per node (maybe simultaneously) and one process per
>>        core, right?
>>
>>        I'm asking all this because I think it is a bit strange to see
>>        the processing time increase each time I add one more process,
>>        when in my mind it should be the opposite. I'll give some
>>        examples:
>>
>>        mpiexec -n 1 ./cpi
>>        wall clock time = 0.000579
>>
>>        mpiexec -n 2 ./cpi
>>        wall clock time = 0.002442
>>
>>        mpiexec -n 3 ./cpi
>>        wall clock time = 0.004568
>>
>>        mpiexec -n 4 ./cpi
>>        wall clock time = 0.005150
>>
>>        mpiexec -n 5 ./cpi
>>        wall clock time = 0.008923
>>
>>        mpiexec -n 6 ./cpi
>>        wall clock time = 0.009309
>>
>>        mpiexec -n 12 ./cpi
>>        wall clock time = 0.019445
>>
>>        mpiexec -n 18 ./cpi
>>        wall clock time = 0.032204
>>
>>        mpiexec -n 24 ./cpi
>>        wall clock time = 0.045413
>>
>>        mpiexec -n 48 ./cpi
>>        wall clock time = 0.089815
>>
>>        mpiexec -n 96 ./cpi
>>        wall clock time = 0.218894
>>
>>        mpiexec -n 192 ./cpi
>>        wall clock time = 0.492870
>>
>>        So, as you can all see, the more processes I add, the more time
>>        it takes, which makes me think that MPI performed this test 192
>>        times in the end and that is why the time increased. Is it
>>        correct that MPI performed the same test 192 times? Or did it
>>        divide the work into 192 pieces, compute them, gather the results
>>        and assemble the output again? I would really like to understand
>>        this relationship between processor # and process #.
>>
>>        I have the feeling that my questions are a bit "poor" and really
>>        those of a newbie, but the answers will help me use other
>>        programs that need MPI to run.
>>
>>        Thanks to all!
>>
>>        Ari - UFAM - Brazil
>>
>>
>>
>>
>