[mpich-discuss] Why is my quad core slower than cluster
zach
zachlubin at gmail.com
Wed Jul 9 11:29:15 CDT 2008
Hi,
Thanks for the info.
Something is really wrong: comparing 1-core to 4-core runs of the code
on my home PC, the speed increase is *tiny*.
I am pretty sure my home PC recognizes all the RAM, and when I run top
my sim is the only thing taking up the majority of the resources.
I see all four processes at 98% CPU (or more), and memory at about
11% or so...
One thing I noticed is that when I type
"which mpirun"
"which mpd"
the versions in /usr/bin come up.
I worked around this by using the absolute path to the versions in the
MPICH install directory when starting mpd before a sim, and when
launching the sim with mpirun.
Is it possible that the path setup is still causing an issue in the
communication?
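A sketch of the usual fix for this, assuming the MPICH2 install lives under a hypothetical prefix /opt/mpich2 (substitute the real install dir):

```shell
# Hypothetical prefix /opt/mpich2 stands in for the actual MPICH2
# install directory. Putting its bin/ first in PATH makes the shell
# pick up the MPICH2 mpirun and mpd instead of the copies in /usr/bin.
export PATH=/opt/mpich2/bin:$PATH
which mpirun    # should now print /opt/mpich2/bin/mpirun
which mpd       # should now print /opt/mpich2/bin/mpd
```

Adding the export line to ~/.bashrc makes it stick across logins, so the absolute-path workaround is no longer needed.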
Also, I am not sure how mpd and ssh are related, but do I need to
configure any ssh settings in the MPICH install for a quad-core box?
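For what it's worth, mpd only uses ssh to start daemons on *remote* hosts, so a single quad-core box should need no ssh setup at all. A minimal local startup might look like the sketch below (./mysim is a placeholder for the simulation binary, and the secret word is of course an assumption):

```shell
# mpd refuses to start without a mode-600 ~/.mpd.conf holding a
# secret word; any string will do on a single-user box.
echo "MPD_SECRETWORD=change_me" > ~/.mpd.conf
chmod 600 ~/.mpd.conf
mpd --daemon             # one daemon on the local host; no ssh involved
mpdtrace                 # should list only this machine
mpiexec -n 4 ./mysim     # all four ranks land on the local cores
```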
Zach
On 7/9/08, Gus Correa <gus at ldeo.columbia.edu> wrote:
> Hi Zach and list
>
> Well, re-reading your message more carefully, I saw your
> estimate of 1/3 speed on your home computer.
> That is indeed on the low side. I would expect a multi-core machine
> to run somewhat slower than a cluster of single-cores, especially if
> the programs are memory-intensive, but the performance factor I've
> seen so far on memory-intensive programs (climate models) is in the
> ballpark of 0.6-0.8, not your low value of 1/3 = 0.33.
>
> In any case, you are sure all four cores are working on your home PC,
> and the SMP kernel is running.
>
> Besides, you seem to have installed the MPICH2 nemesis device on your
> home PC, and hopefully on the cluster too, as Rajeev recommended.
>
> In addition, you don't seem to have a problem with memory size (which
> would trigger paging), because you said the total memory in your PC
> and in the cluster is the same.
> I would check the amount of memory in /proc/meminfo anyway (look for
> MemTotal). Some memory modules are tricky, and have to be seated in
> matched slots in order to be recognized.
> We had this problem here with a Dell computer.
> The computer manual diagrams were misleading, and it took a few
> attempts to get the memory slots right when we upgraded it.
> Before that, a system with 8GB of memory installed would recognize
> only 2GB.
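A one-liner for the MemTotal check described above (Linux only):

```shell
# Report what the kernel actually sees, in kB; on an 8GB box MemTotal
# should be close to 8,000,000 kB. A much smaller number points at
# unseated or mismatched memory modules.
grep MemTotal /proc/meminfo
```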
>
> So, if all the considerations above are correct, a number of possibilities
> are removed.
> Let's try other possibilities.
>
> What other (concurrent) activity runs on your home PC along with
> your mpich program?
> I have experienced significant performance degradation of MPI programs
> running in standalone PCs when other users login and start their programs.
> The memory-greedy Matlab is the first killer, but fancy desktops,
> intense web browsing, streaming video and music, etc., can compete
> with the mpich program for memory and CPU cycles, to the point that
> the mpich program can't really work and spends most of its time
> switching context in and out.
> HPC and interactive workstation use don't really mix well.
> You can monitor this activity with the "top" command on your PC,
> and compare it with what you get from "top" on your cluster nodes.
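One way to make the two "top" readings comparable, using batch mode so the same snapshot can be taken on the home PC and on a cluster node:

```shell
# One non-interactive snapshot of top; run the identical command on
# both machines and compare the %CPU and %MEM columns of the top
# processes. -b = batch mode, -n 1 = a single iteration.
top -b -n 1 | head -20
```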
>
> I would also suggest starting the system at runlevel 3 (no X-windows),
> and running the mpich program alone, if you want to make a fair performance
> comparison between your home PC and your office cluster (whose nodes are
> likely to be at runlevel 3 and be dedicated to run the mpich program only).
>
> Also, a fair comparison should take into account the cpu speeds of each
> computer.
> A 3.6GHz processor works faster than a 2.8GHz of similar architecture.
> Since both computers you use have Intel processors (comparing Intel
> with AMD seems to be more complicated), maybe you can just look at
> the raw processor speeds in /proc/cpuinfo (look for "cpu MHz") and
> factor the ratio of these values into the comparison.
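A sketch of that check; the awk average is just a convenience, and it assumes a Linux /proc/cpuinfo that reports "cpu MHz" lines (true on x86):

```shell
# Average the per-core clocks; dividing the two machines' numbers
# gives the raw-speed ratio to factor into the performance comparison.
awk -F: '/cpu MHz/ {sum += $2; n++} END {if (n) printf "%.0f MHz over %d cores\n", sum/n, n}' /proc/cpuinfo
```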
>
> I hope this helps.
>
> Gus Correa
>
> --
> ---------------------------------------------------------------------
> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> Oceanography Bldg., Rm. 103-D, ph. (845) 365-8911, fax (845) 365-8736
> ---------------------------------------------------------------------
>
>
>
> zach wrote:
>
>
> > Thanks for the info.
> > I tried all of these things, but it does not look like any of them
> > gave an improvement.
> > Zach
> >
> > On Tue, Jul 8, 2008 at 2:52 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> >
> >
> > > PS:
> > >
> > > Zach: A couple of obvious checks, besides Rajeev's important
> > > suggestion:
> > >
> > > 1) Make sure the SMP kernel is running on your home PC:
> > > "uname -a"
> > > (Should show "smp" as part of the string.)
> > >
> > > 2) Check if Ubuntu triggers all four cores:
> > > "cat /proc/cpuinfo". (Should show four "virtual" CPUs.)
> > >
> > > Gus Correa
> > >
> > > ##########
> > >
> > > Rajeev Thakur wrote:
> > >
> > > Try using the Nemesis device in MPICH2 if you aren't already.
> > > Configure with --with-device=ch3:nemesis.
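For reference, a typical MPICH2 build with that flag might look like the sketch below, run from the unpacked MPICH2 source tree; the prefix is a placeholder, and everything other than --with-device is an assumption:

```shell
# Build MPICH2 with the nemesis channel (shared-memory communication
# between ranks on the same node). Installing under a private prefix
# keeps it from clashing with the distro packages in /usr/bin.
./configure --with-device=ch3:nemesis --prefix=$HOME/mpich2-nemesis
make
make install
# Then put $HOME/mpich2-nemesis/bin first in PATH before running mpd.
```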
> > >
> > > Rajeev
> > >
> > >
> > > Gus Correa wrote:
> > >
> > >
> > >
> > > > Hello Zach and list
> > > >
> > > > From all that I've observed on dual-processor dual-core PCs,
> > > > and from all that I've read on the web about dual-processor quad-core
> > > > machines,
> > > > your results are not alarming, but typical.
> > > > I was as disappointed as you are, when I saw my speedup results.
> > > > A lot of people out there had the same frustration too.
> > > >
> > > > My benchmarks using a standard climate atmospheric model (NCAR
> > > > CAM3) on a dual-processor dual-core Xeon workstation showed a
> > > > speedup factor of 3 (not 4) when I moved from one core to four
> > > > cores.
> > > > Likewise, for a dual-processor dual-core Opteron workstation
> > > > I got a speedup factor slightly below 3.5 (better than the
> > > > Xeon, but still not 4).
> > > >
> > > > The problem seems to get worse with quad-cores, again with the
> > > > Opterons slightly ahead of the game.
> > > > Memory/bus contention has been mentioned as the culprit by a
> > > > lot of people.
> > > > One core in a multi-core doesn't scale like one single-core CPU.
> > > >
> > > > You will find plenty of references to this problem on the web
> > > > and on many mailing lists:
> > > > here in the MPICH list, on the Rocks Cluster list, on the
> > > > MITgcm list, etc.
> > > >
> > > > I hope it heals (as helping it cannot)
> > > > Gus Correa
> > > >
> > > >
> > > >
> > > zach wrote:
> > >
> > >
> > >
> > > > I am using a cluster.
> > > > Each PC has two CPUs, and they are Xeons. Each CPU has 4GB,
> > > > and I think Red Hat is running.
> > > >
> > > > I also use a PC at home: quad-core Intel chip, 8GB RAM, Ubuntu.
> > > >
> > > > Both are using mpich.
> > > >
> > > > I have found that my home PC only runs at about 1/3 the speed
> > > > of the cluster, even though the number of processes (4) and
> > > > the code are the same.
> > > >
> > > > Can anyone tell me if this is typical, and why, or am I not optimizing
> > > > something properly?
> > > >
> > > > Thanks
> > > > Zach
> > > >
> > > >
> > > >
> > >
> > >
> >
>
>
More information about the mpich-discuss mailing list