PETSc runs slower on a shared memory machine than on a cluster

Fri Feb 2 15:22:21 CST 2007

Hi there,

I am fairly new to PETSc but have five years of MPI
programming experience. I recently took on a project
analyzing a finite element code written in C with
PETSc.
I found that on a shared-memory machine (60 GB RAM,
16 CPUs), the code runs about 4 times slower than
on a distributed-memory cluster (4 GB RAM, 4 CPUs per
node), although both yield identical results.
There are 1.6 million finite elements in the problem,
so it is a fairly large calculation. The total memory
used is 3 GB x 16 = 48 GB.

Both systems run Linux, and the same code is compiled
against the same versions of MPICH-2 and PETSc.

The shared-memory machine is actually a little faster
than the cluster machines for single-process runs.

I am surprised by this result, since we usually expect
shared memory to be much faster: in-memory operations
are much faster than network communication.

However, I read the PETSc FAQ and found that "the
speed of sparse matrix computations is almost totally
determined by the speed of the memory, not the speed
of the CPU".
This makes me wonder whether the poor performance of
my code on the shared-memory machine is due to the
processes competing for the same memory bus. Since the
code is still MPI-based, a lot of data is being moved
around in memory. Is this a reasonable explanation of
what I observed?
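
One way I could check this hypothesis outside of PETSc is a small
memory-bandwidth probe in the spirit of the STREAM triad kernel. The
sketch below is my own illustration, not part of the finite element
code or of PETSc; the function name triad_bandwidth_gbs, the array
size, and the TRIAD_STANDALONE macro are all assumptions made for the
example. It times a[i] = b[i] + 3*c[i] over arrays too large for
cache and reports an approximate transfer rate.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/*
 * Hypothetical helper: run the triad a[i] = b[i] + 3*c[i] over n doubles,
 * reps times, and return the best observed rate in GB/s.
 * Counts 3 arrays * 8 bytes per element and ignores write-allocate
 * traffic, so the figure is approximate. Returns -1.0 if allocation fails.
 */
double triad_bandwidth_gbs(size_t n, int reps)
{
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    double best = 0.0;

    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 0.0; b[i] = 1.0; c[i] = 2.0;
    }
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + 3.0 * c[i];
        double dt = now_seconds() - t0;
        if (dt > 0.0) {
            double gbs = 3.0 * (double)n * sizeof(double) / dt / 1e9;
            if (gbs > best)
                best = gbs;
        }
    }
    /* Reading the result keeps the compiler from discarding the loop. */
    assert(a[n / 2] == 7.0);
    free(a); free(b); free(c);
    return best;
}

#ifdef TRIAD_STANDALONE
int main(void)
{
    /* 3 arrays of 10 million doubles: about 240 MB per process. */
    printf("triad: %.2f GB/s\n", triad_bandwidth_gbs(10000000, 5));
    return 0;
}
#endif
```

Compiled with -DTRIAD_STANDALONE, I would run one copy, then 16
copies at the same time on the shared-memory machine. If the rate
per process drops sharply with 16 copies, that would be consistent
with the processes saturating the shared memory bus; if it stays
roughly flat, the slowdown probably has another cause.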

Thank you very much.

Shi
