PETSc runs slower on a shared memory machine than on a cluster

On 2/3/07, Shi Jin <jinzishuai at> wrote:
> Thank you
> I rebuilt MPICH-2 with --with-device=ch3:shm and
> --with-pm=gforker
> I did see a slight improvement in speed. However,
> compared with the cluster runs, the shared-memory
> performance is still not nearly as good.
> So I think the problem is indeed in the memory
> subsystem, as Satish said.

Shi, can you provide me some more info about all this?

- What kind of problem are you solving?
- Are you using MATMPIAIJ or MATMPIBAIJ?
- What do you use to partition your problem (ParMetis)?
- How many processes do you have in your run (-np option)?
- When you run on your cluster, do you launch 1 process on each CPU of
your node? I mean, do you have 4 processes running on each node?
- What kind of network do you have in your cluster? GigE, or something better?

I ask all this in light of the earlier comments from Barry and Satish.
If you have 4 processes running on each node, they surely communicate
with each other through the loopback interface, whose bandwidth is
similar to your memory bandwidth, so in your case not all
communication goes over the wire...
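
To make the memory-bandwidth argument concrete, here is a rough
back-of-the-envelope sketch. Everything in it is an assumption for
illustration: the function name, the even-sharing model, and all the
bandwidth and process counts are hypothetical, not measurements from
your machines.

```python
# Rough model: memory-bound kernels (e.g. sparse matrix-vector products)
# are limited by the memory bandwidth each process can actually get.
# All numbers below are hypothetical, chosen only to illustrate the idea.

def bandwidth_per_process(bus_bandwidth_gbs, procs_sharing_bus):
    """Bandwidth per process, assuming the bus is shared evenly."""
    if procs_sharing_bus < 1:
        raise ValueError("need at least one process per memory bus")
    return bus_bandwidth_gbs / procs_sharing_bus

# Hypothetical shared-memory machine: 16 processes behind one 8 GB/s
# memory system.
shared = bandwidth_per_process(8.0, 16)

# Hypothetical cluster: each node has its own 4 GB/s bus and runs
# 4 processes.
cluster = bandwidth_per_process(4.0, 4)

print(f"shared-memory: {shared:.2f} GB/s per process")
print(f"cluster:       {cluster:.2f} GB/s per process")
```

With these made-up numbers each cluster process gets twice the memory
bandwidth of a process on the shared-memory machine, even though the
cluster also has to talk over the network. Real machines are more
complicated (caches, NUMA, uneven sharing), so take this only as a
way to think about the question.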

Sorry for my English,

Lisandro Dalcín
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

