[petsc-users] performance surprise

Dominik Szczerba dominik at itis.ethz.ch
Fri Jan 20 15:27:25 CST 2012


I am running some performance tests on a distributed cluster (Cray) with 16
cores per node.
I am very surprised to find that my benchmark jobs are about 3x slower when
running on N nodes using all 16 cores per node than when running on N*16 nodes
using only one core per node. I see this with two independent PETSc builds,
both 3.2, and they both exhibit the same behavior: my own gnu build and the
system petsc module. I have so far been unable to build my own PETSc version
with the Cray compilers to compare.

The scheme is relatively complex: a transient non-linear problem with a shell
matrix and block preconditioners. I am using BoomerAMG from hypre.
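
Roughly, the solver setup looks like the sketch below. This is not my actual
code: the 1-D Laplacian stand-in operator, the ShellMult callback and the
problem size are just placeholders, error checking (CHKERRQ) is omitted for
brevity, and it assumes PETSc 3.2 built --with-hypre. The point is only that
the operator is a shell matrix while BoomerAMG works on an assembled
preconditioning matrix.

#include <petscksp.h>

typedef struct { Mat P; } ShellCtx;

static PetscErrorCode ShellMult(Mat A, Vec x, Vec y)
{
  ShellCtx *ctx;
  MatShellGetContext(A, (void**)&ctx);
  MatMult(ctx->P, x, y);   /* stand-in for the real, more complex operator */
  return 0;
}

int main(int argc, char **argv)
{
  Mat      A, P;
  Vec      x, b;
  KSP      ksp;
  PC       pc;
  ShellCtx ctx;
  PetscInt i, n = 1000, rstart, rend;

  PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);

  /* Assembled AIJ matrix (1-D Laplacian placeholder) handed to BoomerAMG,
     since the AMG preconditioner needs explicit matrix entries. */
  MatCreateMPIAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n,
                  3, PETSC_NULL, 1, PETSC_NULL, &P);
  MatGetOwnershipRange(P, &rstart, &rend);
  for (i = rstart; i < rend; i++) {
    if (i > 0)   MatSetValue(P, i, i-1, -1.0, INSERT_VALUES);
    if (i < n-1) MatSetValue(P, i, i+1, -1.0, INSERT_VALUES);
    MatSetValue(P, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);

  /* The operator itself is matrix-free: only its action is defined. */
  ctx.P = P;
  MatCreateShell(PETSC_COMM_WORLD, rend-rstart, rend-rstart, n, n, &ctx, &A);
  MatShellSetOperation(A, MATOP_MULT, (void(*)(void))ShellMult);

  MatGetVecs(P, &x, &b);
  VecSet(b, 1.0);

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, P, DIFFERENT_NONZERO_PATTERN);  /* 3.2 signature */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCHYPRE);
  PCHYPRESetType(pc, "boomeramg");
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp);
  VecDestroy(&x); VecDestroy(&b);
  MatDestroy(&A); MatDestroy(&P);
  PetscFinalize();
  return 0;
}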

Where do you think this unexpected performance difference may come from? Is it
possible that the node interconnect is faster than the shared memory bus
within a node? I was expecting the exact opposite.
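
To test that hypothesis I was thinking of something along the lines of the
hypothetical check below (not part of my benchmark; the array size and the
aprun placement flags are just assumptions): a STREAM-style triad in plain
MPI, run once with the ranks packed 16 per node and once spread one per node.
If the per-rank bandwidth collapses in the packed case but stays flat in the
spread case, the bottleneck would be the memory bus, not the network.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  const size_t n = 10*1000*1000;          /* ~240 MB per rank over 3 arrays */
  double *a, *b, *c, t, bw, minbw, maxbw;
  size_t i;
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  a = malloc(n*sizeof(double));
  b = malloc(n*sizeof(double));
  c = malloc(n*sizeof(double));
  for (i = 0; i < n; i++) { b[i] = 1.0; c[i] = 2.0; }

  MPI_Barrier(MPI_COMM_WORLD);
  t = MPI_Wtime();
  for (i = 0; i < n; i++) a[i] = b[i] + 3.0*c[i];   /* streams 3 arrays through memory */
  t = MPI_Wtime() - t;

  bw = 3.0*n*sizeof(double)/t/1.0e9;                /* GB/s seen by this rank */
  MPI_Reduce(&bw, &minbw, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
  MPI_Reduce(&bw, &maxbw, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
  if (rank == 0)
    printf("%d ranks: per-rank triad bandwidth %.2f - %.2f GB/s (check %g)\n",
           size, minbw, maxbw, a[n/2]);

  free(a); free(b); free(c);
  MPI_Finalize();
  return 0;
}

Something like aprun -n 64 -N 16 ./triad versus aprun -n 64 -N 1 ./triad
(assuming the Cray ALPS launcher) would be the comparison. Comparing the
-log_summary output of the two PETSc runs should show the same effect: if it
is the memory bus, the MatMult/VecAXPY flop rates drop in the packed run while
the MPI message counts stay the same.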

Thanks for any opinions.

Dominik

