[petsc-users] Slower performance in multi-node system
Luciano Siqueira
luciano.siqueira at usp.br
Wed Feb 3 13:41:10 CST 2021
Hello,
I'm evaluating the performance of an application in a distributed
environment, and I notice that it is much slower when running on many
nodes/cores than on a single node with fewer cores.
When running the application on 20 nodes, the Main Stage time reported
in PETSc's log is up to 10 times longer than when running the same
application on a single node, even with fewer cores per node.
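For context, per-stage timings like this come from PETSc's `-log_view` option. A minimal sketch of how the two runs might be captured for comparison (the executable name `ex4` and the process counts are placeholders, not taken from the original report):

```shell
# Single-node run: 24 MPI ranks on one node, with PETSc performance logging.
mpirun -np 24 ./ex4 -log_view > log_1node.txt

# Multi-node run: 20 nodes; Slurm's srun could be used instead of mpirun.
mpirun -np 240 ./ex4 -log_view > log_20nodes.txt

# Compare the "Main Stage" summary and the VecScatter/MPI message counts
# reported near the end of each log.
grep -A2 "Main Stage" log_1node.txt log_20nodes.txt
```

Comparing the event breakdown (not just the total) in the two logs usually shows whether the extra time is in communication-heavy events such as `VecScatter` or `MatMult`.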
The application I'm running is an example code provided by libmesh:
http://libmesh.github.io/examples/introduction_ex4.html
The application runs inside a Singularity container with OpenMPI 4.0.3
and PETSc 3.14.3. The distributed processes are managed by Slurm
17.02.11, and each node is equipped with two Intel Xeon E5-2695v2 Ivy
Bridge CPUs (12 cores @ 2.4 GHz) and 128 GB of RAM, with all
communication going through InfiniBand.
My questions are: Is this slowdown expected? Does the application need
to be specially tailored to work well in distributed environments?
Also, where (perhaps in the PETSc documentation or source code) can I
find information on how PETSc handles MPI communication? Do the KSP
solvers favor point-to-point communication over broadcast messages, or
vice versa? I suspect inter-process communication is the cause of the
poor performance when using many nodes, but I wouldn't expect it to
cost as much as I'm seeing.
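One diagnostic PETSc itself provides for questions like this is its STREAMS memory-bandwidth benchmark, since sparse solvers are typically bandwidth-bound and per-node bandwidth saturates well below the full core count. A hedged sketch of how it can be run (assuming `PETSC_DIR` points at the PETSc source tree used to build the library):

```shell
# From the PETSc source directory, run the STREAMS benchmark with an
# increasing number of MPI processes (up to 24 here, matching the 2x12
# cores per node) and report the achieved memory bandwidth.
cd "$PETSC_DIR"
make streams NPMAX=24
```

If the reported bandwidth stops scaling after a handful of processes per node, adding more cores per node will not speed up the solver, and spreading ranks across nodes changes the picture mainly through network latency in reductions (`VecDot`, `VecNorm`) and halo exchanges.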
Thank you in advance!
Luciano.