[petsc-users] scaling problem
Aron Roland
roland at wb.tu-darmstadt.de
Sat May 26 17:13:33 CDT 2012
Dear All,
I have a few questions about our recent PETSc implementation for solving
a large linear system arising from a 4D problem on hybrid unstructured meshes.
The point is that we have implemented all the mappings, and both the solution
and the number of iterations look fine. The results are robust with
respect to the number of CPUs used, but we have a scaling issue.
The system is a latest-generation Intel cluster with InfiniBand.
We have attached the -log_summary output, which hopefully contains plenty of information.
Any comments, suggestions, or ideas are very welcome.
We have been reading the threads dealing with multi-core machines and the
memory-bus limitation, so we are aware of this issue.
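One way we could test the bandwidth hypothesis directly (a sketch, assuming the
Hydra mpiexec that ships with the mpich2 build used here; the hostfile name and
trailing arguments are placeholders) is to keep the total number of ranks fixed
while spreading them over more nodes:

mpiexec -f hosts.txt -ppn 8 -n 40 ./selfewwm_thomas ...
mpiexec -f hosts.txt -ppn 4 -n 40 ./selfewwm_thomas ...

If the MatMult time in -log_summary drops noticeably with fewer ranks per node,
memory bandwidth is the limiting factor; if it stays about the same, the
bottleneck is elsewhere.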
I am now thinking about a hybrid OpenMP/MPI approach, but I am not entirely
convinced by the bus-limitation explanation, since most systems these days are
multicore anyway. I hope the limitation is not the sparse matrix mapping that
we are using ...
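For reference on what we mean by the sparse matrix mapping, a minimal sketch of
a standard parallel AIJ setup with preallocation (placeholder sizes and nonzero
estimates, not our actual code) would be:

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       nlocal = 100000;     /* locally owned rows (placeholder)                 */
  PetscInt       d_nz = 7, o_nz = 3;  /* estimated nonzeros per row (diag./off-diag. part) */
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL);CHKERRQ(ierr);
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
  /* If the estimates are too low, MatSetValues() reallocates repeatedly and
     assembly becomes very slow; running with -info reports any extra mallocs. */
  ierr = MatMPIAIJSetPreallocation(A, d_nz, PETSC_NULL, o_nz, PETSC_NULL);CHKERRQ(ierr);

  /* ... loop over locally owned rows calling MatSetValues() ... */

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

In the attached log the assembly events are cheap in any case; most of the time
goes into MatMult, PCApply and the vector reductions.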
Thanks in advance ...
Cheers
Aron
-------------- next part --------------
PuertoRico Ymir
Mesh nodes: 124815
25.5.2012
Solver: KSPLGMRES
Preconditioner: PCMG
Matrix Size: 71,893,440 x 71,893,440
Total matrix NNZ: 482,500,000
MSC = MCD = 24
dt = 60 sec
Simulation time: 20 min
20 time steps (20 calls to KSPSolve)
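The solver configuration above corresponds roughly to the following PETSc calls
(a sketch with placeholder arguments, not the production code):

#include <petscksp.h>

/* Solve one time step with the configuration listed above (KSPLGMRES + PCMG);
   A, b and x are the assembled operator, right-hand side and solution.        */
PetscErrorCode SolveOneStep(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPLGMRES);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCMG);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* allow -ksp_type / -pc_type overrides */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);     /* one of the 20 KSPSolve calls per run  */
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}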
                FLUCT      Solver  Solver     Solver   Solver    App       App      App
Nodes  Threads  time [s]   iter.   time [s]   speedup  eff. [%]  time [s]  speedup  eff. [%]
  4       8     33-38      7-9     18.8        3.2     40         998       4.5     56
  5      25     10-11      8-10     6.85       8.8     35         358      12.6     50
  4      32     10-12      8-11     6.45       9.3     29         317      14.2     44
  8      32     12-14      8-11     8.75       6.9     22         392      11.5     36
 10      40     11-13      9-10     7.10       8.5     21         355      12.7     32
 10      50      7-8       8-11     5.10      11.8     24         252      17.9     36
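The speedup and efficiency columns are consistent with the usual definitions,
relative to implied single-thread reference times of roughly 60 s for the solver
and 4500 s for the application (inferred from the table, not measured here):

speedup(N)    = T_ref / T_N        e.g.  60.2 s / 6.85 s ≈ 8.8   (25 threads, solver)
efficiency(N) = speedup(N) / N     e.g.  8.8 / 25        ≈ 35 %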
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/Utilisateurs/aroland/bin/selfewwm_thomas on a linux-int named r1i3n7 with 40 processors, by aroland Fri May 25 19:17:22 2012
Using Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012
Max Max/Min Avg Total
Time (sec): 3.359e+02 1.00000 3.359e+02
Objects: 1.400e+02 1.00000 1.400e+02
Flops: 3.776e+10 1.14746 3.558e+10 1.423e+12
Flops/sec: 1.124e+08 1.14746 1.059e+08 4.237e+09
MPI Messages: 5.280e+03 6.00000 2.442e+03 9.768e+04
MPI Message Lengths: 5.991e+08 3.80128 1.469e+05 1.435e+10
MPI Reductions: 1.406e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.3594e+02 100.0% 1.4233e+12 100.0% 9.768e+04 100.0% 1.469e+05 100.0% 1.405e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 438 1.0 4.4104e+01 1.9 1.06e+10 1.1 9.7e+04 1.5e+05 0.0e+00 11 28100100 0 11 28100100 0 9131
MatSolve 458 1.0 3.6361e+01 2.8 1.09e+10 1.1 0.0e+00 0.0e+00 0.0e+00 10 29 0 0 0 10 29 0 0 0 11380
MatLUFactorNum 20 1.0 6.9722e+00 2.7 4.73e+0810.5 0.0e+00 0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 1534
MatILUFactorSym 1 1.0 2.2795e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 45 1.0 8.7023e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 45 1.0 2.9763e+00 1.7 0.00e+00 0.0 4.4e+02 3.7e+04 8.0e+00 1 0 0 0 1 1 0 0 0 1 0
MatGetRowIJ 1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 20 1.0 5.9878e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetOrdering 1 1.0 1.8968e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 19 1.0 9.1826e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecMax 20 1.0 1.0160e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 0 0 0 0 1 0
VecMin 20 1.0 1.0970e+0012.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 0 0 0 0 1 0 0 0 0 1 0
VecMDot 418 1.0 3.4962e+01 4.9 4.69e+09 1.2 0.0e+00 0.0e+00 4.2e+02 4 12 0 0 30 4 12 0 0 30 5013
VecNorm 687 1.0 4.1451e+01 5.3 2.64e+09 1.2 0.0e+00 0.0e+00 6.9e+02 5 7 0 0 49 5 7 0 0 49 2383
VecScale 667 1.0 5.9862e+00 3.2 1.28e+09 1.2 0.0e+00 0.0e+00 0.0e+00 2 3 0 0 0 2 3 0 0 0 8011
VecCopy 269 1.0 2.2040e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecSet 1225 1.0 8.5033e+00 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecAXPY 269 1.0 3.6360e+00 2.7 1.03e+09 1.2 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 10638
VecMAXPY 667 1.0 1.5898e+01 2.9 6.30e+09 1.2 0.0e+00 0.0e+00 0.0e+00 4 17 0 0 0 4 17 0 0 0 14806
VecAssemblyBegin 40 1.0 2.4674e+00297.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+02 1 0 0 0 9 1 0 0 0 9 0
VecAssemblyEnd 40 1.0 1.2517e-04 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 438 1.0 1.1889e+00 9.1 0.00e+00 0.0 9.7e+04 1.5e+05 0.0e+00 0 0100100 0 0 0100100 0 0
VecScatterEnd 438 1.0 2.2600e+0189.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecNormalize 458 1.0 3.5072e+01 3.8 2.64e+09 1.2 0.0e+00 0.0e+00 4.6e+02 5 7 0 0 33 5 7 0 0 33 2817
KSPGMRESOrthog 418 1.0 3.9814e+01 2.2 9.38e+09 1.2 0.0e+00 0.0e+00 4.2e+02 7 25 0 0 30 7 25 0 0 30 8805
KSPSetup 60 1.0 1.7510e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 1 0 0 0 0 1 0
KSPSolve 20 1.0 1.4208e+02 1.0 3.78e+10 1.1 9.7e+04 1.5e+05 1.2e+03 42100100100 82 42100100100 82 10017
PCSetUp 40 1.0 1.3285e+01 2.1 4.73e+0810.5 0.0e+00 0.0e+00 1.2e+01 3 1 0 0 1 3 1 0 0 1 805
PCSetUpOnBlocks 687 1.0 4.4281e+01 2.5 1.13e+10 1.1 0.0e+00 0.0e+00 3.0e+00 12 30 0 0 0 12 30 0 0 0 9586
PCApply 229 1.0 1.0103e+02 1.1 2.30e+10 1.1 5.1e+04 1.5e+05 6.9e+02 29 61 52 52 49 29 61 52 52 49 8567
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 10 10 7700472 0
IS L to G Mapping 1 1 564 0
Application Order 1 1 999160 0
Matrix 5 5 386024408 0
Vector 114 114 1062454512 0
Vector Scatter 2 2 2072 0
Krylov Solver 3 3 37800 0
Preconditioner 3 3 2800 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 3.93867e-05
Average time for zero size MPI_Send(): 4.75049e-06
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Apr 26 22:05:34 2012
Configure options: --prefix=/Utilisateurs/aroland/thomas/opt/petsc_3.2-p6 --download-f-blas-lapack=1 --with-mpi-dir=/Utilisateurs/aroland/thomas/opt/mpich2/ --with-superlu_dist=true --download-superlu_dist=yes --with-parmetis-lib="[/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libparmetis.a,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libmetis.a]" --with-parmetis-include=/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/ --with-debugging=0 COPTFLAGS=-O3 FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Thu Apr 26 22:05:34 2012 on service1
Machine characteristics: Linux-2.6.32.12-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6
Using PETSc arch: linux-intel-performance
-----------------------------------------
Using C compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90 -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -I/Utilisateurs/aroland/thomas/opt/mpich2/include -I/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/include
-----------------------------------------
Using C linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc
Using Fortran linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90
Using libraries: -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lsuperlu_dist_2.5 -Wl,-rpath,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -L/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -lparmetis -lmetis -lflapack -lfblas -lm -L/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/ipp/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 -L/Calcul/Apps/intel/impi/4.0.3.008/lib64 -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lirc_s -lm -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------