[petsc-users] scaling problem

Aron Roland roland at wb.tu-darmstadt.de
Sat May 26 17:13:33 CDT 2012


Dear All,

I have some questions about a recent implementation of PETSc for solving
a large linear system arising from a 4d problem on hybrid unstructured meshes.

We have implemented all the mappings; the solution is fine, and so are the
iteration counts. The results are robust with respect to the number of CPUs
used, but we have a scaling issue.

The system is a latest-generation Intel cluster with an InfiniBand interconnect.

We have attached the -log_summary output, hopefully with plenty of useful information.

Any comments, suggestions, or ideas are very welcome.

We have been reading the threads dealing with multi-core nodes and the
memory-bus limitation, so we are aware of this issue.

I am now considering a hybrid OpenMP/MPI approach, but I am not entirely
satisfied with the bus-limitation explanation, since most systems nowadays
are multi-core.
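
As a rough back-of-the-envelope check of the bandwidth argument (a sketch only;
the 20 GB/s sustained bandwidth per socket below is an assumed illustration,
not a measured value for our nodes):

    AIJ MatMult traffic ~ 12 bytes per nonzero (8-byte value + 4-byte column index)
    work                ~  2 flops per nonzero
    => attainable rate  ~ bandwidth / 6 bytes per flop
       e.g. 20 GB/s per socket -> roughly 3.3 Gflop/s per socket, shared by all its cores

For comparison, the attached log reports MatMult at about 9.1 Gflop/s in total
over 40 processes, so measuring the actual sustained bandwidth (e.g. with a
STREAM benchmark) would show how close we already are to that ceiling.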

I hope the limitation is not the sparse matrix mapping that we are 
using ...

Thanks in advance ...

Cheers

Aron



-------------- next part --------------
PuertoRico Ymir
Mesh nodes: 124,815

25.5.2012

Solver: KSPLGMRES
Preconditioner: PCMG

Matrix Size: 71,893,440 x 71,893,440
Total matrix NNZ: 482,500,000

MSC = MCD = 24 (i.e. 24 x 24 spectral components per mesh node)
dt = 60 sec
Simulation time: 20 min
20 time steps (20 calls to KSPSolve)
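
For orientation, below is a minimal sketch (not our actual driver) of the
PETSc 3.2 calls this setup corresponds to; the matrix A and the vectors b, x
are assumed to be assembled elsewhere in the application, and the multigrid
hierarchy for PCMG is left to run-time options:

    #include <petscksp.h>

    /* Minimal sketch: one KSP reused for all 20 time steps (dt = 60 s).
       A, b, x are assumed to be assembled elsewhere by the application. */
    static PetscErrorCode SolveAllSteps(Mat A, Vec b, Vec x)
    {
      KSP            ksp;
      PC             pc;
      PetscInt       step;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      ierr = KSPSetType(ksp,KSPLGMRES);CHKERRQ(ierr);      /* Solver: KSPLGMRES    */
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCMG);CHKERRQ(ierr);             /* Preconditioner: PCMG */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* picks up run-time options,
                                                              e.g. -ksp_type, -pc_mg_levels */
      for (step = 0; step < 20; step++) {
        /* ... reassemble A and b for the current time step here ... */
        ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
        ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
      }
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }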

Nodes  Threads  FLUCT        Solver  Solver       Solver   Solver    App          App      App
                time [sec]   Iter    Time [sec]   Speedup  Eff. [%]  Time [sec]   Speedup  Eff. [%]
  4       8       33-38       7-9      18.8         3.2      40         998         4.5      56
  5      25       10-11       8-10      6.85        8.8      35         358        12.6      50
  4      32       10-12       8-11      6.45        9.3      29         317        14.2      44
  8      32       12-14       8-11      8.75        6.9      22         392        11.5      36
 10      40       11-13       9-10      7.10        8.5      21         355        12.7      32
 10      50        7-8        8-11      5.10       11.8      24         252        17.9      36
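
(In the table above, Eff. is simply Speedup divided by the number of threads,
e.g. 3.2 / 8 = 40% for the solver in the first row. The speedups imply a
reference time of roughly 60 s for the solver and about 4500 s for the whole
application, presumably from a single-process run that is not listed here.)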

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/Utilisateurs/aroland/bin/selfewwm_thomas on a linux-int named r1i3n7 with 40 processors, by aroland Fri May 25 19:17:22 2012
Using Petsc Release Version 3.2.0, Patch 6, Wed Jan 11 09:28:45 CST 2012

                         Max       Max/Min        Avg      Total
Time (sec):           3.359e+02      1.00000   3.359e+02
Objects:              1.400e+02      1.00000   1.400e+02
Flops:                3.776e+10      1.14746   3.558e+10  1.423e+12
Flops/sec:            1.124e+08      1.14746   1.059e+08  4.237e+09
MPI Messages:         5.280e+03      6.00000   2.442e+03  9.768e+04
MPI Message Lengths:  5.991e+08      3.80128   1.469e+05  1.435e+10
MPI Reductions:       1.406e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 3.3594e+02 100.0%  1.4233e+12 100.0%  9.768e+04 100.0%  1.469e+05      100.0%  1.405e+03  99.9%


------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              438 1.0 4.4104e+01 1.9 1.06e+10 1.1 9.7e+04 1.5e+05 0.0e+00 11 28100100  0  11 28100100  0  9131
MatSolve             458 1.0 3.6361e+01 2.8 1.09e+10 1.1 0.0e+00 0.0e+00 0.0e+00 10 29  0  0  0  10 29  0  0  0 11380
MatLUFactorNum        20 1.0 6.9722e+00 2.7 4.73e+0810.5 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1534
MatILUFactorSym        1 1.0 2.2795e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      45 1.0 8.7023e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        45 1.0 2.9763e+00 1.7 0.00e+00 0.0 4.4e+02 3.7e+04 8.0e+00  1  0  0  0  1   1  0  0  0  1     0
MatGetRowIJ            1 1.0 6.1989e-06 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice      20 1.0 5.9878e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatGetOrdering         1 1.0 1.8968e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        19 1.0 9.1826e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecMax                20 1.0 1.0160e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  1   0  0  0  0  1     0
VecMin                20 1.0 1.0970e+0012.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01  0  0  0  0  1   0  0  0  0  1     0
VecMDot              418 1.0 3.4962e+01 4.9 4.69e+09 1.2 0.0e+00 0.0e+00 4.2e+02  4 12  0  0 30   4 12  0  0 30  5013
VecNorm              687 1.0 4.1451e+01 5.3 2.64e+09 1.2 0.0e+00 0.0e+00 6.9e+02  5  7  0  0 49   5  7  0  0 49  2383
VecScale             667 1.0 5.9862e+00 3.2 1.28e+09 1.2 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  8011
VecCopy              269 1.0 2.2040e+00 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecSet              1225 1.0 8.5033e+00 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAXPY              269 1.0 3.6360e+00 2.7 1.03e+09 1.2 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0 10638
VecMAXPY             667 1.0 1.5898e+01 2.9 6.30e+09 1.2 0.0e+00 0.0e+00 0.0e+00  4 17  0  0  0   4 17  0  0  0 14806
VecAssemblyBegin      40 1.0 2.4674e+00297.8 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+02  1  0  0  0  9   1  0  0  0  9     0
VecAssemblyEnd        40 1.0 1.2517e-04 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      438 1.0 1.1889e+00 9.1 0.00e+00 0.0 9.7e+04 1.5e+05 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd        438 1.0 2.2600e+0189.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecNormalize         458 1.0 3.5072e+01 3.8 2.64e+09 1.2 0.0e+00 0.0e+00 4.6e+02  5  7  0  0 33   5  7  0  0 33  2817
KSPGMRESOrthog       418 1.0 3.9814e+01 2.2 9.38e+09 1.2 0.0e+00 0.0e+00 4.2e+02  7 25  0  0 30   7 25  0  0 30  8805
KSPSetup              60 1.0 1.7510e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  1   0  0  0  0  1     0
KSPSolve              20 1.0 1.4208e+02 1.0 3.78e+10 1.1 9.7e+04 1.5e+05 1.2e+03 42100100100 82  42100100100 82 10017
PCSetUp               40 1.0 1.3285e+01 2.1 4.73e+0810.5 0.0e+00 0.0e+00 1.2e+01  3  1  0  0  1   3  1  0  0  1   805
PCSetUpOnBlocks      687 1.0 4.4281e+01 2.5 1.13e+10 1.1 0.0e+00 0.0e+00 3.0e+00 12 30  0  0  0  12 30  0  0  0  9586
PCApply              229 1.0 1.0103e+02 1.1 2.30e+10 1.1 5.1e+04 1.5e+05 6.9e+02 29 61 52 52 49  29 61 52 52 49  8567

------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Index Set    10             10      7700472     0
   IS L to G Mapping     1              1          564     0
   Application Order     1              1       999160     0
              Matrix     5              5    386024408     0
              Vector   114            114   1062454512     0
      Vector Scatter     2              2         2072     0
       Krylov Solver     3              3        37800     0
      Preconditioner     3              3         2800     0
              Viewer     1              0            0     0

========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 3.93867e-05
Average time for zero size MPI_Send(): 4.75049e-06
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Apr 26 22:05:34 2012
Configure options: --prefix=/Utilisateurs/aroland/thomas/opt/petsc_3.2-p6 --download-f-blas-lapack=1 --with-mpi-dir=/Utilisateurs/aroland/thomas/opt/mpich2/ --with-superlu_dist=true --download-superlu_dist=yes --with-parmetis-lib="[/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libparmetis.a,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/libmetis.a]" --with-parmetis-include=/Utilisateurs/aroland/opt/ParMetis-3.1-64bit/ --with-debugging=0 COPTFLAGS=-O3 FOPTFLAGS=-O3
-----------------------------------------
Libraries compiled on Thu Apr 26 22:05:34 2012 on service1
Machine characteristics: Linux-2.6.32.12-0.7-default-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6
Using PETSc arch: linux-intel-performance
-----------------------------------------

Using C compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90  -O3   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/include -I/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/include -I/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -I/Utilisateurs/aroland/thomas/opt/mpich2/include -I/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/include
-----------------------------------------

Using C linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpicc
Using Fortran linker: /Utilisateurs/aroland/thomas/opt/mpich2/bin/mpif90
Using libraries: -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -L/Utilisateurs/aroland/thomas/opt/downloads/petsc-3.2-p6/linux-intel-performance/lib -lsuperlu_dist_2.5 -Wl,-rpath,/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -L/Utilisateurs/aroland/opt/ParMetis-3.1-64bit -lparmetis -lmetis -lflapack -lfblas -lm -L/Utilisateurs/aroland/thomas/opt/mpich2-1.4.1p1/lib -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/ipp/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/mkl/lib/intel64 -L/Calcul/Apps2/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 -L/Calcul/Apps/intel/impi/4.0.3.008/lib64 -L/usr/x86_64-suse-linux/lib -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -lmpichf90 -lifport -lifcore -limf -lsvml -lm -lipgo -lirc -lirc_s -lm -ldl -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------

