Hello,
<div><br></div><div>I'm looking for help reducing the run time and communication cost of a parallel MatMatSolve using MUMPS. On a single processor the solve times are reasonable (~9 seconds each), but on multiple processors they get longer as more cores are added. I've run with -log_summary and confirmed that practically all of the time is spent in MatMatSolve. I'm fairly certain it is all communication between nodes, and I'm trying to figure out where I can optimize, or whether this is even feasible for this type of problem. It is a parallel, dense, direct solve using MUMPS with an LU preconditioner. I know there are many smaller optimizations that could be made elsewhere, but at the moment only the solve concerns me.</div>
<div><br></div><div><div><font class="Apple-style-span" face="'courier new', monospace">---------------------------------------------- PETSc Performance Summary: ----------------------------------------------</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">./cntor on a complex-c named hpc-1-0.local with 2 processors, by abyrd Mon Aug 1 16:25:51 2011</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Using Petsc Release Version 3.1.0, Patch 8, Thu Mar 17 13:37:48 CDT 2011</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Max Max/Min Avg Total </font></div><div><font class="Apple-style-span" face="'courier new', monospace">Time (sec): 1.307e+02 1.00000 1.307e+02</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Objects: 1.180e+02 1.00000 1.180e+02</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Flops: 0.000e+00 0.00000 0.000e+00 0.000e+00</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Flops/sec: 0.000e+00 0.00000 0.000e+00 0.000e+00</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Memory: 2.091e+08 1.00001 4.181e+08</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">MPI Messages: 7.229e+03 1.00000 7.229e+03 1.446e+04</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MPI Message Lengths: 4.141e+08 1.00000 5.729e+04 8.283e+08</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">MPI Reductions: 1.464e+04 1.00000</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> e.g., VecAXPY() for real vectors of length N --> 2N flops</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> and VecAXPY() for complex vectors of length N --> 8N flops</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --</font></div><div>
<font class="Apple-style-span" face="'courier new', monospace"> Avg %Total Avg %Total counts %Total Avg %Total counts %Total </font></div><div><font class="Apple-style-span" face="'courier new', monospace"> 0: Main Stage: 1.3072e+02 100.0% 0.0000e+00 0.0% 1.446e+04 100.0% 5.729e+04 100.0% 1.730e+02 1.2% </font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">------------------------------------------------------------------------------------------------------------------------</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">See the 'Profiling' chapter of the users' manual for details on interpreting output.</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Phase summary info:</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Count: number of times phase was executed</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Time and Flops: Max - maximum over all processors</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Ratio - ratio of maximum to minimum over all processors</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Mess: number of messages sent</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Avg. len: average message length</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Reduct: number of global reductions</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Global: entire computation</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> %T - percent time in this phase %F - percent flops in this phase</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> %M - percent messages in this phase %L - percent message lengths in this phase</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> %R - percent reductions in this phase</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">------------------------------------------------------------------------------------------------------------------------</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ##########################################################</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # WARNING!!! #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # This code was compiled with a debugging option, #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # To get timing results run config/configure.py #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # using --with-debugging=no, the performance will #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # be generally two or three times faster. #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ##########################################################</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> ##########################################################</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # WARNING!!! #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # The code for various complex numbers numerical #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # kernels uses C++, which generally is not well #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # optimized. For performance that is about 4-5 times #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # faster, specify --with-fortran-kernels=1 #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> # when running config/configure.py. #</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> # #</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> ##########################################################</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">Event Count Time (sec) Flops --- Global --- --- Stage --- Total</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s</font></div><div><font class="Apple-style-span" face="'courier new', monospace">------------------------------------------------------------------------------------------------------------------------</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">--- Event Stage 0: Main Stage</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MatSolve 14400 1.0 1.2364e+02 1.0 0.00e+00 0.0 1.4e+04 5.7e+04 2.0e+01 95 0100100 0 95 0100100 12 0</font></div><div>
<font class="Apple-style-span" face="'courier new', monospace">MatLUFactorSym 4 1.0 2.0027e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MatLUFactorNum 4 1.0 3.4223e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 3 0 0 0 0 3 0 0 0 14 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">MatConvert 1 1.0 2.3644e-01 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+01 0 0 0 0 0 0 0 0 0 6 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MatAssemblyBegin 14 1.0 1.9959e-01 9.3 0.00e+00 0.0 3.0e+01 5.2e+04 1.2e+01 0 0 0 0 0 0 0 0 0 7 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">MatAssemblyEnd 14 1.0 1.9908e-01 1.1 0.00e+00 0.0 4.0e+00 2.8e+01 2.0e+01 0 0 0 0 0 0 0 0 0 12 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MatGetRow 32 1.0 4.2677e-05 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">MatGetSubMatrice 4 1.0 7.6661e-03 1.0 0.00e+00 0.0 1.6e+01 1.2e+05 2.4e+01 0 0 0 0 0 0 0 0 0 14 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">MatMatSolve 4 1.0 1.2380e+02 1.0 0.00e+00 0.0 1.4e+04 5.7e+04 2.0e+01 95 0100100 0 95 0100100 12 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">VecSet 4 1.0 1.8590e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">VecScatterBegin 28800 1.0 2.2810e+00 2.2 0.00e+00 0.0 1.4e+04 5.7e+04 0.0e+00 1 0100100 0 1 0100100 0 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">VecScatterEnd 14400 1.0 4.1534e+00 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">KSPSetup 4 1.0 1.1060e-0212.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">PCSetUp 4 1.0 3.4280e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 5.6e+01 3 0 0 0 0 3 0 0 0 32 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace">------------------------------------------------------------------------------------------------------------------------</font></div>
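<div>The event table above is easier to compare with the single-processor run once it is reduced to per-call figures. The following is a minimal sketch with every number copied by hand from this 2-process log. It does not call PETSc or MUMPS. Two points are assumptions: that the ~9 s single-processor figure is per MatMatSolve call, and that summing the VecScatterBegin and VecScatterEnd max times gives a usable, if rough, estimate of scatter time. The dictionary and function names are illustrative only.</div>

```python
"""Reduce the -log_summary event counts to per-call figures.

All numbers are copied by hand from the 2-process log; nothing here
calls PETSc or MUMPS.
"""

# (count, max time over processes in seconds) per event, from the event table
EVENTS = {
    "MatMatSolve": (4, 1.2380e02),
    "MatSolve": (14400, 1.2364e02),
    "VecScatterBegin": (28800, 2.2810e00),
    "VecScatterEnd": (14400, 4.1534e00),
}
TOTAL_MESSAGES = 1.446e04      # "MPI Messages", Total column
AVG_MESSAGE_BYTES = 5.729e04   # "MPI Message Lengths", Avg column


def per_call_figures(events=EVENTS):
    """Return per-call averages derived from the logged totals."""
    mms_count, mms_time = events["MatMatSolve"]
    ms_count, ms_time = events["MatSolve"]
    # Rough: both entries are maxima over processes, not a true per-process sum.
    scatter_time = events["VecScatterBegin"][1] + events["VecScatterEnd"][1]
    return {
        # single-column MatSolve calls per MatMatSolve call
        "columns_per_matmatsolve": ms_count // mms_count,
        # wall time of one MatMatSolve (compare with ~9 s on one process)
        "seconds_per_matmatsolve": mms_time / mms_count,
        # wall time of one single-column MatSolve, in milliseconds
        "ms_per_matsolve": 1e3 * ms_time / ms_count,
        # PETSc-logged MPI messages per single-column MatSolve
        "messages_per_matsolve": TOTAL_MESSAGES / ms_count,
        # total data moved in PETSc-logged messages, in MB
        "total_mb": TOTAL_MESSAGES * AVG_MESSAGE_BYTES / 1e6,
        # share of MatMatSolve time spent in PETSc's VecScatter calls, in percent
        "scatter_percent": 100.0 * scatter_time / mms_time,
    }


if __name__ == "__main__":
    for name, value in per_call_figures().items():
        print(f"{name:>25}: {value:.4g}")
```

<div>On these figures, each MatMatSolve call shows up as 3600 single-column MatSolve calls of about 8.6 ms each, roughly 31 s per call on 2 processes. The PETSc-logged VecScatter events account for only about 5% of that time. The remainder is inside MatSolve but outside PETSc's own scatters, presumably in the MUMPS solve phase, whose internal MPI traffic -log_summary does not report.</div>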
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">Memory usage is given in bytes:</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Object Type Creations Destructions Memory Descendants' Mem.</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Reports information only for process 0.</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"><br></font></div><div><font class="Apple-style-span" face="'courier new', monospace">--- Event Stage 0: Main Stage</font></div><div><font class="Apple-style-span" face="'courier new', monospace"><br>
</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Matrix 27 27 208196712 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Vec 36 36 1027376 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Vec Scatter 11 11 7220 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Index Set 42 42 22644 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace"> Krylov Solver 1 1 34432 0</font></div><div><font class="Apple-style-span" face="'courier new', monospace"> Preconditioner 1 1 752 0</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">========================================================================================================================</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Average time to get PetscTime(): 1.90735e-07</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Average time for MPI_Barrier(): 3.8147e-06</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Average time for zero size MPI_Send(): 7.51019e-06</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">#PETSc Option Table entries:</font></div><div><font class="Apple-style-span" face="'courier new', monospace">-log_summary</font></div><div>
<font class="Apple-style-span" face="'courier new', monospace">-pc_factor_mat_solver_package mumps</font></div><div><font class="Apple-style-span" face="'courier new', monospace">-pc_type lu</font></div><div>
<font class="Apple-style-span" face="'courier new', monospace">#End of PETSc Option Table entries</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Compiled without FORTRAN kernels</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Compiled with full precision matrices (default)</font></div><div><font class="Apple-style-span" face="'courier new', monospace">sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Configure run at: Mon Jul 11 15:28:42 2011</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Configure options: PETSC_ARCH=complex-cpp-mumps --with-cc=mpicc --with-fc=mpif90 --with-blas-lapack-dir=/usr/lib64 --with-shared --with-clanguage=c++ --with-scalar-type=complex --download-mumps=1 --download-blacs=1 --download-scalapack=1 --download-parmetis=1 --with-cxx=mpicxx</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">-----------------------------------------</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Libraries compiled on Mon Jul 11 15:39:58 EDT 2011 on sc.local </font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Machine characteristics: Linux sc.local 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:05:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux </font></div><div>
<font class="Apple-style-span" face="'courier new', monospace">Using PETSc directory: /panfs/storage.local/scs/home/abyrd/petsc-3.1-p8</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Using PETSc arch: complex-cpp-mumps</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">-----------------------------------------</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Using C compiler: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g -fPIC </font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -g </font></div><div><font class="Apple-style-span" face="'courier new', monospace">-----------------------------------------</font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Using include paths: -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/include -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/include -I/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/include -I/usr/mpi/gnu/openmpi-1.4.2/include -I/usr/mpi/gnu/openmpi-1.4.2/lib64 </font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">------------------------------------------</font></div><div><font class="Apple-style-span" face="'courier new', monospace">Using C linker: mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g </font></div>
<div><font class="Apple-style-span" face="'courier new', monospace">Using Fortran linker: mpif90 -fPIC -Wall -Wno-unused-variable -g </font></div><div><font class="Apple-style-span" face="'courier new', monospace">Using libraries: -Wl,-rpath,/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -L/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -lpetsc -lX11 -Wl,-rpath,/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -L/panfs/storage.local/scs/home/abyrd/petsc-3.1-p8/complex-cpp-mumps/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -Wl,-rpath,/usr/lib64 -L/usr/lib64 -llapack -lblas -lnsl -lrt -Wl,-rpath,/usr/mpi/gnu/openmpi-1.4.2/lib64 -L/usr/mpi/gnu/openmpi-1.4.2/lib64 -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl </font></div>
</div><div><br></div><div>Respectfully,</div><div>Adam Byrd</div>