Slow speed after changing from serial to parallel
Ben Tay
zonexo at gmail.com
Tue Apr 15 19:52:19 CDT 2008
Hi,
I was initially using LU and Hypre to solve my serial code. I switched
to the default GMRES when I converted the code to parallel. I have now
redone the test using KSPBCGS and also Hypre BoomerAMG. It seems that
MatAssemblyBegin, VecAYPX and VecScatterEnd (in bold) are the problems.
What should I be checking? Here are the results for 1 and 2 processors
for each solver. Thank you so much!
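For reference, here is roughly how the BCGS/BoomerAMG combination is
selected (a minimal sketch rather than the actual routine; the matrix A
and the vectors b, x are assumed to be created and assembled elsewhere,
and the same choices can be made at runtime with
-ksp_type bcgs -pc_type hypre -pc_hypre_type boomeramg):

/* Minimal sketch: BiCGStab (KSPBCGS) with Hypre BoomerAMG.
   A, b, x are assumed assembled elsewhere; the KSPSetOperators()
   and KSPDestroy() signatures shown are the PETSc 2.3.x ones. */
#include "petscksp.h"

PetscErrorCode SolveWithBCGSBoomerAMG(Mat A,Vec b,Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);            /* BiCGStab */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);
  ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr);     /* Hypre BoomerAMG */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);             /* allow command-line overrides */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  return 0;
}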
*1 processor KSPBCGS*
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------
./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332
Wed Apr 16 08:32:21 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
Max Max/Min Avg Total
Time (sec): 8.176e+01 1.00000 8.176e+01
Objects: 2.700e+01 1.00000 2.700e+01
Flops: 1.893e+10 1.00000 1.893e+10 1.893e+10
Flops/sec: 2.315e+08 1.00000 2.315e+08 2.315e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 3.743e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
--> 2N flops
and VecAXPY() for complex vectors of length
N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total
0: Main Stage: 8.1756e+01 100.0% 1.8925e+10 100.0% 0.000e+00
0.0% 0.000e+00 0.0% 3.743e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all
processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this
phase
%M - percent messages in this phase %L - percent message
lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1498 1.0 1.6548e+01 1.0 3.55e+08 1.0 0.0e+00 0.0e+00
0.0e+00 20 31 0 0 0 20 31 0 0 0 355
MatSolve 1500 1.0 3.2228e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00
0.0e+00 39 31 0 0 0 39 31 0 0 0 183
MatLUFactorNum 2 1.0 2.0642e-01 1.0 1.02e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 102
MatILUFactorSym 2 1.0 2.0250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.7963e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 3.8147e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 2.6301e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 1.0190e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetup 2 1.0 2.8230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 6.7238e+01 1.0 2.81e+08 1.0 0.0e+00 0.0e+00
3.7e+03 82100 0 0100 82100 0 0100 281
PCSetUp 2 1.0 4.3527e-01 1.0 4.85e+07 1.0 0.0e+00 0.0e+00
6.0e+00 1 0 0 0 0 1 0 0 0 0 48
PCApply 1500 1.0 3.2232e+01 1.0 1.83e+08 1.0 0.0e+00 0.0e+00
0.0e+00 39 31 0 0 0 39 31 0 0 0 183
VecDot 2984 1.0 5.3279e+00 1.0 4.84e+08 1.0 0.0e+00 0.0e+00
3.0e+03 7 14 0 0 80 7 14 0 0 80 484
VecNorm 754 1.0 1.1453e+00 1.0 5.74e+08 1.0 0.0e+00 0.0e+00
7.5e+02 1 3 0 0 20 1 3 0 0 20 574
VecCopy 2 1.0 3.2830e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 3 1.0 3.9389e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 2244 1.0 4.8304e+00 1.0 4.02e+08 1.0 0.0e+00 0.0e+00
0.0e+00 6 10 0 0 0 6 10 0 0 0 402
VecAYPX 752 1.0 1.5623e+00 1.0 4.19e+08 1.0 0.0e+00 0.0e+00
0.0e+00 2 3 0 0 0 2 3 0 0 0 419
VecWAXPY 1492 1.0 5.0827e+00 1.0 2.54e+08 1.0 0.0e+00 0.0e+00
0.0e+00 6 7 0 0 0 6 7 0 0 0 254
VecAssemblyBegin 2 1.0 2.6703e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 5.2452e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 4 4 300369852 0
Krylov Solver 2 2 8 0
Preconditioner 2 2 336 0
Index Set 6 6 15554064 0
Vec 13 13 44937496 0
========================================================================================================================
Average time to get PetscTime(): 3.09944e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008
Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8
--sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8
--sizeof_float=4 --sizeof_double=8 --bits_per_byte=8 --sizeof_MPI_Comm=4
--sizeof_MPI_Fint=4 --with-vendor-compilers=intel --with-x=0
--with-hypre-dir=/home/enduser/g0306332/lib/hypre --with-debugging=0
--with-batch=1 --with-mpi-shared=0
--with-mpi-include=/usr/local/topspin/mpi/mpich/include
--with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a
--with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun
--with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0
-----------------------------------------
Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01
*2 processors KSPBCGS*
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------
./a.out on a atlas3-mp named atlas3-c25 with 2 processors, by g0306332
Wed Apr 16 08:37:25 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
Max Max/Min Avg Total
Time (sec): 3.795e+02 1.00000 3.795e+02
Objects: 3.800e+01 1.00000 3.800e+01
Flops: 8.592e+09 1.00000 8.592e+09 1.718e+10
Flops/sec: 2.264e+07 1.00000 2.264e+07 4.528e+07
MPI Messages: 1.335e+03 1.00000 1.335e+03 2.670e+03
MPI Message Lengths: 6.406e+06 1.00000 4.798e+03 1.281e+07
MPI Reductions: 1.678e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
--> 2N flops
and VecAXPY() for complex vectors of length
N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total
0: Main Stage: 3.7950e+02 100.0% 1.7185e+10 100.0% 2.670e+03
100.0% 4.798e+03 100.0% 3.357e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all
processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this
phase
%M - percent messages in this phase %L - percent message
lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 1340 1.0 7.4356e+01 1.6 5.87e+07 1.6 2.7e+03 4.8e+03
0.0e+00 16 31100100 0 16 31100100 0 72
MatSolve 1342 1.0 4.3794e+01 1.2 7.08e+07 1.2 0.0e+00 0.0e+00
0.0e+00 11 31 0 0 0 11 31 0 0 0 123
MatLUFactorNum 2 1.0 2.5116e-01 1.0 7.68e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 153
MatILUFactorSym 2 1.0 2.3831e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
*MatAssemblyBegin 2 1.0 7.9380e-0116482.3 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0*
MatAssemblyEnd 2 1.0 2.4782e-01 1.0 0.00e+00 0.0 2.0e+00 2.4e+03
7.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 5.0068e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 1.8508e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 8.6530e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetup 3 1.0 1.9901e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 3.3575e+02 1.0 2.56e+07 1.0 2.7e+03 4.8e+03
3.3e+03 88100100100100 88100100100100 51
PCSetUp 3 1.0 5.0751e-01 1.0 3.79e+07 1.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 0 0 0 0 0 0 76
PCSetUpOnBlocks 1 1.0 4.4248e-02 1.0 4.39e+07 1.0 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 88
PCApply 1342 1.0 4.9832e+01 1.2 6.56e+07 1.2 0.0e+00 0.0e+00
0.0e+00 12 31 0 0 0 12 31 0 0 0 108
VecDot 2668 1.0 2.0710e+02 1.2 6.70e+06 1.2 0.0e+00 0.0e+00
2.7e+03 50 13 0 0 79 50 13 0 0 79 11
VecNorm 675 1.0 2.9565e+01 3.3 3.33e+07 3.3 0.0e+00 0.0e+00
6.7e+02 5 3 0 0 20 5 3 0 0 20 20
VecCopy 2 1.0 2.4400e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1338 1.0 5.9052e+00 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 2007 1.0 2.2173e+01 2.6 1.03e+08 2.6 0.0e+00 0.0e+00
0.0e+00 4 10 0 0 0 4 10 0 0 0 79
*VecAYPX 673 1.0 2.8062e+00 4.0 4.29e+08 4.0 0.0e+00
0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 213*
VecWAXPY 1334 1.0 4.8052e+00 2.4 2.84e+08 2.4 0.0e+00 0.0e+00
0.0e+00 1 7 0 0 0 1 7 0 0 0 240
VecAssemblyBegin 2 1.0 1.4091e-04 3.1 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
*VecScatterBegin 1334 1.0 1.1666e-01 5.9 0.00e+00 0.0 2.7e+03
4.8e+03 0.0e+00 0 0100100 0 0 0100100 0 0*
VecScatterEnd 1334 1.0 5.2569e+01 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 10 0 0 0 0 10 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 6 6 283964900 0
Krylov Solver 3 3 8 0
Preconditioner 3 3 424 0
Index Set 8 8 12965152 0
Vec 17 17 34577080 0
Vec Scatter 1 1 0 0
========================================================================================================================
Average time to get PetscTime(): 8.10623e-07
Average time for MPI_Barrier(): 5.72205e-07
Average time for zero size MPI_Send(): 1.90735e-06
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008
*1 processor Hypre*
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------
./a.out on a atlas3-mp named atlas3-c45 with 1 processor, by g0306332
Wed Apr 16 08:45:38 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
Max Max/Min Avg Total
Time (sec): 2.059e+01 1.00000 2.059e+01
Objects: 3.400e+01 1.00000 3.400e+01
Flops: 3.151e+08 1.00000 3.151e+08 3.151e+08
Flops/sec: 1.530e+07 1.00000 1.530e+07 1.530e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 2.400e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
--> 2N flops
and VecAXPY() for complex vectors of length
N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total
0: Main Stage: 2.0590e+01 100.0% 3.1512e+08 100.0% 0.000e+00
0.0% 0.000e+00 0.0% 2.400e+01 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all
processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this
phase
%M - percent messages in this phase %L - percent message
lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 12 1.0 2.6237e-01 1.0 4.24e+08 1.0 0.0e+00 0.0e+00
0.0e+00 1 35 0 0 0 1 35 0 0 0 424
MatSolve 7 1.0 4.5932e-01 1.0 2.23e+08 1.0 0.0e+00 0.0e+00
0.0e+00 2 33 0 0 0 2 33 0 0 0 223
MatLUFactorNum 1 1.0 1.2635e-01 1.0 1.36e+08 1.0 0.0e+00 0.0e+00
0.0e+00 1 5 0 0 0 1 5 0 0 0 136
MatILUFactorSym 1 1.0 1.3007e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 1 0 0 0 4 1 0 0 0 4 0
MatConvert 1 1.0 4.1277e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatAssemblyBegin 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.3946e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatGetRow 432000 1.0 8.4685e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 2 1.0 3.0994e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.6376e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 8 0 0 0 0 8 0
MatZeroEntries 2 1.0 8.2422e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 6 1.0 1.0955e-01 1.0 3.31e+08 1.0 0.0e+00 0.0e+00
6.0e+00 1 12 0 0 25 1 12 0 0 25 331
KSPSetup 2 1.0 2.5418e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 5.9363e+00 1.0 5.31e+07 1.0 0.0e+00 0.0e+00
1.8e+01 29100 0 0 75 29100 0 0 75 53
PCSetUp 2 1.0 1.5691e+00 1.0 1.10e+07 1.0 0.0e+00 0.0e+00
5.0e+00 8 5 0 0 21 8 5 0 0 21 11
PCApply 14 1.0 3.7548e+00 1.0 2.73e+07 1.0 0.0e+00 0.0e+00
0.0e+00 18 33 0 0 0 18 33 0 0 0 27
VecMDot 6 1.0 7.7139e-02 1.0 2.35e+08 1.0 0.0e+00 0.0e+00
6.0e+00 0 6 0 0 25 0 6 0 0 25 235
VecNorm 14 1.0 9.9192e-02 1.0 1.83e+08 1.0 0.0e+00 0.0e+00
7.0e+00 0 6 0 0 29 0 6 0 0 29 183
VecScale 7 1.0 5.4052e-03 1.0 5.59e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 559
VecCopy 1 1.0 2.0301e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9 1.0 1.1883e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7 1.0 2.8702e-02 1.0 3.91e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 391
VecAYPX 6 1.0 2.8528e-02 1.0 3.63e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 3 0 0 0 0 3 0 0 0 363
VecMAXPY 7 1.0 4.1699e-02 1.0 5.59e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 7 0 0 0 0 7 0 0 0 559
VecAssemblyBegin 2 1.0 2.3842e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 25 0 0 0 0 25 0
VecAssemblyEnd 2 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 7 1.0 1.3958e-02 1.0 6.50e+08 1.0 0.0e+00 0.0e+00
7.0e+00 0 3 0 0 29 0 3 0 0 29 650
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 3 3 267569524 0
Krylov Solver 2 2 17224 0
Preconditioner 2 2 440 0
Index Set 3 3 10369032 0
Vec 24 24 82961752 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
OptionTable: -log_summary
Compiled without FORTRAN kernels
*2 processors Hypre*
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------
./a.out on a atlas3-mp named atlas3-c48 with 2 processors, by g0306332
Wed Apr 16 08:46:56 2008
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST 2007
HG revision: 414581156e67e55c761739b0deb119f7590d0f4b
Max Max/Min Avg Total
Time (sec): 9.614e+01 1.02903 9.478e+01
Objects: 4.100e+01 1.00000 4.100e+01
Flops: 2.778e+08 1.00000 2.778e+08 5.555e+08
Flops/sec: 2.973e+06 1.02903 2.931e+06 5.862e+06
MPI Messages: 7.000e+00 1.00000 7.000e+00 1.400e+01
MPI Message Lengths: 3.120e+04 1.00000 4.457e+03 6.240e+04
MPI Reductions: 1.650e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
--> 2N flops
and VecAXPY() for complex vectors of length
N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total
0: Main Stage: 9.4784e+01 100.0% 5.5553e+08 100.0% 1.400e+01
100.0% 4.457e+03 100.0% 3.300e+01 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops/sec: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all
processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this
phase
%M - percent messages in this phase %L - percent message
lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was run without the PreLoadBegin() #
# macros. To get timing results we always recommend #
# preloading. otherwise timing numbers may be #
# meaningless. #
##########################################################
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len
Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
--- Event Stage 0: Main Stage
MatMult 12 1.0 4.5412e-01 2.0 4.34e+08 2.0 1.2e+01 4.8e+03
0.0e+00 0 36 86 92 0 0 36 86 92 0 438
MatSolve 7 1.0 5.0386e-01 1.1 2.28e+08 1.1 0.0e+00 0.0e+00
0.0e+00 1 37 0 0 0 1 37 0 0 0 407
MatLUFactorNum 1 1.0 9.5120e-01 1.6 2.98e+07 1.6 0.0e+00 0.0e+00
0.0e+00 1 6 0 0 0 1 6 0 0 0 36
MatILUFactorSym 1 1.0 1.1285e+01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 9 0 0 0 3 9 0 0 0 3 0
MatConvert 1 1.0 6.2023e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
*MatAssemblyBegin 2 1.0 3.1003e+01246.4 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 16 0 0 0 6 16 0 0 0 6 0*
MatAssemblyEnd 2 1.0 2.2413e+00 1.9 0.00e+00 0.0 2.0e+00 2.4e+03
7.0e+00 2 0 14 8 21 2 0 14 8 21 0
MatGetRow 216000 1.0 9.2643e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 3 1.0 5.9605e-06 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 2.4464e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 6 0 0 0 0 6 0
MatZeroEntries 2 1.0 6.1072e+00 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 5 0 0 0 0 0
KSPGMRESOrthog 6 1.0 4.4529e-02 1.3 5.26e+08 1.3 0.0e+00 0.0e+00
6.0e+00 0 7 0 0 18 0 7 0 0 18 815
KSPSetup 2 1.0 1.8315e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
KSPSolve 2 1.0 3.0572e+01 1.1 9.64e+06 1.1 1.2e+01 4.8e+03
1.8e+01 31100 86 92 55 31100 86 92 55 18
PCSetUp 2 1.0 2.0424e+01 1.3 1.07e+06 1.3 0.0e+00 0.0e+00
5.0e+00 19 6 0 0 15 19 6 0 0 15 2
PCApply 14 1.0 2.9443e+00 1.0 3.56e+07 1.0 0.0e+00 0.0e+00
0.0e+00 3 37 0 0 0 3 37 0 0 0 70
VecMDot 6 1.0 2.7561e-02 1.6 5.15e+08 1.6 0.0e+00 0.0e+00
6.0e+00 0 3 0 0 18 0 3 0 0 18 658
*VecNorm 14 1.0 1.4223e+00 5.1 5.45e+07 5.1 0.0e+00
0.0e+00 7.0e+00 1 5 0 0 21 1 5 0 0 21 21*
VecScale 7 1.0 1.8604e-02 1.0 8.25e+07 1.0 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 163
VecCopy 1 1.0 3.0069e-03 2.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 9 1.0 3.2693e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 7 1.0 3.0581e-02 1.1 3.98e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 706
*VecAYPX 6 1.0 4.4344e+00147.6 3.45e+08147.6 0.0e+00
0.0e+00 0.0e+00 2 4 0 0 0 2 4 0 0 0 5*
VecMAXPY 7 1.0 2.1892e-02 1.0 5.34e+08 1.0 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 1066
VecAssemblyBegin 2 1.0 9.2602e-0412.5 0.00e+00 0.0 0.0e+00 0.0e+00
6.0e+00 0 0 0 0 18 0 0 0 0 18 0
VecAssemblyEnd 2 1.0 7.8678e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 6 1.0 9.3222e-05 1.1 0.00e+00 0.0 1.2e+01 4.8e+03
0.0e+00 0 0 86 92 0 0 0 86 92 0 0
*VecScatterEnd 6 1.0 1.9959e-011404.6 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0*
VecNormalize 7 1.0 2.3088e-02 1.0 1.98e+08 1.0 0.0e+00 0.0e+00
7.0e+00 0 2 0 0 21 0 2 0 0 21 393
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
--- Event Stage 0: Main Stage
Matrix 5 5 267571932 0
Krylov Solver 2 2 17224 0
Preconditioner 2 2 440 0
Index Set 5 5 10372120 0
Vec 26 26 53592184 0
Vec Scatter 1 1 0 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 8.10623e-07
Average time for zero size MPI_Send(): 1.43051e-06
OptionTable: -log_summary
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8
Configure run at: Tue Jan 8 22:22:08 2008
Matthew Knepley wrote:
> The convergence here is just horrendous. Have you tried using LU to check
> your implementation? All the time is in the solve right now. I would first
> try a direct method (at least on a small problem) and then try to understand
> the convergence behavior. MUMPS can actually scale very well for big problems.
>
> Matt
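For the direct-solve check suggested above, a minimal sketch (small
test problem only, since the built-in PCLU runs sequentially; a
parallel direct solver such as MUMPS can be selected at runtime if
PETSc was built with it, though the exact option names are
version-dependent) would look like this:

/* Sanity-check sketch: one direct LU solve on a small problem.
   A, b, x assumed assembled; PCLU here is the sequential PETSc LU,
   so run this on one process or a reduced-size test case. */
#include "petscksp.h"

PetscErrorCode DirectSolveCheck(Mat A,Vec b,Vec x)
{
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);   /* apply the PC exactly once */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);           /* full LU factorization */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);       /* or -ksp_type preonly -pc_type lu */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(ksp);CHKERRQ(ierr);
  return 0;
}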