[petsc-users] reusing LU factorization?
Tabrez Ali
stali at geology.wisc.edu
Wed Jan 29 11:13:29 CST 2014
I am getting the opposite result, i.e., MUMPS becomes slower when using
ParMETIS for parallel ordering. What did I mess up? Is the problem too
small?
Case 1 took 24.731s
$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu
-pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary >
1.txt
Case 2 with "-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2" took 34.720s
$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu
-pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary
-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 > 2.txt
Both 1.txt and 2.txt are attached.
Regards,
Tabrez
On 01/29/2014 09:18 AM, Hong Zhang wrote:
> MUMPS now supports parallel symbolic factorization. With the petsc-3.4
> interface, you can use the runtime options
>
> -mat_mumps_icntl_28 <1>: ICNTL(28): use 1 for sequential analysis
> and ICNTL(7) ordering, or 2 for parallel analysis and ICNTL(29) ordering
> -mat_mumps_icntl_29 <0>: ICNTL(29): parallel ordering, 1 = ptscotch, 2
> = parmetis
>
> e.g., '-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2' activates parallel
> symbolic factorization with ParMETIS for matrix ordering.
> Give it a try and let us know what you get.
>
> Hong
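For reference, the same controls can also be set from application code rather than the command line. Below is a minimal sketch, not taken from defmod, assuming PETSc 3.4 built with MUMPS, an existing KSP named ksp whose operator has already been set, and error checking omitted for brevity (these routine names are the petsc-3.4 ones; some were renamed in later releases):

    PC  pc;
    Mat F;
    /* Select MUMPS LU and request parallel analysis with ParMETIS,
       equivalent to -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCLU);
    PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);
    PCFactorSetUpMatSolverPackage(pc);   /* creates the MUMPS factor matrix */
    PCFactorGetMatrix(pc, &F);
    MatMumpsSetIcntl(F, 28, 2);          /* ICNTL(28)=2: parallel analysis  */
    MatMumpsSetIcntl(F, 29, 2);          /* ICNTL(29)=2: ParMETIS ordering  */
    KSPSetUp(ksp);                       /* analysis + factorization happen here */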
>
>
> On Tue, Jan 28, 2014 at 5:48 PM, Smith, Barry F.
> <bsmith at mcs.anl.gov> wrote:
>
>
> On Jan 28, 2014, at 5:39 PM, Matthew Knepley
> <knepley at gmail.com> wrote:
>
> > On Tue, Jan 28, 2014 at 5:25 PM, Tabrez Ali
> > <stali at geology.wisc.edu> wrote:
> > Hello
> >
> > This is my observation as well (with MUMPS). The first solve
> > (after assembly, which is super fast) takes a few minutes (for ~1
> > million unknowns on 12/24 cores), but each subsequent solve, one per
> > time step, takes only a few seconds.
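This reuse is automatic within a single run: the KSP/PC keeps the factored matrix, so only the first KSPSolve() pays for the analysis and factorization. A minimal sketch of such a time-stepping loop, where AssembleRHS() is a hypothetical helper and ksp, b, x, nsteps are assumed to exist:

    PetscInt step;
    /* The first KSPSolve() triggers the symbolic and numeric LU
       factorization; later calls with an unchanged operator reuse the
       factors and only do forward/backward substitution. */
    for (step = 0; step < nsteps; step++) {
      AssembleRHS(b, step);   /* hypothetical: only the right-hand side changes */
      KSPSolve(ksp, b, x);    /* step 0: analysis + factor + solve; later: solve only */
    }

If the matrix values do change between steps but the nonzero pattern does not, passing SAME_NONZERO_PATTERN to KSPSetOperators() in petsc-3.4 redoes only the numeric factorization and reuses the symbolic analysis.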
> >
> > Perhaps symbolic factorization in MUMPS is all serial?
> >
> > Yes, it is.
>
> I missed this. I was just assuming a PETSc LU. Yes, I have no
> idea of the relative time of the symbolic and numeric factorizations
> for those other packages.
>
> Barry
> >
> > Matt
> >
> > Like the OP, I often do multiple runs on the same problem, but I
> > don't know if MUMPS or any other direct solver can save the
> > symbolic factorization info to a file that could then be reused in
> > subsequent runs to avoid the costly "first solves".
> >
> > Tabrez
> >
> >
> > On 01/28/2014 04:04 PM, Barry Smith wrote:
> > On Jan 28, 2014, at 1:36 PM, David Liu
> > <daveliu at mit.edu> wrote:
> >
> > Hi, I'm writing an application that solves a sparse matrix many
> > times using Pastix. I notice that the first solve takes a very
> > long time,
> > Is it the first "solve" or the first time you put values into
> > that matrix that "takes a long time"? If you are not properly
> > preallocating the matrix, then the initial setting of values will
> > be slow and waste memory. See
> > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
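A minimal preallocation sketch follows; the names mlocal, dnnz, and onnz are illustrative (assumed to be the local row count and the per-row diagonal/off-diagonal nonzero estimates computed beforehand from the mesh connectivity), and error checking is omitted:

    Mat A;
    /* Preallocate before any MatSetValues() so assembly does not
       trigger repeated mallocs; bs = 1 for a scalar AIJ matrix. */
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, mlocal, mlocal, PETSC_DETERMINE, PETSC_DETERMINE);
    MatSetFromOptions(A);
    MatXAIJSetPreallocation(A, 1, dnnz, onnz, NULL, NULL);
    /* ... MatSetValues() calls during assembly ... */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);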
> >
> > The symbolic factorization is usually much faster than the
> > numeric factorization, so that is not the cause of the slow
> > "first solve".
> >
> > Barry
> >
> >
> >
> > while the subsequent solves are very fast. I don't fully
> > understand what's going on behind the scenes, but I'm guessing
> > it's because the very first solve has to read in the nonzero
> > structure for the LU factorization, while the subsequent solves
> > are faster because the nonzero structure doesn't change.
> >
> > My question is, is there any way to save the information
> > obtained from the very first solve, so that the next time I run
> > the application, the very first solve can be fast too (provided
> > that I still have the same nonzero structure)?
> >
> >
> > --
> > No one trusts a model except the one who wrote it; everyone
> > trusts an observation except the one who made it. - Harlow Shapley
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to
> > which their experiments lead.
> > -- Norbert Wiener
>
>
--
No one trusts a model except the one who wrote it; everyone trusts an observation except the one who made it. - Harlow Shapley
-------------- next part (attachment: 1.txt) --------------
Reading input ...
Partitioning mesh ...
Reading mesh data ...
Forming [K] ...
Forming RHS ...
Setting up solver ...
Solving ...
Entering DMUMPS driver with JOB, N, NZ = 1 107811 0
DMUMPS 4.10.0
L U Solver for unsymmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
** Max-trans not allowed because matrix is distributed
... Structural symmetry (in percent)= 100
Density: NBdense, Average, Median = 0 75 80
Ordering based on METIS
A root of estimated size 4851 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 201990159
-- (3) Storage of factors (REAL, estimated) = 202148895
-- (4) Storage of factors (INT , estimated) = 2404034
-- (5) Maximum frontal size (estimated) = 4851
-- (6) Number of nodes in the tree = 3963
-- (32) Type of analysis effectively used = 1
-- (7) Ordering option effectively used = 5
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 7
Percentage of memory relaxation (effective) = 35
Number of level 2 nodes = 24
Number of split nodes = 2
RINFOG(1) Operations during elimination (estim)= 4.395D+11
Distributed matrix entry format (ICNTL(18)) = 3
** Rank of proc needing largest memory in IC facto : 11
** Estimated corresponding MBYTES for IC facto : 325
** Estimated avg. MBYTES per work. proc at facto (IC) : 286
** TOTAL space in MBYTES for IC factorization : 4584
** Rank of proc needing largest memory for OOC facto : 10
** Estimated corresponding MBYTES for OOC facto : 254
** Estimated avg. MBYTES per work. proc at facto (OOC) : 214
** TOTAL space in MBYTES for OOC factorization : 3424
Entering DMUMPS driver with JOB, N, NZ = 2 107811 8214057
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 16
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 202148895
INTEGER SPACE FOR FACTORS = 2404034
MAXIMUM FRONTAL SIZE (ESTIMATED) = 4851
NUMBER OF NODES IN THE TREE = 3963
Convergence error after scaling for ONE-NORM (option 7/8) = 0.63D-01
Maximum effective relaxed size of S = 28811533
Average effective relaxed size of S = 23980595
REDISTRIB: TOTAL DATA LOCAL/SENT = 285226 11316386
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.1374
** Memory relaxation parameter ( ICNTL(14) ) : 35
** Rank of processor needing largest memory in facto : 11
** Space in MBYTES used by this processor for facto : 325
** Avg. Space in MBYTES per working proc during facto : 286
ELAPSED TIME FOR FACTORIZATION = 18.6224
Maximum effective space used in S (KEEP8(67) = 19890302
Average effective space used in S (KEEP8(67) = 17657389
** EFF Min: Rank of processor needing largest memory : 11
** EFF Min: Space in MBYTES used by this processor : 254
** EFF Min: Avg. Space in MBYTES per working proc : 235
GLOBAL STATISTICS
RINFOG(2) OPERATIONS IN NODE ASSEMBLY = 5.463D+08
------(3) OPERATIONS IN NODE ELIMINATION= 4.395D+11
INFOG (9) REAL SPACE FOR FACTORS = 201990159
INFOG(10) INTEGER SPACE FOR FACTORS = 2402463
INFOG(11) MAXIMUM FRONT SIZE = 4851
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 178457958
INFOG(12) NB OF OFF DIAGONAL PIVOTS = 0
INFOG(13) NUMBER OF DELAYED PIVOTS = 0
INFOG(14) NUMBER OF MEMORY COMPRESS = 16
KEEP8(108) Extra copies IP stacking = 0
Entering DMUMPS driver with JOB, N, NZ = 3 107811 8214057
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 11
** Space in MBYTES used by this processor for solve : 238
** Avg. Space in MBYTES per working proc during solve : 199
Entering DMUMPS driver with JOB, N, NZ = 3 107811 8214057
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 11
** Space in MBYTES used by this processor for solve : 238
** Avg. Space in MBYTES per working proc during solve : 199
Recovering stress ...
Entering DMUMPS driver with JOB, N, NZ = -2 107811 8214057
Cleaning up ...
Finished
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./defmod on a arch-linux2-c-opt named aci-048.chtc.wisc.edu with 16 processors, by stali2 Wed Jan 29 10:53:09 2014
Using Petsc Release Version 3.4.1, Jun, 10, 2013
Max Max/Min Avg Total
Time (sec): 2.091e+01 1.00000 2.091e+01
Objects: 3.900e+01 1.00000 3.900e+01
Flops: 1.144e+06 1.05187 1.114e+06 1.783e+07
Flops/sec: 5.471e+04 1.05187 5.330e+04 8.528e+05
MPI Messages: 1.300e+02 1.83099 1.009e+02 1.614e+03
MPI Message Lengths: 2.359e+06 1.90899 1.769e+04 2.855e+07
MPI Reductions: 5.800e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.0907e+01 100.0% 1.7830e+07 100.0% 1.614e+03 100.0% 1.769e+04 100.0% 5.700e+01 98.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 1 1.0 1.1277e-04 7.1 1.35e+04 1.0 0.0e+00 0.0e+00 1.0e+00 0 1 0 0 2 0 1 0 0 2 1912
VecNorm 2 1.0 1.0610e-04 1.8 2.70e+04 1.0 0.0e+00 0.0e+00 2.0e+00 0 2 0 0 3 0 2 0 0 4 4065
VecScale 2 1.0 2.1935e-05 2.2 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 9830
VecCopy 1 1.0 2.7180e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5 1.0 3.4642e-0412.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1 1.0 1.0967e-05 1.2 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 19661
VecMAXPY 2 1.0 2.1219e-05 1.1 2.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 20323
VecAssemblyBegin 1 1.0 2.7704e-04 8.9 0.00e+00 0.0 2.0e+00 1.8e+01 3.0e+00 0 0 0 0 5 0 0 0 0 5 0
VecAssemblyEnd 1 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 6 1.0 6.3896e-04 3.8 0.00e+00 0.0 5.2e+02 6.1e+03 2.0e+00 0 0 32 11 3 0 0 32 11 4 0
VecScatterEnd 4 1.0 2.9111e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 2 1.0 1.3494e-04 1.8 4.04e+04 1.0 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 4 4794
MatMult 1 1.0 1.6370e-03 1.1 1.05e+06 1.1 1.3e+02 3.1e+03 0.0e+00 0 92 8 1 0 0 92 8 1 0 9970
MatSolve 2 1.0 9.7397e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00 0 0 34 10 10 0 0 34 10 11 0
MatLUFactorSym 1 1.0 1.2882e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 6 0 0 0 12 6 0 0 0 12 0
MatLUFactorNum 1 1.0 1.8813e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 90 0 0 0 2 90 0 0 0 2 0
MatAssemblyBegin 1 1.0 3.2401e-02 8.6 0.00e+00 0.0 3.4e+02 7.3e+04 2.0e+00 0 0 21 86 3 0 0 21 86 4 0
MatAssemblyEnd 1 1.0 1.7039e-02 1.2 0.00e+00 0.0 2.6e+02 7.7e+02 9.0e+00 0 0 16 1 16 0 0 16 1 16 0
MatGetRowIJ 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.5593e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 7 0 0 0 0 7 0
KSPGMRESOrthog 1 1.0 1.3494e-04 2.5 2.70e+04 1.0 0.0e+00 0.0e+00 1.0e+00 0 2 0 0 2 0 2 0 0 2 3196
KSPSetUp 1 1.0 9.6798e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.0205e+01 1.0 1.14e+06 1.1 6.8e+02 5.0e+03 2.9e+01 97100 42 12 50 97100 42 12 51 1
PCSetUp 1 1.0 2.0105e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 96 0 0 0 34 96 0 0 0 35 0
PCApply 2 1.0 9.7407e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00 0 0 34 10 10 0 0 34 10 11 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 13 13 1395792 0
Vector Scatter 4 4 3792 0
Matrix 6 6 13990176 0
Index Set 13 13 188444 0
Krylov Solver 1 1 18360 0
Preconditioner 1 1 1096 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.00272e-06
Average time for zero size MPI_Send(): 1.2517e-06
#PETSc Option Table entries:
-f point.inp
-log_summary
-mat_mumps_icntl_4 1
-pc_factor_mat_solver_package mumps
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 17 17:20:12 2013
Configure options: --with-mpi-dir=/usr/mpi/gcc/mvapich2-1.9 --with-cmake=1 --download-cmake=1 --with-metis==1 --download-metis=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-mumps=1 --download-mumps=1 --with-debugging=0 --with-shared-libraries=0 --CFLAGS=-O2 --FFLAGS=-O2
-----------------------------------------
Libraries compiled on Wed Jul 17 17:20:12 2013 on aci-service-1.chtc.wisc.edu
Machine characteristics: Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.3-Carbon
Using PETSc directory: /home/stali2/petsc-3.4.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc -O2 -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90 -O2 -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/usr/mpi/gcc/mvapich2-1.9/include
-----------------------------------------
Using C linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc
Using Fortran linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90
Using libraries: -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lX11 -lparmetis -lmetis -lpthread -L/usr/mpi/gcc/mvapich2-1.9/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpichf90 -lgfortran -lm -lm -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrdmacm -libumad -libverbs -lrt -lnuma -lpthread -lgcc_s -ldl
-----------------------------------------
-------------- next part (attachment: 2.txt) --------------
Reading input ...
Partitioning mesh ...
Reading mesh data ...
Forming [K] ...
Forming RHS ...
Setting up solver ...
Solving ...
Entering DMUMPS driver with JOB, N, NZ = 1 107811 0
DMUMPS 4.10.0
L U Solver for unsymmetric matrices
Type of parallelism: Working host
****** ANALYSIS STEP ********
** Max-trans not allowed because matrix is distributed
Using ParMETIS for parallel ordering.
Structural symmetry is: 100%
A root of estimated size 6878 has been selected for Scalapack.
Leaving analysis phase with ...
INFOG(1) = 0
INFOG(2) = 0
-- (20) Number of entries in factors (estim.) = 221871415
-- (3) Storage of factors (REAL, estimated) = 221946297
-- (4) Storage of factors (INT , estimated) = 2296827
-- (5) Maximum frontal size (estimated) = 6878
-- (6) Number of nodes in the tree = 3695
-- (32) Type of analysis effectively used = 2
-- (7) Ordering option effectively used = 2
ICNTL(6) Maximum transversal option = 0
ICNTL(7) Pivot order option = 7
Percentage of memory relaxation (effective) = 35
Number of level 2 nodes = 24
Number of split nodes = 0
RINFOG(1) Operations during elimination (estim)= 5.795D+11
Distributed matrix entry format (ICNTL(18)) = 3
** Rank of proc needing largest memory in IC facto : 0
** Estimated corresponding MBYTES for IC facto : 407
** Estimated avg. MBYTES per work. proc at facto (IC) : 323
** TOTAL space in MBYTES for IC factorization : 5170
** Rank of proc needing largest memory for OOC facto : 0
** Estimated corresponding MBYTES for OOC facto : 291
** Estimated avg. MBYTES per work. proc at facto (OOC) : 247
** TOTAL space in MBYTES for OOC factorization : 3955
Entering DMUMPS driver with JOB, N, NZ = 2 107811 8214057
****** FACTORIZATION STEP ********
GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
NUMBER OF WORKING PROCESSES = 16
OUT-OF-CORE OPTION (ICNTL(22)) = 0
REAL SPACE FOR FACTORS = 221946297
INTEGER SPACE FOR FACTORS = 2296827
MAXIMUM FRONTAL SIZE (ESTIMATED) = 6878
NUMBER OF NODES IN THE TREE = 3695
Convergence error after scaling for ONE-NORM (option 7/8) = 0.63D-01
Maximum effective relaxed size of S = 38287711
Average effective relaxed size of S = 28596634
REDISTRIB: TOTAL DATA LOCAL/SENT = 383440 11281292
GLOBAL TIME FOR MATRIX DISTRIBUTION = 0.1332
** Memory relaxation parameter ( ICNTL(14) ) : 35
** Rank of processor needing largest memory in facto : 0
** Space in MBYTES used by this processor for facto : 407
** Avg. Space in MBYTES per working proc during facto : 323
ELAPSED TIME FOR FACTORIZATION = 21.6514
Maximum effective space used in S (KEEP8(67) = 32912724
Average effective space used in S (KEEP8(67) = 21712976
** EFF Min: Rank of processor needing largest memory : 0
** EFF Min: Space in MBYTES used by this processor : 364
** EFF Min: Avg. Space in MBYTES per working proc : 268
GLOBAL STATISTICS
RINFOG(2) OPERATIONS IN NODE ASSEMBLY = 5.123D+08
------(3) OPERATIONS IN NODE ELIMINATION= 5.795D+11
INFOG (9) REAL SPACE FOR FACTORS = 221871415
INFOG(10) INTEGER SPACE FOR FACTORS = 2294033
INFOG(11) MAXIMUM FRONT SIZE = 6878
INFOG(29) NUMBER OF ENTRIES IN FACTORS = 174564531
INFOG(12) NB OF OFF DIAGONAL PIVOTS = 0
INFOG(13) NUMBER OF DELAYED PIVOTS = 0
INFOG(14) NUMBER OF MEMORY COMPRESS = 15
KEEP8(108) Extra copies IP stacking = 0
Entering DMUMPS driver with JOB, N, NZ = 3 107811 8214057
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 0
** Space in MBYTES used by this processor for solve : 325
** Avg. Space in MBYTES per working proc during solve : 236
Entering DMUMPS driver with JOB, N, NZ = 3 107811 8214057
****** SOLVE & CHECK STEP ********
STATISTICS PRIOR SOLVE PHASE ...........
NUMBER OF RIGHT-HAND-SIDES = 1
BLOCKING FACTOR FOR MULTIPLE RHS = 1
ICNTL (9) = 1
--- (10) = 0
--- (11) = 0
--- (20) = 0
--- (21) = 1
--- (30) = 0
** Rank of processor needing largest memory in solve : 0
** Space in MBYTES used by this processor for solve : 325
** Avg. Space in MBYTES per working proc during solve : 236
Recovering stress ...
Entering DMUMPS driver with JOB, N, NZ = -2 107811 8214057
Cleaning up ...
Finished
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./defmod on a arch-linux2-c-opt named aci-048.chtc.wisc.edu with 16 processors, by stali2 Wed Jan 29 10:55:09 2014
Using Petsc Release Version 3.4.1, Jun, 10, 2013
Max Max/Min Avg Total
Time (sec): 2.455e+01 1.00000 2.455e+01
Objects: 3.900e+01 1.00000 3.900e+01
Flops: 1.144e+06 1.05187 1.114e+06 1.783e+07
Flops/sec: 4.660e+04 1.05187 4.540e+04 7.264e+05
MPI Messages: 1.240e+02 2.03279 9.762e+01 1.562e+03
MPI Message Lengths: 2.372e+06 1.81175 1.828e+04 2.855e+07
MPI Reductions: 5.800e+01 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.4547e+01 100.0% 1.7830e+07 100.0% 1.562e+03 100.0% 1.828e+04 100.0% 5.700e+01 98.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 1 1.0 8.5831e-05 5.1 1.35e+04 1.0 0.0e+00 0.0e+00 1.0e+00 0 1 0 0 2 0 1 0 0 2 2512
VecNorm 2 1.0 1.5712e-04 2.7 2.70e+04 1.0 0.0e+00 0.0e+00 2.0e+00 0 2 0 0 3 0 2 0 0 4 2745
VecScale 2 1.0 1.4067e-05 1.4 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 15329
VecCopy 1 1.0 3.6001e-05 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 5 1.0 3.3236e-0411.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1 1.0 1.2159e-05 1.3 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 17733
VecMAXPY 2 1.0 1.9789e-05 1.1 2.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 21792
VecAssemblyBegin 1 1.0 3.8505e-0412.8 0.00e+00 0.0 2.0e+00 1.8e+01 3.0e+00 0 0 0 0 5 0 0 0 0 5 0
VecAssemblyEnd 1 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 6 1.0 5.0306e-04 3.8 0.00e+00 0.0 4.9e+02 6.5e+03 2.0e+00 0 0 31 11 3 0 0 31 11 4 0
VecScatterEnd 4 1.0 2.8682e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 2 1.0 1.7118e-04 2.3 4.04e+04 1.0 0.0e+00 0.0e+00 2.0e+00 0 4 0 0 3 0 4 0 0 4 3779
MatMult 1 1.0 1.6229e-03 1.1 1.05e+06 1.1 1.3e+02 3.1e+03 0.0e+00 0 92 8 1 0 0 92 8 1 0 10056
MatSolve 2 1.0 1.0811e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00 0 0 31 10 10 0 0 31 10 11 0
MatLUFactorSym 1 1.0 1.8920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 8 0 0 0 12 8 0 0 0 12 0
MatLUFactorNum 1 1.0 2.1836e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 89 0 0 0 2 89 0 0 0 2 0
MatAssemblyBegin 1 1.0 3.2708e-02 7.5 0.00e+00 0.0 3.4e+02 7.3e+04 2.0e+00 0 0 22 86 3 0 0 22 86 4 0
MatAssemblyEnd 1 1.0 1.6633e-02 1.2 0.00e+00 0.0 2.6e+02 7.7e+02 9.0e+00 0 0 17 1 16 0 0 17 1 16 0
MatGetRowIJ 1 1.0 8.1062e-06 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.5593e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 7 0 0 0 0 7 0
KSPGMRESOrthog 1 1.0 1.3590e-04 2.4 2.70e+04 1.0 0.0e+00 0.0e+00 1.0e+00 0 2 0 0 2 0 2 0 0 2 3173
KSPSetUp 1 1.0 9.8944e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.3841e+01 1.0 1.14e+06 1.1 6.2e+02 5.4e+03 2.9e+01 97100 40 12 50 97100 40 12 51 1
PCSetUp 1 1.0 2.3731e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 97 0 0 0 34 97 0 0 0 35 0
PCApply 2 1.0 1.0812e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00 0 0 31 10 10 0 0 31 10 11 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 13 13 1395792 0
Vector Scatter 4 4 3792 0
Matrix 6 6 13990176 0
Index Set 13 13 218932 0
Krylov Solver 1 1 18360 0
Preconditioner 1 1 1096 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 2.19345e-06
Average time for zero size MPI_Send(): 1.2517e-06
#PETSc Option Table entries:
-f point.inp
-log_summary
-mat_mumps_icntl_28 2
-mat_mumps_icntl_29 2
-mat_mumps_icntl_4 1
-pc_factor_mat_solver_package mumps
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 17 17:20:12 2013
Configure options: --with-mpi-dir=/usr/mpi/gcc/mvapich2-1.9 --with-cmake=1 --download-cmake=1 --with-metis==1 --download-metis=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-mumps=1 --download-mumps=1 --with-debugging=0 --with-shared-libraries=0 --CFLAGS=-O2 --FFLAGS=-O2
-----------------------------------------
Libraries compiled on Wed Jul 17 17:20:12 2013 on aci-service-1.chtc.wisc.edu
Machine characteristics: Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.3-Carbon
Using PETSc directory: /home/stali2/petsc-3.4.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc -O2 -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90 -O2 -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/usr/mpi/gcc/mvapich2-1.9/include
-----------------------------------------
Using C linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc
Using Fortran linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90
Using libraries: -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lX11 -lparmetis -lmetis -lpthread -L/usr/mpi/gcc/mvapich2-1.9/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpichf90 -lgfortran -lm -lm -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrdmacm -libumad -libverbs -lrt -lnuma -lpthread -lgcc_s -ldl
-----------------------------------------