[petsc-users] reusing LU factorization?

Tabrez Ali stali at geology.wisc.edu
Wed Jan 29 11:13:29 CST 2014


I am getting the opposite result, i.e., MUMPS becomes slower when using 
ParMETIS for parallel ordering. What did I mess up? Is the problem too 
small?


Case 1 took 24.731s

$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu 
-pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary > 
1.txt


Case 2 with "-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2" took 34.720s

$ rm -f *vtk; time mpiexec -n 16 ./defmod -f point.inp -pc_type lu 
-pc_factor_mat_solver_package mumps -mat_mumps_icntl_4 1 -log_summary 
-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2 > 2.txt


Both 1.txt and 2.txt are attached.

Regards,

Tabrez

On 01/29/2014 09:18 AM, Hong Zhang wrote:
> MUMPS now supports parallel symbolic factorization. With petsc-3.4 
> interface, you can use runtime option
>
>   -mat_mumps_icntl_28 <1>: ICNTL(28): use 1 for sequential analysis
> and ICNTL(7) ordering, or 2 for parallel analysis and ICNTL(29) ordering
>   -mat_mumps_icntl_29 <0>: ICNTL(29): parallel ordering, 1 = ptscotch,
> 2 = parmetis
>
> e.g., '-mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2' activates parallel
> symbolic factorization with parmetis for matrix ordering.
> Give it a try and let us know what you get.
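>
> For reference, a minimal C sketch of setting the same values from code
> instead of the command line, assuming the petsc-3.4 MUMPS interface
> (PCFactorSetMatSolverPackage/MatMumpsSetIcntl) and an already assembled
> matrix A and vectors b, x; error checking omitted:
>
> #include <petscksp.h>
>
> /* Solve A x = b with MUMPS LU, requesting parallel analysis       */
> /* (ICNTL(28)=2) and ParMETIS ordering (ICNTL(29)=2) from code.    */
> PetscErrorCode SolveWithMumps(Mat A,Vec b,Vec x)
> {
>   KSP ksp;  PC pc;  Mat F;
>   KSPCreate(PETSC_COMM_WORLD,&ksp);
>   KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN);
>   KSPSetType(ksp,KSPPREONLY);            /* factor and solve only   */
>   KSPGetPC(ksp,&pc);
>   PCSetType(pc,PCLU);
>   PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);
>   PCFactorSetUpMatSolverPackage(pc);     /* create MUMPS factor mat */
>   PCFactorGetMatrix(pc,&F);
>   MatMumpsSetIcntl(F,28,2);              /* parallel analysis       */
>   MatMumpsSetIcntl(F,29,2);              /* ParMETIS ordering       */
>   KSPSetFromOptions(ksp);
>   KSPSolve(ksp,b,x);
>   KSPDestroy(&ksp);
>   return 0;
> }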
>
> Hong
>
>
> On Tue, Jan 28, 2014 at 5:48 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:
>
>
>     > On Jan 28, 2014, at 5:39 PM, Matthew Knepley <knepley at gmail.com> wrote:
>
>     > On Tue, Jan 28, 2014 at 5:25 PM, Tabrez Ali <stali at geology.wisc.edu> wrote:
>     > Hello
>     >
>     > This is my observation as well (with MUMPS). The first solve
>     (after assembly, which is super fast) takes a few minutes (for ~1
>     million unknowns on 12/24 cores), but each subsequent solve, one
>     per time step, takes only a few seconds.
>     >
>     > Perhaps symbolic factorization in MUMPS is all serial?
>     >
>     > Yes, it is.
>
>        I missed this. I was just assuming a PETSc LU. Yes, I have no
>     idea of the relative time of symbolic and numeric factorization
>     for those other packages.
>
>       Barry
>     >
>     >   Matt
>     >
>     > Like the OP, I often do multiple runs on the same problem, but I
>     don't know if MUMPS or any other direct solver can save the
>     symbolic factorization info to a file that could then be reused in
>     subsequent runs to avoid the costly "first solves".
>     >
>     > Tabrez
>     >
>     >
>     > On 01/28/2014 04:04 PM, Barry Smith wrote:
>     > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>     >
>     > Hi, I'm writing an application that solves a sparse matrix many
>     times using PaStiX. I notice that the first solve takes a very
>     long time,
>     >    Is it the first "solve" or the first time you put values into
>     that matrix that "takes a long time"? If you are not properly
>     preallocating the matrix then the initial setting of values will
>     be slow and waste memory.  See
>     http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
>     >
>     >    The symbolic factorization is usually much faster than a
>     numeric factorization so that is not the cause of the slow "first
>     solve".
>     >
>     >     Barry
>     >
>     >
>     >
>     > while the subsequent solves are very fast. I don't fully
>     understand what's going on behind the curtains, but I'm guessing
>     it's because the very first solve has to read in the non-zero
>     structure for the LU factorization, while the subsequent solves
>     are faster because the nonzero structure doesn't change.
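>     A rough sketch of that pattern within one run, assuming ksp, A, b,
>     and x are already set up and FormRHS() is a hypothetical routine
>     that only updates the right-hand side: as long as the operator
>     passed to KSPSetOperators() is unchanged, every KSPSolve() after
>     the first reuses the existing factors.
>
>     KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);
>     for (step=0; step<nsteps; step++) {
>       FormRHS(b,step);    /* only the right-hand side changes         */
>       KSPSolve(ksp,b,x);  /* step 0 pays for analysis+factorization;  */
>                           /* later steps are just forward/back solves */
>     }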
>     >
>     > My question is, is there any way to save the information
>     obtained from the very first solve, so that the next time I run
>     the application, the very first solve can be fast too (provided
>     that I still have the same nonzero structure)?
>     >
>     >
>     > --
>     > No one trusts a model except the one who wrote it; Everyone
>     trusts an observation except the one who made it- Harlow Shapley
>     >
>     >
>     >
>     >
>     > --
>     > What most experimenters take for granted before they begin their
>     experiments is infinitely more interesting than any results to
>     which their experiments lead.
>     > -- Norbert Wiener
>
>


-- 
No one trusts a model except the one who wrote it; Everyone trusts an observation except the one who made it- Harlow Shapley

-------------- next part --------------
 Reading input ...
 Partitioning mesh ...
 Reading mesh data ...
 Forming [K] ...
 Forming RHS ...
 Setting up solver ...
 Solving ...
Entering DMUMPS driver with JOB, N, NZ =   1      107811              0

 DMUMPS 4.10.0        
L U Solver for unsymmetric matrices
Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 ** Max-trans not allowed because matrix is distributed
 ... Structural symmetry (in percent)=  100
 Density: NBdense, Average, Median   =    0   75   80
 Ordering based on METIS 
 A root of estimated size         4851  has been selected for Scalapack.

Leaving analysis phase with  ...
INFOG(1)                                       =               0
INFOG(2)                                       =               0
 -- (20) Number of entries in factors (estim.) =       201990159
 --  (3) Storage of factors  (REAL, estimated) =       202148895
 --  (4) Storage of factors  (INT , estimated) =         2404034
 --  (5) Maximum frontal size      (estimated) =            4851
 --  (6) Number of nodes in the tree           =            3963
 -- (32) Type of analysis effectively used     =               1
 --  (7) Ordering option effectively used      =               5
ICNTL(6) Maximum transversal option            =               0
ICNTL(7) Pivot order option                    =               7
Percentage of memory relaxation (effective)    =              35
Number of level 2 nodes                        =              24
Number of split nodes                          =               2
RINFOG(1) Operations during elimination (estim)=   4.395D+11
Distributed matrix entry format (ICNTL(18))    =               3
 ** Rank of proc needing largest memory in IC facto        :        11
 ** Estimated corresponding MBYTES for IC facto            :       325
 ** Estimated avg. MBYTES per work. proc at facto (IC)     :       286
 ** TOTAL     space in MBYTES for IC factorization         :      4584
 ** Rank of proc needing largest memory for OOC facto      :        10
 ** Estimated corresponding MBYTES for OOC facto           :       254
 ** Estimated avg. MBYTES per work. proc at facto (OOC)    :       214
 ** TOTAL     space in MBYTES for OOC factorization        :      3424
Entering DMUMPS driver with JOB, N, NZ =   2      107811        8214057

 ****** FACTORIZATION STEP ********


 GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
 NUMBER OF WORKING PROCESSES              =          16
 OUT-OF-CORE OPTION (ICNTL(22))           =           0
 REAL SPACE FOR FACTORS                   =   202148895
 INTEGER SPACE FOR FACTORS                =     2404034
 MAXIMUM FRONTAL SIZE (ESTIMATED)         =        4851
 NUMBER OF NODES IN THE TREE              =        3963
 Convergence error after scaling for ONE-NORM (option 7/8)   = 0.63D-01
 Maximum effective relaxed size of S              =    28811533
 Average effective relaxed size of S              =    23980595

 REDISTRIB: TOTAL DATA LOCAL/SENT         =      285226    11316386
 GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.1374
 ** Memory relaxation parameter ( ICNTL(14)  )            :        35
 ** Rank of processor needing largest memory in facto     :        11
 ** Space in MBYTES used by this processor for facto      :       325
 ** Avg. Space in MBYTES per working proc during facto    :       286

 ELAPSED TIME FOR FACTORIZATION           =     18.6224
 Maximum effective space used in S   (KEEP8(67)   =    19890302
 Average effective space used in S   (KEEP8(67)   =    17657389
 ** EFF Min: Rank of processor needing largest memory :        11
 ** EFF Min: Space in MBYTES used by this processor   :       254
 ** EFF Min: Avg. Space in MBYTES per working proc    :       235

 GLOBAL STATISTICS 
 RINFOG(2)  OPERATIONS IN NODE ASSEMBLY   = 5.463D+08
 ------(3)  OPERATIONS IN NODE ELIMINATION= 4.395D+11
 INFOG (9)  REAL SPACE FOR FACTORS        =   201990159
 INFOG(10)  INTEGER SPACE FOR FACTORS     =     2402463
 INFOG(11)  MAXIMUM FRONT SIZE            =        4851
 INFOG(29)  NUMBER OF ENTRIES IN FACTORS  =   178457958
 INFOG(12) NB OF OFF DIAGONAL PIVOTS      =           0
 INFOG(13)  NUMBER OF DELAYED PIVOTS      =           0
 INFOG(14)  NUMBER OF MEMORY COMPRESS     =          16
 KEEP8(108) Extra copies IP stacking      =           0
Entering DMUMPS driver with JOB, N, NZ =   3      107811        8214057


 ****** SOLVE & CHECK STEP ********


 STATISTICS PRIOR SOLVE PHASE     ...........
 NUMBER OF RIGHT-HAND-SIDES                    =           1
 BLOCKING FACTOR FOR MULTIPLE RHS              =           1
 ICNTL (9)                                     =           1
  --- (10)                                     =           0
  --- (11)                                     =           0
  --- (20)                                     =           0
  --- (21)                                     =           1
  --- (30)                                     =           0
 ** Rank of processor needing largest memory in solve     :        11
 ** Space in MBYTES used by this processor for solve      :       238
 ** Avg. Space in MBYTES per working proc during solve    :       199
Entering DMUMPS driver with JOB, N, NZ =   3      107811        8214057


 ****** SOLVE & CHECK STEP ********


 STATISTICS PRIOR SOLVE PHASE     ...........
 NUMBER OF RIGHT-HAND-SIDES                    =           1
 BLOCKING FACTOR FOR MULTIPLE RHS              =           1
 ICNTL (9)                                     =           1
  --- (10)                                     =           0
  --- (11)                                     =           0
  --- (20)                                     =           0
  --- (21)                                     =           1
  --- (30)                                     =           0
 ** Rank of processor needing largest memory in solve     :        11
 ** Space in MBYTES used by this processor for solve      :       238
 ** Avg. Space in MBYTES per working proc during solve    :       199
 Recovering stress ...
Entering DMUMPS driver with JOB, N, NZ =  -2      107811        8214057
 Cleaning up ...
 Finished
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./defmod on a arch-linux2-c-opt named aci-048.chtc.wisc.edu with 16 processors, by stali2 Wed Jan 29 10:53:09 2014
Using Petsc Release Version 3.4.1, Jun, 10, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.091e+01      1.00000   2.091e+01
Objects:              3.900e+01      1.00000   3.900e+01
Flops:                1.144e+06      1.05187   1.114e+06  1.783e+07
Flops/sec:            5.471e+04      1.05187   5.330e+04  8.528e+05
MPI Messages:         1.300e+02      1.83099   1.009e+02  1.614e+03
MPI Message Lengths:  2.359e+06      1.90899   1.769e+04  2.855e+07
MPI Reductions:       5.800e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.0907e+01 100.0%  1.7830e+07 100.0%  1.614e+03 100.0%  1.769e+04      100.0%  5.700e+01  98.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot                1 1.0 1.1277e-04 7.1 1.35e+04 1.0 0.0e+00 0.0e+00 1.0e+00  0  1  0  0  2   0  1  0  0  2  1912
VecNorm                2 1.0 1.0610e-04 1.8 2.70e+04 1.0 0.0e+00 0.0e+00 2.0e+00  0  2  0  0  3   0  2  0  0  4  4065
VecScale               2 1.0 2.1935e-05 2.2 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  9830
VecCopy                1 1.0 2.7180e-05 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 5 1.0 3.4642e-0412.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.0967e-05 1.2 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 19661
VecMAXPY               2 1.0 2.1219e-05 1.1 2.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 20323
VecAssemblyBegin       1 1.0 2.7704e-04 8.9 0.00e+00 0.0 2.0e+00 1.8e+01 3.0e+00  0  0  0  0  5   0  0  0  0  5     0
VecAssemblyEnd         1 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin        6 1.0 6.3896e-04 3.8 0.00e+00 0.0 5.2e+02 6.1e+03 2.0e+00  0  0 32 11  3   0  0 32 11  4     0
VecScatterEnd          4 1.0 2.9111e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           2 1.0 1.3494e-04 1.8 4.04e+04 1.0 0.0e+00 0.0e+00 2.0e+00  0  4  0  0  3   0  4  0  0  4  4794
MatMult                1 1.0 1.6370e-03 1.1 1.05e+06 1.1 1.3e+02 3.1e+03 0.0e+00  0 92  8  1  0   0 92  8  1  0  9970
MatSolve               2 1.0 9.7397e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00  0  0 34 10 10   0  0 34 10 11     0
MatLUFactorSym         1 1.0 1.2882e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  6  0  0  0 12   6  0  0  0 12     0
MatLUFactorNum         1 1.0 1.8813e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 90  0  0  0  2  90  0  0  0  2     0
MatAssemblyBegin       1 1.0 3.2401e-02 8.6 0.00e+00 0.0 3.4e+02 7.3e+04 2.0e+00  0  0 21 86  3   0  0 21 86  4     0
MatAssemblyEnd         1 1.0 1.7039e-02 1.2 0.00e+00 0.0 2.6e+02 7.7e+02 9.0e+00  0  0 16  1 16   0  0 16  1 16     0
MatGetRowIJ            1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.5593e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  7   0  0  0  0  7     0
KSPGMRESOrthog         1 1.0 1.3494e-04 2.5 2.70e+04 1.0 0.0e+00 0.0e+00 1.0e+00  0  2  0  0  2   0  2  0  0  2  3196
KSPSetUp               1 1.0 9.6798e-05 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.0205e+01 1.0 1.14e+06 1.1 6.8e+02 5.0e+03 2.9e+01 97100 42 12 50  97100 42 12 51     1
PCSetUp                1 1.0 2.0105e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 96  0  0  0 34  96  0  0  0 35     0
PCApply                2 1.0 9.7407e-02 1.0 0.00e+00 0.0 5.4e+02 5.5e+03 6.0e+00  0  0 34 10 10   0  0 34 10 11     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    13             13      1395792     0
      Vector Scatter     4              4         3792     0
              Matrix     6              6     13990176     0
           Index Set    13             13       188444     0
       Krylov Solver     1              1        18360     0
      Preconditioner     1              1         1096     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 2.00272e-06
Average time for zero size MPI_Send(): 1.2517e-06
#PETSc Option Table entries:
-f point.inp
-log_summary
-mat_mumps_icntl_4 1
-pc_factor_mat_solver_package mumps
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 17 17:20:12 2013
Configure options: --with-mpi-dir=/usr/mpi/gcc/mvapich2-1.9 --with-cmake=1 --download-cmake=1 --with-metis==1 --download-metis=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-mumps=1 --download-mumps=1 --with-debugging=0 --with-shared-libraries=0 --CFLAGS=-O2 --FFLAGS=-O2
-----------------------------------------
Libraries compiled on Wed Jul 17 17:20:12 2013 on aci-service-1.chtc.wisc.edu 
Machine characteristics: Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.3-Carbon
Using PETSc directory: /home/stali2/petsc-3.4.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc -O2 -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90 -O2 -O   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/usr/mpi/gcc/mvapich2-1.9/include
-----------------------------------------

Using C linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc
Using Fortran linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90
Using libraries: -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lX11 -lparmetis -lmetis -lpthread -L/usr/mpi/gcc/mvapich2-1.9/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpichf90 -lgfortran -lm -lm -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrdmacm -libumad -libverbs -lrt -lnuma -lpthread -lgcc_s -ldl 
-----------------------------------------

-------------- next part --------------
 Reading input ...
 Partitioning mesh ...
 Reading mesh data ...
 Forming [K] ...
 Forming RHS ...
 Setting up solver ...
 Solving ...
Entering DMUMPS driver with JOB, N, NZ =   1      107811              0

 DMUMPS 4.10.0        
L U Solver for unsymmetric matrices
Type of parallelism: Working host

 ****** ANALYSIS STEP ********

 ** Max-trans not allowed because matrix is distributed
Using ParMETIS for parallel ordering.
Structual symmetry is:100%
 A root of estimated size         6878  has been selected for Scalapack.

Leaving analysis phase with  ...
INFOG(1)                                       =               0
INFOG(2)                                       =               0
 -- (20) Number of entries in factors (estim.) =       221871415
 --  (3) Storage of factors  (REAL, estimated) =       221946297
 --  (4) Storage of factors  (INT , estimated) =         2296827
 --  (5) Maximum frontal size      (estimated) =            6878
 --  (6) Number of nodes in the tree           =            3695
 -- (32) Type of analysis effectively used     =               2
 --  (7) Ordering option effectively used      =               2
ICNTL(6) Maximum transversal option            =               0
ICNTL(7) Pivot order option                    =               7
Percentage of memory relaxation (effective)    =              35
Number of level 2 nodes                        =              24
Number of split nodes                          =               0
RINFOG(1) Operations during elimination (estim)=   5.795D+11
Distributed matrix entry format (ICNTL(18))    =               3
 ** Rank of proc needing largest memory in IC facto        :         0
 ** Estimated corresponding MBYTES for IC facto            :       407
 ** Estimated avg. MBYTES per work. proc at facto (IC)     :       323
 ** TOTAL     space in MBYTES for IC factorization         :      5170
 ** Rank of proc needing largest memory for OOC facto      :         0
 ** Estimated corresponding MBYTES for OOC facto           :       291
 ** Estimated avg. MBYTES per work. proc at facto (OOC)    :       247
 ** TOTAL     space in MBYTES for OOC factorization        :      3955
Entering DMUMPS driver with JOB, N, NZ =   2      107811        8214057

 ****** FACTORIZATION STEP ********


 GLOBAL STATISTICS PRIOR NUMERICAL FACTORIZATION ...
 NUMBER OF WORKING PROCESSES              =          16
 OUT-OF-CORE OPTION (ICNTL(22))           =           0
 REAL SPACE FOR FACTORS                   =   221946297
 INTEGER SPACE FOR FACTORS                =     2296827
 MAXIMUM FRONTAL SIZE (ESTIMATED)         =        6878
 NUMBER OF NODES IN THE TREE              =        3695
 Convergence error after scaling for ONE-NORM (option 7/8)   = 0.63D-01
 Maximum effective relaxed size of S              =    38287711
 Average effective relaxed size of S              =    28596634

 REDISTRIB: TOTAL DATA LOCAL/SENT         =      383440    11281292
 GLOBAL TIME FOR MATRIX DISTRIBUTION       =      0.1332
 ** Memory relaxation parameter ( ICNTL(14)  )            :        35
 ** Rank of processor needing largest memory in facto     :         0
 ** Space in MBYTES used by this processor for facto      :       407
 ** Avg. Space in MBYTES per working proc during facto    :       323

 ELAPSED TIME FOR FACTORIZATION           =     21.6514
 Maximum effective space used in S   (KEEP8(67)   =    32912724
 Average effective space used in S   (KEEP8(67)   =    21712976
 ** EFF Min: Rank of processor needing largest memory :         0
 ** EFF Min: Space in MBYTES used by this processor   :       364
 ** EFF Min: Avg. Space in MBYTES per working proc    :       268

 GLOBAL STATISTICS 
 RINFOG(2)  OPERATIONS IN NODE ASSEMBLY   = 5.123D+08
 ------(3)  OPERATIONS IN NODE ELIMINATION= 5.795D+11
 INFOG (9)  REAL SPACE FOR FACTORS        =   221871415
 INFOG(10)  INTEGER SPACE FOR FACTORS     =     2294033
 INFOG(11)  MAXIMUM FRONT SIZE            =        6878
 INFOG(29)  NUMBER OF ENTRIES IN FACTORS  =   174564531
 INFOG(12) NB OF OFF DIAGONAL PIVOTS      =           0
 INFOG(13)  NUMBER OF DELAYED PIVOTS      =           0
 INFOG(14)  NUMBER OF MEMORY COMPRESS     =          15
 KEEP8(108) Extra copies IP stacking      =           0
Entering DMUMPS driver with JOB, N, NZ =   3      107811        8214057


 ****** SOLVE & CHECK STEP ********


 STATISTICS PRIOR SOLVE PHASE     ...........
 NUMBER OF RIGHT-HAND-SIDES                    =           1
 BLOCKING FACTOR FOR MULTIPLE RHS              =           1
 ICNTL (9)                                     =           1
  --- (10)                                     =           0
  --- (11)                                     =           0
  --- (20)                                     =           0
  --- (21)                                     =           1
  --- (30)                                     =           0
 ** Rank of processor needing largest memory in solve     :         0
 ** Space in MBYTES used by this processor for solve      :       325
 ** Avg. Space in MBYTES per working proc during solve    :       236
Entering DMUMPS driver with JOB, N, NZ =   3      107811        8214057


 ****** SOLVE & CHECK STEP ********


 STATISTICS PRIOR SOLVE PHASE     ...........
 NUMBER OF RIGHT-HAND-SIDES                    =           1
 BLOCKING FACTOR FOR MULTIPLE RHS              =           1
 ICNTL (9)                                     =           1
  --- (10)                                     =           0
  --- (11)                                     =           0
  --- (20)                                     =           0
  --- (21)                                     =           1
  --- (30)                                     =           0
 ** Rank of processor needing largest memory in solve     :         0
 ** Space in MBYTES used by this processor for solve      :       325
 ** Avg. Space in MBYTES per working proc during solve    :       236
 Recovering stress ...
Entering DMUMPS driver with JOB, N, NZ =  -2      107811        8214057
 Cleaning up ...
 Finished
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./defmod on a arch-linux2-c-opt named aci-048.chtc.wisc.edu with 16 processors, by stali2 Wed Jan 29 10:55:09 2014
Using Petsc Release Version 3.4.1, Jun, 10, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.455e+01      1.00000   2.455e+01
Objects:              3.900e+01      1.00000   3.900e+01
Flops:                1.144e+06      1.05187   1.114e+06  1.783e+07
Flops/sec:            4.660e+04      1.05187   4.540e+04  7.264e+05
MPI Messages:         1.240e+02      2.03279   9.762e+01  1.562e+03
MPI Message Lengths:  2.372e+06      1.81175   1.828e+04  2.855e+07
MPI Reductions:       5.800e+01      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.4547e+01 100.0%  1.7830e+07 100.0%  1.562e+03 100.0%  1.828e+04      100.0%  5.700e+01  98.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMDot                1 1.0 8.5831e-05 5.1 1.35e+04 1.0 0.0e+00 0.0e+00 1.0e+00  0  1  0  0  2   0  1  0  0  2  2512
VecNorm                2 1.0 1.5712e-04 2.7 2.70e+04 1.0 0.0e+00 0.0e+00 2.0e+00  0  2  0  0  3   0  2  0  0  4  2745
VecScale               2 1.0 1.4067e-05 1.4 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 15329
VecCopy                1 1.0 3.6001e-05 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                 5 1.0 3.3236e-0411.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                1 1.0 1.2159e-05 1.3 1.35e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 17733
VecMAXPY               2 1.0 1.9789e-05 1.1 2.70e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 21792
VecAssemblyBegin       1 1.0 3.8505e-0412.8 0.00e+00 0.0 2.0e+00 1.8e+01 3.0e+00  0  0  0  0  5   0  0  0  0  5     0
VecAssemblyEnd         1 1.0 2.8610e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin        6 1.0 5.0306e-04 3.8 0.00e+00 0.0 4.9e+02 6.5e+03 2.0e+00  0  0 31 11  3   0  0 31 11  4     0
VecScatterEnd          4 1.0 2.8682e-04 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           2 1.0 1.7118e-04 2.3 4.04e+04 1.0 0.0e+00 0.0e+00 2.0e+00  0  4  0  0  3   0  4  0  0  4  3779
MatMult                1 1.0 1.6229e-03 1.1 1.05e+06 1.1 1.3e+02 3.1e+03 0.0e+00  0 92  8  1  0   0 92  8  1  0 10056
MatSolve               2 1.0 1.0811e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00  0  0 31 10 10   0  0 31 10 11     0
MatLUFactorSym         1 1.0 1.8920e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00  8  0  0  0 12   8  0  0  0 12     0
MatLUFactorNum         1 1.0 2.1836e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 89  0  0  0  2  89  0  0  0  2     0
MatAssemblyBegin       1 1.0 3.2708e-02 7.5 0.00e+00 0.0 3.4e+02 7.3e+04 2.0e+00  0  0 22 86  3   0  0 22 86  4     0
MatAssemblyEnd         1 1.0 1.6633e-02 1.2 0.00e+00 0.0 2.6e+02 7.7e+02 9.0e+00  0  0 17  1 16   0  0 17  1 16     0
MatGetRowIJ            1 1.0 8.1062e-06 6.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.5593e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  7   0  0  0  0  7     0
KSPGMRESOrthog         1 1.0 1.3590e-04 2.4 2.70e+04 1.0 0.0e+00 0.0e+00 1.0e+00  0  2  0  0  2   0  2  0  0  2  3173
KSPSetUp               1 1.0 9.8944e-05 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.3841e+01 1.0 1.14e+06 1.1 6.2e+02 5.4e+03 2.9e+01 97100 40 12 50  97100 40 12 51     1
PCSetUp                1 1.0 2.3731e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+01 97  0  0  0 34  97  0  0  0 35     0
PCApply                2 1.0 1.0812e-01 1.0 0.00e+00 0.0 4.9e+02 6.1e+03 6.0e+00  0  0 31 10 10   0  0 31 10 11     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    13             13      1395792     0
      Vector Scatter     4              4         3792     0
              Matrix     6              6     13990176     0
           Index Set    13             13       218932     0
       Krylov Solver     1              1        18360     0
      Preconditioner     1              1         1096     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 0
Average time for MPI_Barrier(): 2.19345e-06
Average time for zero size MPI_Send(): 1.2517e-06
#PETSc Option Table entries:
-f point.inp
-log_summary
-mat_mumps_icntl_28 2
-mat_mumps_icntl_29 2
-mat_mumps_icntl_4 1
-pc_factor_mat_solver_package mumps
-pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Jul 17 17:20:12 2013
Configure options: --with-mpi-dir=/usr/mpi/gcc/mvapich2-1.9 --with-cmake=1 --download-cmake=1 --with-metis==1 --download-metis=1 --with-parmetis=1 --download-parmetis=1 --with-scalapack=1 --download-scalapack=1 --with-mumps=1 --download-mumps=1 --with-debugging=0 --with-shared-libraries=0 --CFLAGS=-O2 --FFLAGS=-O2
-----------------------------------------
Libraries compiled on Wed Jul 17 17:20:12 2013 on aci-service-1.chtc.wisc.edu 
Machine characteristics: Linux-2.6.32-358.6.2.el6.x86_64-x86_64-with-redhat-6.3-Carbon
Using PETSc directory: /home/stali2/petsc-3.4.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc -O2 -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90 -O2 -O   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/include -I/home/stali2/petsc-3.4.1/arch-linux2-c-opt/include -I/usr/mpi/gcc/mvapich2-1.9/include
-----------------------------------------

Using C linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpicc
Using Fortran linker: /usr/mpi/gcc/mvapich2-1.9/bin/mpif90
Using libraries: -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -L/home/stali2/petsc-3.4.1/arch-linux2-c-opt/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -llapack -lblas -lX11 -lparmetis -lmetis -lpthread -L/usr/mpi/gcc/mvapich2-1.9/lib -L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpichf90 -lgfortran -lm -lm -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lmpl -lrdmacm -libumad -libverbs -lrt -lnuma -lpthread -lgcc_s -ldl 
-----------------------------------------


