[petsc-users] reusing LU factorization?
David Liu
daveliu at mit.edu
Wed Jan 29 10:48:06 CST 2014
Sure thing. Here is a run where the first solve took about 34s, while the
second solve took about 2 s.
===========================
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./SaltOut on a REAL named nid00407 with 12 processors, by Unknown Wed Jan 29 11:33:44 2014
Using Petsc Release Version 3.3.0, Patch 4, Fri Oct 26 10:46:51 CDT 2012
                         Max       Max/Min        Avg      Total
Time (sec):           3.979e+01      1.00000   3.979e+01
Objects:              1.070e+02      1.00000   1.070e+02
Flops:                2.431e+07      1.75268   1.478e+07  1.773e+08
Flops/sec:            6.108e+05      1.75268   3.714e+05  4.456e+06
MPI Messages:         4.845e+02      1.53081   3.612e+02  4.335e+03
MPI Message Lengths:  1.770e+07      1.45602   3.721e+04  1.613e+08
MPI Reductions:       2.810e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------    ----- Flops -----    --- Messages ---   -- Message Lengths --   -- Reductions --
                        Avg     %Total       Avg     %Total     counts   %Total      Avg        %Total    counts   %Total
 0:      Main Stage: 3.9795e+01 100.0%   1.7734e+08 100.0%   4.335e+03 100.0%   3.721e+04      100.0%   2.800e+02  99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView                1 1.0 1.6816e+0011.5 0.00e+00 0.0 1.1e+01 1.7e+05 0.0e+00  2  0  0  1  0   2  0  0  1  0     0
VecMDot                6 1.0 2.8601e-03 1.3 5.62e+05 1.0 0.0e+00 0.0e+00 6.0e+00  0  4  0  0  2   0  4  0  0  2  2356
VecNorm               12 1.0 1.5430e-03 1.3 5.18e+05 1.0 0.0e+00 0.0e+00 1.2e+01  0  4  0  0  4   0  4  0  0  4  4032
VecScale              14 1.0 2.8896e-04 1.2 3.02e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 12558
VecCopy               26 1.0 3.6035e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                52 1.0 3.0789e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                6 1.0 5.8293e-04 1.2 2.59e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  5336
VecMAXPY               8 1.0 9.5987e-04 1.2 8.21e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  6  0  0  0   0  6  0  0  0 10261
VecAssemblyBegin      41 1.0 7.6625e-0131.1 0.00e+00 0.0 6.6e+01 4.9e+04 1.2e+02  2  0  2  2 44   2  0  2  2 44     0
VecAssemblyEnd        41 1.0 4.5586e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult       1 1.0 2.1911e-04 1.2 2.16e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1183
VecScatterBegin       60 1.0 2.2238e-02 1.2 0.00e+00 0.0 3.5e+03 4.2e+04 0.0e+00  0  0 81 92  0   0  0 81 92  0     0
VecScatterEnd         60 1.0 6.2252e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           8 1.0 1.0526e-03 1.4 5.18e+05 1.0 0.0e+00 0.0e+00 8.0e+00  0  4  0  0  3   0  4  0  0  3  5910
MatMult               10 1.0 1.3628e-01 1.3 2.18e+07 1.9 1.3e+03 8.7e+04 0.0e+00  0 83 30 72  0   0 83 30 72  0  1083
MatSolve               8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
MatLUFactorSym         1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 85  0  0  0  0  85  0  0  0  0     0
MatAssemblyBegin       7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
MatAssemblyEnd         7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04 8.0e+00  1  0  6  4  3   1  0  6  4  3     0
KSPGMRESOrthog         6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00 6.0e+00  0  8  0  0  2   0  8  0  0  2  3760
KSPSetUp               1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04 2.2e+01 92 63 18 43  8  92 63 18 43  8     3
PCSetUp                1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 85  0  0  0  3  85  0  0  0  3     0
PCApply                8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
              Vector        50             50    6210712     0
      Vector Scatter        15             15      15540     0
           Index Set        32             32     465324     0
              Matrix         6              6   21032760     0
       Krylov Solver         1              1      18288     0
      Preconditioner         1              1        936     0
              Viewer         2              1        712     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 5.38826e-06
Average time for zero size MPI_Send(): 9.57648e-06
#PETSc Option Table entries:
-Dmax 0.01
-LowerPML 0
-Mx 72
-My 60
-Mz 4
-Nc 3
-Npmlx 0
-Npmly 0
-Npmlz 3
-Nx 72
-Ny 60
-Nz 10
-dD 0.01
-epsfile eps3dhi.txt
-fproffile fprof3dhi.txt
-gamma 2.0
-hx 0.0625
-hy 0.061858957413174
-hz 0.2
-in0 pass
-log_summary
-norm 0.01
-out0 above
-pc_factor_mat_solver_package superlu_dist
-pc_type lu
-printnewton 1
-wa 1.5
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Apr 17 13:30:40 2013
Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
--PETSC_ARCH=REAL --known-level1-dcache-assoc=2
--known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
--with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
--with-fortran-interfaces=1 --with-single-library=1
--with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
--known-mpi-shared-libraries=0 --with-clib-autodetect=0
--with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
-Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
--with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-scalapack=1
--with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
--with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
--with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
--with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
--download-metis=yes --with-parmetis=1 --download-parmetis=yes
--with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
--with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
--with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
-----------------------------------------
Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
Machine characteristics:
Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory:
/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
Using PETSc arch: REAL
-----------------------------------------
Using C compiler: cc -fast ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn -fast ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
-I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
-I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
-I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
-I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
-I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries:
-Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-lpetsc
-Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
-lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
-Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
-L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
-Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib
-L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
-L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
-L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
-L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
-L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
-L/opt/pgi/11.9.0/linux86-64/11.9/lib
-L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
-lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
-lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
-lportals -lalpslli -lalpsutil -lpthread
-Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
-lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
-lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
-lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
-lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
-lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
-lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
-lz -lz -ldl
-----------------------------------------
On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Ok then use -log_summary and put the first solve in a separate stage
> (see PetscLogStageRegister()) and send the results of a run back demonstrating
> the slow first solve and we may be able to see what the story is.
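
A minimal sketch of the staging suggested here (this is not the actual SaltOut
code: the toy diagonal matrix, the stage names, and the variable names are
illustrative only; the 4-argument KSPSetOperators() is the petsc-3.3 signature):

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat           A;
    Vec           x, b;
    KSP           ksp;
    PetscLogStage stageFirst, stageLater;
    PetscInt      i, Istart, Iend, n = 100;

    PetscInitialize(&argc, &argv, NULL, NULL);

    /* Toy diagonal system standing in for the real problem */
    MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 1, NULL, 0, NULL, &A);
    MatGetOwnershipRange(A, &Istart, &Iend);
    for (i = Istart; i < Iend; i++) MatSetValue(A, i, i, 2.0, INSERT_VALUES);
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    MatGetVecs(A, &x, &b);
    VecSet(b, 1.0);

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A, DIFFERENT_NONZERO_PATTERN);
    KSPSetFromOptions(ksp);  /* picks up -pc_type lu -pc_factor_mat_solver_package ... */

    /* Separate stages so -log_summary reports the two solves independently */
    PetscLogStageRegister("First solve", &stageFirst);
    PetscLogStageRegister("Later solves", &stageLater);

    PetscLogStagePush(stageFirst);
    KSPSolve(ksp, b, x);     /* pays for the numeric LU factorization */
    PetscLogStagePop();

    PetscLogStagePush(stageLater);
    KSPSolve(ksp, b, x);     /* same operator: factorization is reused */
    PetscLogStagePop();

    KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b); MatDestroy(&A);
    PetscFinalize();
    return 0;
  }

Run with -pc_type lu -log_summary, the "First solve" stage should carry
essentially all of the MatLUFactorNum/PCSetUp time, and the "Later solves"
stage only the MatSolve time.
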
>
> Barry
>
> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>
> > wow that is news to me. I always assumed that this is normal.
> >
> > I'm pretty certain it's not the preallocation. I'm using MatCreateMPI,
> and to my knowledge I wouldn't even be able to set the values without
> crashing if I didn't preallocate. (If I'm not mistaken, setting values
> slowly without preallocating is only possible if you create the Mat using
> MatCreate + MatSetUp).
> >
> > Also, I'm certain that the time is taken on the first solve, not the
> setting of values, because I use the matrix in a MatMult first to get the
> RHS before solving, and the MatMult happens before the long first solve.
> >
> >
> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> >
> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
> >
> > > Hi, I'm writing an application that solves a sparse matrix many times
> using Pastix. I notice that the first solve takes a very long time,
> >
> > Is it the first "solve" or the first time you put values into that
> matrix that "takes a long time"? If you are not properly preallocating the
> matrix then the initial setting of values will be slow and waste memory.
> See
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
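
A minimal sketch of the kind of preallocation meant here (the MATMPIAIJ type
and the per-row counts 7/3 are placeholders for whatever the real stencil
needs, not values taken from the application above):

  #include <petscmat.h>

  /* Create an N x N parallel AIJ matrix with preallocation done up front.
     Without the preallocation call, each MatSetValues() insertion can force
     a reallocation and copy, which is what makes unpreallocated assembly slow. */
  static PetscErrorCode CreatePreallocatedMatrix(PetscInt N, Mat *A)
  {
    PetscErrorCode ierr;

    ierr = MatCreate(PETSC_COMM_WORLD, A);CHKERRQ(ierr);
    ierr = MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
    ierr = MatSetType(*A, MATMPIAIJ);CHKERRQ(ierr);
    /* At most 7 nonzeros per row in the diagonal block and 3 in the
       off-diagonal block; exact per-row arrays can replace the NULLs. */
    ierr = MatMPIAIJSetPreallocation(*A, 7, NULL, 3, NULL);CHKERRQ(ierr);
    return 0;
  }
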
> >
> > The symbolic factorization is usually much faster than a numeric
> factorization so that is not the cause of the slow "first solve".
> >
> > Barry
> >
> >
> >
> > > while the subsequent solves are very fast. I don't fully understand
> what's going on behind the curtains, but I'm guessing it's because the very
> first solve has to read in the non-zero structure for the LU factorization,
> while the subsequent solves are faster because the nonzero structure
> doesn't change.
> > >
> > > My question is, is there any way to save the information obtained from
> the very first solve, so that the next time I run the application, the very
> first solve can be fast too (provided that I still have the same nonzero
> structure)?
> >
> >
>
>