[petsc-users] reusing LU factorization?
Hong Zhang
hzhang at mcs.anl.gov
Wed Jan 29 15:29:00 CST 2014
David :
Thanks.
> MatSolve 7 1.0 1.2316e+00 1.0 0.00e+00 0.0 1.0e+03 2.1e+04
> 1.1e+01 5 0 20 12 4 5 0 20 12 4 0
> MatLUFactorSym 1 1.0 6.9763e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 7.0e+00 26 0 0 0 2 26 0 0 0 2 0
> MatLUFactorNum 1 1.0 1.4693e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 55 0 0 0 0 55 0 0 0 0 0
>
Here it shows the LU factorization is called once and takes (26+55)% of the
total time :-(
MatLUFactorSym takes about 1/2 of the MatLUFactorNum time.
'-ksp_view' may show which ordering is being used. I usually use
'-mat_mumps_icntl_7 2'. A small acceleration may be achieved by experimenting
with matrix orderings.
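For example, using the options already in the option table below, switching to
MUMPS and trying ordering 2 might look like this (a hypothetical invocation;
the process count and executable name are made up):

  mpiexec -n 8 ./mysolver -pc_type lu -pc_factor_mat_solver_package mumps \
      -mat_mumps_icntl_7 2 -ksp_view -log_summary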
A direct solver is expensive in memory, execution time, and scalability. Do
you have the option of using another preconditioner?
Hong
>
>
>
> On Wed, Jan 29, 2014 at 12:30 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>
>> David,
>> The 1st solve calls the LU factorization once, which took 3.3686e+01 sec.
>> The remaining solves do not call the LU factorization at all, and are thus fast.
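As an illustration of the reuse Hong describes, here is a minimal self-contained
sketch (assuming the PETSc 3.3 API shown in the configure output below; this is
not the poster's application): only the first KSPSolve() triggers the LU
factorization, and later calls reuse it as long as the operator is unchanged.

  /* Minimal sketch (PETSc 3.3 API; not the poster's code): assemble a
     1-D Laplacian once, then solve repeatedly.  The LU factorization is
     computed inside the first KSPSolve() (via PCSetUp) and reused by
     every later KSPSolve() because the operator never changes. */
  #include <petscksp.h>

  int main(int argc,char **argv)
  {
    Mat            A;
    Vec            x,b;
    KSP            ksp;
    PetscInt       i,n = 100,col[3],rstart,rend;
    PetscScalar    v[3];
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,PETSC_NULL,PETSC_NULL);CHKERRQ(ierr);

    /* preallocate: at most 3 nonzeros per row in the diagonal block,
       1 in the off-diagonal block */
    ierr = MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,n,n,
                        3,PETSC_NULL,1,PETSC_NULL,&A);CHKERRQ(ierr);
    ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
    for (i=rstart; i<rend; i++) {
      if (i == 0) {
        col[0] = 0; col[1] = 1; v[0] = 2.0; v[1] = -1.0;
        ierr = MatSetValues(A,1,&i,2,col,v,INSERT_VALUES);CHKERRQ(ierr);
      } else if (i == n-1) {
        col[0] = n-2; col[1] = n-1; v[0] = -1.0; v[1] = 2.0;
        ierr = MatSetValues(A,1,&i,2,col,v,INSERT_VALUES);CHKERRQ(ierr);
      } else {
        col[0] = i-1; col[1] = i; col[2] = i+1;
        v[0] = -1.0; v[1] = 2.0; v[2] = -1.0;
        ierr = MatSetValues(A,1,&i,3,col,v,INSERT_VALUES);CHKERRQ(ierr);
      }
    }
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatGetVecs(A,&x,&b);CHKERRQ(ierr);
    ierr = VecSet(b,1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr); /* 3.3 calling sequence */
    /* run with -pc_type lu -pc_factor_mat_solver_package superlu_dist (or mumps) */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

    for (i=0; i<5; i++) {
      ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr); /* factorization only happens in the first call */
    }

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }

Run with -log_summary and, as in the log below, MatLUFactorNum should be
counted once even though KSPSolve is called several times.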
>>
>>
>>> MatSolve 8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 7 0 0 0 0 7 0 0 0 0 0
>>> MatLUFactorSym 1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> MatLUFactorNum 1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 85 0 0 0 0 85 0 0 0 0 0
>>>
>>
>> For the PETSc/SuperLU_DIST interface, MatLUFactorNum actually includes
>> the time for MatLUFactorSym
>> because of SuperLU_DIST's API design, i.e., 3.3686e+01 includes the time
>> spent on MatLUFactorSym.
>> Can you send us the '-log_summary' output from MUMPS?
>>
>> Hong
>>
>>
>>
>>> MatAssemblyBegin 7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 1.4e+01 0 0 0 0 5 0 0 0 0 5 0
>>> MatAssemblyEnd 7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04
>>> 8.0e+00 1 0 6 4 3 1 0 6 4 3 0
>>> KSPGMRESOrthog 6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00
>>> 6.0e+00 0 8 0 0 2 0 8 0 0 2 3760
>>> KSPSetUp 1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>> KSPSolve 2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04
>>> 2.2e+01 92 63 18 43 8 92 63 18 43 8 3
>>> PCSetUp 1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 8.0e+00 85 0 0 0 3 85 0 0 0 3 0
>>> PCApply 8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 7 0 0 0 0 7 0 0 0 0 0
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> Memory usage is given in bytes:
>>> Object Type Creations Destructions Memory Descendants'
>>> Mem.
>>> Reports information only for process 0.
>>> --- Event Stage 0: Main Stage
>>> Vector 50 50 6210712 0
>>> Vector Scatter 15 15 15540 0
>>> Index Set 32 32 465324 0
>>> Matrix 6 6 21032760 0
>>> Krylov Solver 1 1 18288 0
>>> Preconditioner 1 1 936 0
>>> Viewer 2 1 712 0
>>>
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 9.53674e-08
>>> Average time for MPI_Barrier(): 5.38826e-06
>>> Average time for zero size MPI_Send(): 9.57648e-06
>>> #PETSc Option Table entries:
>>> -Dmax 0.01
>>> -LowerPML 0
>>> -Mx 72
>>> -My 60
>>> -Mz 4
>>> -Nc 3
>>> -Npmlx 0
>>> -Npmly 0
>>> -Npmlz 3
>>> -Nx 72
>>> -Ny 60
>>> -Nz 10
>>> -dD 0.01
>>> -epsfile eps3dhi.txt
>>> -fproffile fprof3dhi.txt
>>> -gamma 2.0
>>> -hx 0.0625
>>> -hy 0.061858957413174
>>> -hz 0.2
>>> -in0 pass
>>> -log_summary
>>> -norm 0.01
>>> -out0 above
>>> -pc_factor_mat_solver_package superlu_dist
>>> -pc_type lu
>>> -printnewton 1
>>> -wa 1.5
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure run at: Wed Apr 17 13:30:40 2013
>>> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
>>> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
>>> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
>>> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
>>> --with-fortran-interfaces=1 --with-single-library=1
>>> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
>>> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
>>> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
>>> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
>>> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-scalapack=1
>>> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
>>> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
>>> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
>>> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
>>> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
>>> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
>>> -----------------------------------------
>>> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
>>> Machine characteristics: Linux-2.6.27.48-0.12.1
>>> _1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
>>> Using PETSc directory:
>>> /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
>>> Using PETSc arch: REAL
>>> -----------------------------------------
>>> Using C compiler: cc -fast ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: ftn -fast ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>> Using include paths:
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>> -I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
>>> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
>>> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
>>> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
>>> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
>>> -----------------------------------------
>>> Using C linker: cc
>>> Using Fortran linker: ftn
>>> Using libraries:
>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -lpetsc
>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
>>> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
>>> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
>>> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
>>> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
>>> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
>>> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
>>> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
>>> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
>>> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
>>> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
>>> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
>>> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
>>> -lportals -lalpslli -lalpsutil -lpthread
>>> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
>>> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
>>> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
>>> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
>>> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
>>> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
>>> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
>>> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
>>> -lz -lz -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>>
>>> On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov>wrote:
>>>
>>>>
>>>> Ok then use -log_summary and put the first solve in a separate stage
>>>> (see PetscLogStageRegister()) and send the results of a run back demonstrating
>>>> the slow first solve and we may be able to see what the story is.
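A sketch of what Barry is suggesting (a code fragment to wrap around the first
solve in the existing application, not a complete program; the stage name is
made up): the first KSPSolve() goes in its own logging stage so -log_summary
reports it separately from the later solves.

  PetscLogStage  stage;
  PetscErrorCode ierr;

  ierr = PetscLogStageRegister("FirstSolve",&stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);   /* the expensive first solve, including the factorization */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  /* later solves stay in the main stage, so -log_summary shows the
     first solve and the remaining solves in separate tables */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);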
>>>>
>>>> Barry
>>>>
>>>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>>>
>>>> > wow that is news to me. I always assumed that this was normal.
>>>> >
>>>> > I'm pretty certain it's not the preallocation. I'm using
>>>> MatCreateMPI, and to my knowledge I wouldn't even be able to set the values
>>>> without crashing if I didn't preallocate. (If I'm not mistaken, setting
>>>> values slowly without preallocating is only possible if you create the Mat
>>>> using MatCreate + MatSetUp.)
>>>> >
>>>> > Also, I'm certain that the time is taken on the first solve, not the
>>>> setting of values, because I use the matrix in a MatMult first to get the
>>>> RHS before solving, and the MatMult happens before the long first solve.
>>>> >
>>>> >
>>>> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>> wrote:
>>>> >
>>>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>>>> >
>>>> > > Hi, I'm writing an application that solves a sparse linear system many
>>>> times using Pastix. I notice that the first solve takes a very long time,
>>>> >
>>>> > Is it the first "solve" or the first time you put values into that
>>>> matrix that "takes a long time"? If you are not properly preallocating the
>>>> matrix then the initial setting of values will be slow and waste memory.
>>>> See
>>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
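For reference, a sketch of the kind of preallocation Barry means (a fragment
with made-up sizes and per-row counts, not the poster's code). The
MatXAIJSetPreallocation routine linked above handles all AIJ/BAIJ/SBAIJ formats
at once; the pair below is the plain AIJ idiom:

  Mat      A;
  PetscInt N = 1000;   /* made-up global size */

  MatCreate(PETSC_COMM_WORLD,&A);
  MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,N,N);
  MatSetFromOptions(A);
  /* at most 7 nonzeros per row in the diagonal block and 3 in the
     off-diagonal block (made-up numbers; use your stencil's counts).
     Calling both is safe: the one matching the actual type is used. */
  MatSeqAIJSetPreallocation(A,7,PETSC_NULL);
  MatMPIAIJSetPreallocation(A,7,PETSC_NULL,3,PETSC_NULL);
  /* ... MatSetValues() ... MatAssemblyBegin()/MatAssemblyEnd() ... */

With correct counts, MatSetValues() never has to allocate new memory during
insertion, which is what makes the initial fill fast.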
>>>> >
>>>> > The symbolic factorization is usually much faster than a numeric
>>>> factorization so that is not the cause of the slow "first solve".
>>>> >
>>>> > Barry
>>>> >
>>>> >
>>>> >
>>>> > > while the subsequent solves are very fast. I don't fully understand
>>>> what's going on behind the curtains, but I'm guessing it's because the very
>>>> first solve has to read in the non-zero structure for the LU factorization,
>>>> while the subsequent solves are faster because the nonzero structure
>>>> doesn't change.
>>>> > >
>>>> > > My question is, is there any way to save the information obtained
>>>> from the very first solve, so that the next time I run the application, the
>>>> very first solve can be fast too (provided that I still have the same
>>>> nonzero structure)?
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>