[petsc-users] reusing LU factorization?
David Liu
daveliu at mit.edu
Wed Jan 29 15:42:35 CST 2014
Hi Hong,
Thanks a lot. I ran it with -ksp_view, and here are the relevant lines:
matrix ordering: natural
ICNTL(7) (sequential matrix ordering): 7
ICNTL(28) (use parallel or sequential ordering): 1
ICNTL(29) (parallel ordering): 0
INFOG(7) (ordering option effectively used after analysis): 5
INFOG(11) (order of largest frontal matrix after
factorization): 4384
Not too sure if there's anything I can do about this. Also, I am definitely
willing to use other preconditioners. If direct solvers are expensive, what
are the cheaper alternatives?
best,
David
On Wed, Jan 29, 2014 at 4:29 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
> David :
> Thanks.
>
>
>> MatSolve 7 1.0 1.2316e+00 1.0 0.00e+00 0.0 1.0e+03 2.1e+04
>> 1.1e+01 5 0 20 12 4 5 0 20 12 4 0
>> MatLUFactorSym 1 1.0 6.9763e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 7.0e+00 26 0 0 0 2 26 0 0 0 2 0
>> MatLUFactorNum 1 1.0 1.4693e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 1.0e+00 55 0 0 0 0 55 0 0 0 0 0
>>
>
> Here it shows that the LU factorization is called once and takes (26+55)%
> of the total time :-(
> MatLUFactorSym takes 1/2 of the MatLUFactorNum time.
> '-ksp_view' may show what ordering is being used.
> I usually use
> '-mat_mumps_icntl_7 2'. A small acceleration may be achieved by
> experimenting with matrix orderings.
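> For example, a sketch of runs to compare orderings (the ICNTL(7) codes
> below are the standard MUMPS values; check the MUMPS manual for your
> version):
>
>   -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 2   (AMF)
>   -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 4   (PORD)
>   -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 5   (METIS)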
>
> Direct solvers are expensive in memory, execution time, and scalability. Do
> you have the option of using
> another preconditioner?
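> For example, since your build was configured with hypre (--download-hypre),
> one thing to try is GMRES with algebraic multigrid (a sketch only; whether
> it converges depends on your operator, and PML-type problems are often
> hard for AMG):
>
>   -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg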
>
> Hong
>
>>
>>
>>
>> On Wed, Jan 29, 2014 at 12:30 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>>
>>> David,
>>> The 1st solve calls the LU factorization once, which took 3.3686e+01 sec.
>>> The remaining solves do not call the LU factorization at all and are thus
>>> fast.
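>>> To make the reuse explicit: as long as the Mat attached to the KSP is
>>> unchanged, repeated KSPSolve() calls reuse the existing factors. A minimal
>>> sketch in C (ksp, b, x, b2, x2 are placeholders for your own objects):
>>>
>>>   KSPSolve(ksp, b,  x);   /* first call: symbolic + numeric LU */
>>>   KSPSolve(ksp, b2, x2);  /* reuses the factors: triangular solves only */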
>>>
>>>
>>>> MatSolve 8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0
>>>> MatLUFactorSym 1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatLUFactorNum 1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 0.0e+00 85 0 0 0 0 85 0 0 0 0 0
>>>>
>>>
>>> For the PETSc/SuperLU_DIST interface, MatLUFactorNum actually includes the
>>> time for MatLUFactorSym because of SuperLU_DIST's API design, i.e.,
>>> 3.3686e+01 includes the time spent on MatLUFactorSym.
>>> Can you send us '-log_summary' from mumps?
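>>> For example, presumably the same run as above but with
>>>
>>>   -pc_type lu -pc_factor_mat_solver_package mumps -log_summary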
>>>
>>> Hong
>>>
>>>
>>>
>>>> MatAssemblyBegin 7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 1.4e+01 0 0 0 0 5 0 0 0 0 5 0
>>>> MatAssemblyEnd 7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02
>>>> 2.2e+04 8.0e+00 1 0 6 4 3 1 0 6 4 3 0
>>>> KSPGMRESOrthog 6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00
>>>> 0.0e+00 6.0e+00 0 8 0 0 2 0 8 0 0 2 3760
>>>> KSPSetUp 1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> KSPSolve 2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02
>>>> 8.7e+04 2.2e+01 92 63 18 43 8 92 63 18 43 8 3
>>>> PCSetUp 1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 8.0e+00 85 0 0 0 3 85 0 0 0 3 0
>>>> PCApply 8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00
>>>> 0.0e+00 0.0e+00 7 0 0 0 0 7 0 0 0 0 0
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------
>>>> Memory usage is given in bytes:
>>>> Object Type Creations Destructions Memory Descendants'
>>>> Mem.
>>>> Reports information only for process 0.
>>>> --- Event Stage 0: Main Stage
>>>> Vector 50 50 6210712 0
>>>> Vector Scatter 15 15 15540 0
>>>> Index Set 32 32 465324 0
>>>> Matrix 6 6 21032760 0
>>>> Krylov Solver 1 1 18288 0
>>>> Preconditioner 1 1 936 0
>>>> Viewer 2 1 712 0
>>>>
>>>> ========================================================================================================================
>>>> Average time to get PetscTime(): 9.53674e-08
>>>> Average time for MPI_Barrier(): 5.38826e-06
>>>> Average time for zero size MPI_Send(): 9.57648e-06
>>>> #PETSc Option Table entries:
>>>> -Dmax 0.01
>>>> -LowerPML 0
>>>> -Mx 72
>>>> -My 60
>>>> -Mz 4
>>>> -Nc 3
>>>> -Npmlx 0
>>>> -Npmly 0
>>>> -Npmlz 3
>>>> -Nx 72
>>>> -Ny 60
>>>> -Nz 10
>>>> -dD 0.01
>>>> -epsfile eps3dhi.txt
>>>> -fproffile fprof3dhi.txt
>>>> -gamma 2.0
>>>> -hx 0.0625
>>>> -hy 0.061858957413174
>>>> -hz 0.2
>>>> -in0 pass
>>>> -log_summary
>>>> -norm 0.01
>>>> -out0 above
>>>> -pc_factor_mat_solver_package superlu_dist
>>>> -pc_type lu
>>>> -printnewton 1
>>>> -wa 1.5
>>>> #End of PETSc Option Table entries
>>>> Compiled without FORTRAN kernels
>>>> Compiled with full precision matrices (default)
>>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>>> Configure run at: Wed Apr 17 13:30:40 2013
>>>> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
>>>> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
>>>> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
>>>> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
>>>> --with-fortran-interfaces=1 --with-single-library=1
>>>> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
>>>> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
>>>> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
>>>> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
>>>> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>>> --with-scalapack=1
>>>> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>>> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>>> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
>>>> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>>> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>>> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
>>>> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
>>>> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
>>>> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
>>>> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
>>>> -----------------------------------------
>>>> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
>>>> Machine characteristics: Linux-2.6.27.48-0.12.1
>>>> _1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
>>>> Using PETSc directory:
>>>> /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
>>>> Using PETSc arch: REAL
>>>> -----------------------------------------
>>>> Using C compiler: cc -fast ${COPTFLAGS} ${CFLAGS}
>>>> Using Fortran compiler: ftn -fast ${FOPTFLAGS} ${FFLAGS}
>>>> -----------------------------------------
>>>> Using include paths:
>>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>>> -I/opt/fftw/3.3.0.0/x86_64/include
>>>> -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
>>>> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
>>>> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
>>>> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
>>>> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
>>>> -----------------------------------------
>>>> Using C linker: cc
>>>> Using Fortran linker: ftn
>>>> Using libraries:
>>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>>> -lpetsc
>>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>>> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
>>>> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
>>>> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
>>>> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
>>>> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib
>>>> -lfftw3_mpi -lfftw3 -lHYPRE
>>>> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
>>>> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
>>>> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
>>>> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
>>>> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
>>>> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
>>>> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
>>>> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
>>>> -lportals -lalpslli -lalpsutil -lpthread
>>>> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
>>>> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
>>>> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
>>>> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
>>>> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
>>>> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
>>>> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
>>>> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
>>>> -lz -lz -ldl
>>>> -----------------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>>
>>>>> Ok, then use -log_summary and put the first solve in a separate stage
>>>>> (see PetscLogStageRegister()) and send the results of a run back
>>>>> demonstrating the slow first solve, and we may be able to see what the
>>>>> story is.
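>>>>> A minimal sketch of the staging (ksp, b, x, and nsolves stand in for
>>>>> your own variables):
>>>>>
>>>>>   PetscLogStage first, rest;
>>>>>   PetscInt      i;
>>>>>   PetscLogStageRegister("FirstSolve", &first);
>>>>>   PetscLogStageRegister("LaterSolves", &rest);
>>>>>   PetscLogStagePush(first);
>>>>>   KSPSolve(ksp, b, x);                  /* LU factorization happens here */
>>>>>   PetscLogStagePop();
>>>>>   PetscLogStagePush(rest);
>>>>>   for (i = 0; i < nsolves; i++) KSPSolve(ksp, b, x);  /* reuses factors */
>>>>>   PetscLogStagePop();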
>>>>>
>>>>> Barry
>>>>>
>>>>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>>>>
>>>>> > Wow, that is news to me. I always assumed this was normal.
>>>>> >
>>>>> > I'm pretty certain it's not the preallocation. I'm using
>>>>> MatCreateMPI, and to my knowledge I wouldn't even be able to set the
>>>>> values without crashing if I didn't preallocate. (If I'm not mistaken,
>>>>> setting values slowly without preallocating is only possible if you
>>>>> create the Mat using MatCreate + MatSetUp.)
>>>>> >
>>>>> > Also, I'm certain that the time is taken on the first solve, not the
>>>>> setting of values, because I use the matrix in a MatMult first to get the
>>>>> RHS before solving, and the MatMult happens before the long first solve.
>>>>> >
>>>>> >
>>>>> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>>> wrote:
>>>>> >
>>>>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>>>>> >
>>>>> > > Hi, I'm writing an application that solves a sparse linear system
>>>>> many times using PaStiX. I notice that the first solve takes a very long
>>>>> time,
>>>>> >
>>>>> > Is it the first "solve" or the first time you put values into that
>>>>> matrix that "takes a long time"? If you are not properly preallocating the
>>>>> matrix then the initial setting of values will be slow and waste memory.
>>>>> See
>>>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
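>>>>> > For instance, a minimal preallocation sketch for an MPIAIJ matrix
>>>>> (the bounds of 7 nonzeros per row in the diagonal and off-diagonal
>>>>> blocks are made up; use your stencil's actual counts):
>>>>> >
>>>>> >   MatCreate(PETSC_COMM_WORLD, &A);
>>>>> >   MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
>>>>> >   MatSetType(A, MATMPIAIJ);
>>>>> >   MatMPIAIJSetPreallocation(A, 7, PETSC_NULL, 7, PETSC_NULL);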
>>>>> >
>>>>> > The symbolic factorization is usually much faster than the numeric
>>>>> factorization, so that is not the cause of the slow "first solve".
>>>>> >
>>>>> > Barry
>>>>> >
>>>>> >
>>>>> >
>>>>> > > while the subsequent solves are very fast. I don't fully
>>>>> understand what's going on behind the curtains, but I'm guessing it's
>>>>> because the very first solve has to read in the non-zero structure for the
>>>>> LU factorization, while the subsequent solves are faster because the
>>>>> nonzero structure doesn't change.
>>>>> > >
>>>>> > > My question is, is there any way to save the information obtained
>>>>> from the very first solve, so that the next time I run the application, the
>>>>> very first solve can be fast too (provided that I still have the same
>>>>> nonzero structure)?
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>