[petsc-users] reusing LU factorization?

Hong Zhang hzhang at mcs.anl.gov
Wed Jan 29 15:29:00 CST 2014


David :
Thanks.


> MatSolve               7 1.0 1.2316e+00 1.0 0.00e+00 0.0 1.0e+03 2.1e+04
> 1.1e+01  5  0 20 12  4   5  0 20 12  4     0
> MatLUFactorSym         1 1.0 6.9763e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 7.0e+00 26  0  0  0  2  26  0  0  0  2     0
> MatLUFactorNum         1 1.0 1.4693e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.0e+00 55  0  0  0  0  55  0  0  0  0     0
>

 Here it shows that the LU factorization is called once and takes (26+55)% of
the total time :-(
MatLUFactorSym takes about half of the MatLUFactorNum time.
'-ksp_view' will show which matrix ordering is being used.
I usually use '-mat_mumps_icntl_7 2'. A small speedup may be achieved by
experimenting with matrix orderings.
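
For example, a MUMPS run along the lines of

  -pc_type lu -pc_factor_mat_solver_package mumps -mat_mumps_icntl_7 2 \
    -ksp_view -log_summary

makes it easy to compare orderings: '-ksp_view' reports the ordering actually
used, and changing the value passed to '-mat_mumps_icntl_7' selects a different
one (see the MUMPS manual for the list of values). These are standard runtime
options, just collected here for convenience.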

A direct solver is expensive in memory, execution time, and scalability. Do you
have the option of using another preconditioner?
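
(Purely as a sketch -- whether an iterative preconditioner works here is
strongly problem-dependent -- the build already includes hypre, so one cheap
experiment would be

  -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor

and if that stalls, staying with the direct solver may well be the right
choice.)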

Hong

>
>
>
> On Wed, Jan 29, 2014 at 12:30 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:
>
>> David,
>> The 1st solve calls LU factorization once which took 3.3686e+01 sec.
>> The remaining solves do not call LU factorization at all, thus fast.
>>
>>
>>> MatSolve               8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>>> MatLUFactorSym         1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatLUFactorNum         1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00 85  0  0  0  0  85  0  0  0  0     0
>>>
>>
>>  For the petsc/SuperLU_DIST interface, MatLUFactorNum actually includes the
>> time for MatLUFactorSym because of SuperLU_DIST's API design, i.e., the
>> 3.3686e+01 includes the time spent on MatLUFactorSym.
>> Can you send us '-log_summary' from mumps?
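
(Concretely: the option table further down in this log shows
'-pc_factor_mat_solver_package superlu_dist', so re-running the same case with

  -pc_type lu -pc_factor_mat_solver_package mumps -log_summary

is all that is needed to produce the MUMPS numbers -- switching the
factorization package is a pure runtime option.)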
>>
>>  Hong
>>
>>
>>
>>>    MatAssemblyBegin       7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00
>>> 0.0e+00 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
>>> MatAssemblyEnd         7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04
>>> 8.0e+00  1  0  6  4  3   1  0  6  4  3     0
>>> KSPGMRESOrthog         6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00
>>> 6.0e+00  0  8  0  0  2   0  8  0  0  2  3760
>>> KSPSetUp               1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04
>>> 2.2e+01 92 63 18 43  8  92 63 18 43  8     3
>>> PCSetUp                1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 8.0e+00 85  0  0  0  3  85  0  0  0  3     0
>>> PCApply                8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>>> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> Memory usage is given in bytes:
>>> Object Type          Creations   Destructions     Memory  Descendants'
>>> Mem.
>>> Reports information only for process 0.
>>> --- Event Stage 0: Main Stage
>>>               Vector    50             50      6210712     0
>>>       Vector Scatter    15             15        15540     0
>>>            Index Set    32             32       465324     0
>>>               Matrix     6              6     21032760     0
>>>        Krylov Solver     1              1        18288     0
>>>       Preconditioner     1              1          936     0
>>>               Viewer     2              1          712     0
>>>
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 9.53674e-08
>>> Average time for MPI_Barrier(): 5.38826e-06
>>> Average time for zero size MPI_Send(): 9.57648e-06
>>> #PETSc Option Table entries:
>>> -Dmax 0.01
>>> -LowerPML 0
>>> -Mx 72
>>> -My 60
>>> -Mz 4
>>> -Nc 3
>>> -Npmlx 0
>>> -Npmly 0
>>> -Npmlz 3
>>> -Nx 72
>>> -Ny 60
>>> -Nz 10
>>> -dD 0.01
>>> -epsfile eps3dhi.txt
>>> -fproffile fprof3dhi.txt
>>> -gamma 2.0
>>> -hx 0.0625
>>> -hy 0.061858957413174
>>> -hz 0.2
>>> -in0 pass
>>> -log_summary
>>> -norm 0.01
>>> -out0 above
>>> -pc_factor_mat_solver_package superlu_dist
>>> -pc_type lu
>>> -printnewton 1
>>> -wa 1.5
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure run at: Wed Apr 17 13:30:40 2013
>>> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
>>> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
>>> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
>>> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
>>> --with-fortran-interfaces=1 --with-single-library=1
>>> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
>>> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
>>> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
>>> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
>>> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-scalapack=1
>>> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
>>> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>>> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>>> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
>>> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
>>> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
>>> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
>>> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
>>> -----------------------------------------
>>> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
>>> Machine characteristics: Linux-2.6.27.48-0.12.1
>>> _1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
>>> Using PETSc directory:
>>> /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
>>> Using PETSc arch: REAL
>>> -----------------------------------------
>>> Using C compiler: cc  -fast  ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: ftn  -fast   ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>> Using include paths:
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>>> -I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
>>> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
>>> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
>>> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
>>> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
>>> -----------------------------------------
>>> Using C linker: cc
>>> Using Fortran linker: ftn
>>> Using libraries:
>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -lpetsc
>>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>>> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
>>> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
>>> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
>>> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
>>> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
>>> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
>>> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
>>> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
>>> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
>>> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
>>> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
>>> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
>>> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
>>> -lportals -lalpslli -lalpsutil -lpthread
>>> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
>>> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
>>> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
>>> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
>>> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
>>> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
>>> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
>>> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
>>> -lz -lz -ldl
>>> -----------------------------------------
>>>
>>>
>>>
>>>
>>>  On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>>
>>>>   Ok, then use -log_summary and put the first solve in a separate stage
>>>> (see PetscLogStageRegister()) and send the results of a run back demonstrating
>>>> the slow first solve, and we may be able to see what the story is.
>>>>
>>>>    Barry
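
For reference, a minimal sketch of the staging Barry suggests, using the PETSc
logging API (PetscLogStageRegister/PetscLogStagePush/PetscLogStagePop). The
stage names and the surrounding variables (ksp, b, x, nrhs) are made up for
illustration, and error checking is omitted:

  #include <petscksp.h>

  PetscLogStage stage_first, stage_rest;
  PetscInt      i;
  PetscLogStageRegister("FirstSolve", &stage_first);
  PetscLogStageRegister("LaterSolves", &stage_rest);

  PetscLogStagePush(stage_first);
  KSPSolve(ksp, b, x);              /* first solve: factorization happens here */
  PetscLogStagePop();

  PetscLogStagePush(stage_rest);
  for (i = 1; i < nrhs; i++) {
    KSPSolve(ksp, b, x);            /* later solves reuse the factorization */
  }
  PetscLogStagePop();

With -log_summary the two stages are then reported separately, so the cost of
the first solve can be read off directly.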
>>>>
>>>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>>>
>>>> > Wow, that is news to me. I always assumed that this was normal.
>>>> >
>>>> > I'm pretty certain it's not the preallocation. I'm using
>>>> MatCreateMPI, and to my knowledge I wouldn't even be able to set the values
>>>> without crashing if I didn't preallocate. (If I'm not mistaken, setting
>>>> values slowly without preallocating is only possible if you create the Mat
>>>> using MatCreate + MatSetUp.)
>>>> >
>>>> > Also, I'm certain that the time is taken on the first solve, not the
>>>> setting of values, because I use the matrix in a MatMult first to get the
>>>> RHS before solving, and the MatMult happens before the long first solve.
>>>> >
>>>> >
>>>>  > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>>>> wrote:
>>>> >
>>>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>>>> >
>>>> > > Hi, I'm writing an application that solves a sparse linear system many
>>>> times using PaStiX. I notice that the first solve takes a very long time,
>>>> >
>>>> >   Is it the first "solve" or the first time you put values into that
>>>> matrix that "takes a long time"? If you are not properly preallocating the
>>>> matrix then the initial setting of values will be slow and waste memory.
>>>>  See
>>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
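
As an aside, a minimal sketch of explicit preallocation for a parallel AIJ
matrix (MatMPIAIJSetPreallocation, the type-specific counterpart of the
MatXAIJSetPreallocation routine linked above). The global size N and the
per-row counts 7 and 3 are placeholders; the real values come from the stencil:

  Mat A;
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
  MatSetType(A, MATMPIAIJ);
  /* at most 7 nonzeros per row in the diagonal block and 3 in the
     off-diagonal block -- placeholders, tighten these for the real stencil */
  MatMPIAIJSetPreallocation(A, 7, PETSC_NULL, 3, PETSC_NULL);
  /* subsequent MatSetValues() calls then trigger no mallocs,
     provided the counts above are large enough */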
>>>> >
>>>> >   The symbolic factorization is usually much faster than a numeric
>>>> factorization so that is not the cause of the slow "first solve".
>>>> >
>>>> >    Barry
>>>> >
>>>> >
>>>> >
>>>> > > while the subsequent solves are very fast. I don't fully understand
>>>> what's going on behind the curtains, but I'm guessing it's because the very
>>>> first solve has to read in the non-zero structure for the LU factorization,
>>>> while the subsequent solves are faster because the nonzero structure
>>>> doesn't change.
>>>> > >
>>>> > > My question is, is there any way to save the information obtained
>>>> from the very first solve, so that the next time I run the application, the
>>>> very first solve can be fast too (provided that I still have the same
>>>> nonzero structure)?
>>>> >
>>>> >
>>>>
>>>>
>>>
>>
>