[petsc-users] reusing LU factorization?

Hong Zhang hzhang at mcs.anl.gov
Wed Jan 29 11:30:10 CST 2014


David,
The first solve calls the LU factorization once, which took 3.3686e+01 sec.
The remaining solves do not call the LU factorization at all, and are therefore fast.
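
A minimal sketch of this usage pattern (assembly of A, b, and x is omitted; the
calls match the PETSc 3.3 in the log below, where KSPSetOperators() still takes
a MatStructure flag and the solver-package routine is PCFactorSetMatSolverPackage()):

#include <petscksp.h>

/* Solve A x = b nsolves times; only the first KSPSolve() triggers the
   symbolic and numeric LU factorization, the rest reuse it. */
static PetscErrorCode SolveRepeatedly(Mat A, Vec b, Vec x, PetscInt nsolves)
{
  KSP      ksp;
  PC       pc;
  PetscInt i;

  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);        /* PETSc 3.3/3.4 signature */
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCLU);
  PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU_DIST);  /* later renamed PCFactorSetMatSolverType() */
  KSPSetFromOptions(ksp);

  for (i = 0; i < nsolves; i++) {
    KSPSolve(ksp, b, x);   /* i = 0: factorize + solve; i > 0: triangular solves only */
  }
  KSPDestroy(&ksp);
  return 0;
}

Each later KSPSolve() reuses the factored matrix held by the preconditioner,
which is why only one MatLUFactorNum call appears in the log below.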


> MatSolve               8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
> MatLUFactorSym         1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum         1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 85  0  0  0  0  85  0  0  0  0     0
>

For the PETSc/SuperLU_DIST interface, MatLUFactorNum actually includes the time
for MatLUFactorSym because of SuperLU_DIST's API design; i.e., the 3.3686e+01
sec includes the time spent in MatLUFactorSym.
Can you send us the '-log_summary' output from MUMPS?

Hong



> MatAssemblyBegin       7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
> MatAssemblyEnd         7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04
> 8.0e+00  1  0  6  4  3   1  0  6  4  3     0
> KSPGMRESOrthog         6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00
> 6.0e+00  0  8  0  0  2   0  8  0  0  2  3760
> KSPSetUp               1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04
> 2.2e+01 92 63 18 43  8  92 63 18 43  8     3
> PCSetUp                1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 8.0e+00 85  0  0  0  3  85  0  0  0  3     0
> PCApply                8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
> Memory usage is given in bytes:
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> --- Event Stage 0: Main Stage
>               Vector    50             50      6210712     0
>       Vector Scatter    15             15        15540     0
>            Index Set    32             32       465324     0
>               Matrix     6              6     21032760     0
>        Krylov Solver     1              1        18288     0
>       Preconditioner     1              1          936     0
>               Viewer     2              1          712     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 5.38826e-06
> Average time for zero size MPI_Send(): 9.57648e-06
> #PETSc Option Table entries:
> -Dmax 0.01
> -LowerPML 0
> -Mx 72
> -My 60
> -Mz 4
> -Nc 3
> -Npmlx 0
> -Npmly 0
> -Npmlz 3
> -Nx 72
> -Ny 60
> -Nz 10
> -dD 0.01
> -epsfile eps3dhi.txt
> -fproffile fprof3dhi.txt
> -gamma 2.0
> -hx 0.0625
> -hy 0.061858957413174
> -hz 0.2
> -in0 pass
> -log_summary
> -norm 0.01
> -out0 above
> -pc_factor_mat_solver_package superlu_dist
> -pc_type lu
> -printnewton 1
> -wa 1.5
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Wed Apr 17 13:30:40 2013
> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
> --with-fortran-interfaces=1 --with-single-library=1
> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-scalapack=1
> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
> -----------------------------------------
> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
> Machine characteristics: Linux-2.6.27.48-0.12.1
> _1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
> Using PETSc directory:
> /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
> Using PETSc arch: REAL
> -----------------------------------------
> Using C compiler: cc  -fast  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: ftn  -fast   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
> Using include paths:
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
> -I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
> -----------------------------------------
> Using C linker: cc
> Using Fortran linker: ftn
> Using libraries:
> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -lpetsc
> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
> -lportals -lalpslli -lalpsutil -lpthread
> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
> -lz -lz -ldl
> -----------------------------------------
>
>
>
>
> On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>   Ok then use -log_summary and put the first solve in a separate stage
>> (see PetscStageRegister()) and send the results of a run back demonstrating
>> the slow first solver and we may be able to see what the story is.
>>
>>    Barry
>>
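
A minimal sketch of that staging (the logging routine is PetscLogStageRegister();
ksp, b, x, and nsolves are assumed to exist as in the sketch near the top, and
the stage names are placeholders):

PetscLogStage first_stage, later_stages;
PetscInt      i;

PetscLogStageRegister("First solve", &first_stage);
PetscLogStageRegister("Later solves", &later_stages);

PetscLogStagePush(first_stage);
KSPSolve(ksp, b, x);                                 /* LU factorization happens inside this stage */
PetscLogStagePop();

PetscLogStagePush(later_stages);
for (i = 1; i < nsolves; i++) KSPSolve(ksp, b, x);   /* factorization is reused */
PetscLogStagePop();

/* -log_summary then prints a separate event table for each stage */
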
>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>
>> > wow that is news to me. I always assumed that this is normal.
>> >
>> > I'm pretty certain it's not the preallocation. I'm using MatCreateMPI,
>> and to my knowledge I wouldn't even be able to set the values without
>> crashing if I didn't preallocate. (If I'm not mistaken, setting values
>> slowly without preallocating is only possible if you create the Mat using
>> MatCreate + MatSetUp).
>> >
>> > Also, I'm certain that the time is taken on the first solve, not the
>> setting of values, because I use the matrix in a MatMult first to get the
>> RHS before solving, and the MatMult happens before the long first solve.
>> >
>> >
>> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>> >
>> > > Hi, I'm writing an application that solves a sparse matrix many times
>> using Pastix. I notice that the first solve takes a very long time,
>> >
>> >   Is it the first "solve" or the first time you put values into that
>> matrix that "takes a long time"? If you are not properly preallocating the
>> matrix then the initial setting of values will be slow and waste memory.
>>  See
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
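
A minimal sketch of that preallocation, using one of the format-specific calls
behind the linked MatXAIJSetPreallocation() (the local size and per-row counts
below are placeholders):

Mat      A;
PetscInt nlocal = 1000;   /* local number of rows/columns, problem dependent */
PetscInt d_nz   = 7;      /* estimated nonzeros per row, diagonal block */
PetscInt o_nz   = 2;      /* estimated nonzeros per row, off-diagonal block */

MatCreate(PETSC_COMM_WORLD, &A);
MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);
MatSetType(A, MATMPIAIJ);
MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL);
/* MatSetValues() now fills the matrix without mallocs, so assembly stays fast */
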
>> >
>> >   The symbolic factorization is usually much faster than a numeric
>> factorization so that is not the cause of the slow "first solve".
>> >
>> >    Barry
>> >
>> >
>> >
>> > > while the subsequent solves are very fast. I don't fully understand
>> what's going on behind the curtains, but I'm guessing it's because the very
>> first solve has to read in the non-zero structure for the LU factorization,
>> while the subsequent solves are faster because the nonzero structure
>> doesn't change.
>> > >
>> > > My question is, is there any way to save the information obtained
>> from the very first solve, so that the next time I run the application, the
>> very first solve can be fast too (provided that I still have the same
>> nonzero structure)?
>> >
>> >
>>
>>
>