[petsc-users] reusing LU factorization?

Matthew Knepley knepley at gmail.com
Wed Jan 29 11:24:25 CST 2014


On Wed, Jan 29, 2014 at 10:48 AM, David Liu <daveliu at mit.edu> wrote:

> Sure thing. Here is a run where the first solve took about 34s, while the
> second solve took about 2 s.
>

Comments below:


> ===========================
>
>
> ************************************************************************************************************************
> ***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document            ***
>
> ************************************************************************************************************************
> ---------------------------------------------- PETSc Performance Summary:
> ----------------------------------------------
> ./SaltOut on a REAL named nid00407 with 12 processors, by Unknown Wed Jan
> 29 11:33:44 2014
> Using Petsc Release Version 3.3.0, Patch 4, Fri Oct 26 10:46:51 CDT 2012
>                          Max       Max/Min        Avg      Total
> Time (sec):           3.979e+01      1.00000   3.979e+01
> Objects:              1.070e+02      1.00000   1.070e+02
> Flops:                2.431e+07      1.75268   1.478e+07  1.773e+08
> Flops/sec:            6.108e+05      1.75268   3.714e+05  4.456e+06
> MPI Messages:         4.845e+02      1.53081   3.612e+02  4.335e+03
> MPI Message Lengths:  1.770e+07      1.45602   3.721e+04  1.613e+08
> MPI Reductions:       2.810e+02      1.00000
> Flop counting convention: 1 flop = 1 real number operation of type
> (multiply/divide/add/subtract)
>                             e.g., VecAXPY() for real vectors of length N
> --> 2N flops
>                             and VecAXPY() for complex vectors of length N
> --> 8N flops
> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages
> ---  -- Message Lengths --  -- Reductions --
>                         Avg     %Total     Avg     %Total   counts
> %Total     Avg         %Total   counts   %Total
>  0:      Main Stage: 3.9795e+01 100.0%  1.7734e+08 100.0%  4.335e+03
> 100.0%  3.721e+04      100.0%  2.800e+02  99.6%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on
> interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flops: Max - maximum over all processors
>                    Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    Avg. len: average message length
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and
> PetscLogStagePop().
>       %T - percent time in this phase         %f - percent flops in this
> phase
>       %M - percent messages in this phase     %L - percent message lengths
> in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
> over all processors)
>
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops
>       --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len
> Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
>
> ------------------------------------------------------------------------------------------------------------------------
> --- Event Stage 0: Main Stage
> VecView                1 1.0 1.6816e+0011.5 0.00e+00 0.0 1.1e+01 1.7e+05
> 0.0e+00  2  0  0  1  0   2  0  0  1  0     0
> VecMDot                6 1.0 2.8601e-03 1.3 5.62e+05 1.0 0.0e+00 0.0e+00
> 6.0e+00  0  4  0  0  2   0  4  0  0  2  2356
> VecNorm               12 1.0 1.5430e-03 1.3 5.18e+05 1.0 0.0e+00 0.0e+00
> 1.2e+01  0  4  0  0  4   0  4  0  0  4  4032
> VecScale              14 1.0 2.8896e-04 1.2 3.02e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 12558
> VecCopy               26 1.0 3.6035e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecSet                52 1.0 3.0789e-03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecAXPY                6 1.0 5.8293e-04 1.2 2.59e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0  5336
> VecMAXPY               8 1.0 9.5987e-04 1.2 8.21e+05 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  6  0  0  0   0  6  0  0  0 10261
> VecAssemblyBegin      41 1.0 7.6625e-0131.1 0.00e+00 0.0 6.6e+01 4.9e+04
> 1.2e+02  2  0  2  2 44   2  0  2  2 44     0
> VecAssemblyEnd        41 1.0 4.5586e-04 2.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecPointwiseMult       1 1.0 2.1911e-04 1.2 2.16e+04 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0  1183
> VecScatterBegin       60 1.0 2.2238e-02 1.2 0.00e+00 0.0 3.5e+03 4.2e+04
> 0.0e+00  0  0 81 92  0   0  0 81 92  0     0
> VecScatterEnd         60 1.0 6.2252e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> VecNormalize           8 1.0 1.0526e-03 1.4 5.18e+05 1.0 0.0e+00 0.0e+00
> 8.0e+00  0  4  0  0  3   0  4  0  0  3  5910
> MatMult               10 1.0 1.3628e-01 1.3 2.18e+07 1.9 1.3e+03 8.7e+04
> 0.0e+00  0 83 30 72  0   0 83 30 72  0  1083
> MatSolve               8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
> MatLUFactorSym         1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatLUFactorNum         1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00 85  0  0  0  0  85  0  0  0  0     0
>

1) Essentially all of the time (85%, see MatLUFactorNum above) is in the
numerical factorization.

2) You only factor once, so the next solve just applies the existing
factors. If you want it to factor again, you have to call
     KSPSetOperators(..., SAME_NONZERO_PATTERN)
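
To make point 2 concrete, here is a minimal sketch in plain C of the
factor-once, solve-many pattern. It is not PETSc or SuperLU_DIST code: the
dense 3x3 size and the names lu_factor and lu_solve are assumptions made
purely for illustration. In the log above, lu_factor plays the role of
MatLUFactorNum (1 call, about 33.7 s) and lu_solve the role of MatSolve
(8 calls, about 2.8 s in total).

```c
#include <assert.h>

#define N 3  /* illustrative size; real problems are large and sparse */

static double abs_val(double x) { return x < 0.0 ? -x : x; }

/* Expensive step, done once: overwrite a with its LU factors
 * (unit lower triangle L below the diagonal, U on and above it),
 * using partial pivoting. piv[k] is the row swapped into row k.
 * Returns 0 on success, -1 if a zero pivot is found. */
int lu_factor(double a[N][N], int piv[N])
{
    for (int k = 0; k < N; k++) {
        int p = k;
        for (int i = k + 1; i < N; i++)
            if (abs_val(a[i][k]) > abs_val(a[p][k])) p = i;
        if (a[p][k] == 0.0) return -1;
        piv[k] = p;
        if (p != k) {
            for (int j = 0; j < N; j++) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
        }
        for (int i = k + 1; i < N; i++) {
            a[i][k] /= a[k][k];                 /* multiplier, stored in L */
            for (int j = k + 1; j < N; j++)
                a[i][j] -= a[i][k] * a[k][j];   /* update trailing block */
        }
    }
    return 0;
}

/* Cheap step, repeated per right-hand side: overwrite b with the
 * solution x of A x = b, using only the stored factors. */
void lu_solve(double lu[N][N], const int piv[N], double b[N])
{
    for (int k = 0; k < N; k++) {               /* apply the row swaps */
        assert(piv[k] >= k && piv[k] < N);
        double t = b[k]; b[k] = b[piv[k]]; b[piv[k]] = t;
    }
    for (int i = 1; i < N; i++)                 /* forward: L y = P b */
        for (int j = 0; j < i; j++)
            b[i] -= lu[i][j] * b[j];
    for (int i = N - 1; i >= 0; i--) {          /* backward: U x = y */
        for (int j = i + 1; j < N; j++)
            b[i] -= lu[i][j] * b[j];
        b[i] /= lu[i][i];
    }
}
```

Calling lu_factor once and then lu_solve for each new right-hand side reuses
the same factors; lu_factor only needs to run again when the matrix values
change.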

   Matt


> MatAssemblyBegin       7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
> MatAssemblyEnd         7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04
> 8.0e+00  1  0  6  4  3   1  0  6  4  3     0
> KSPGMRESOrthog         6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00
> 6.0e+00  0  8  0  0  2   0  8  0  0  2  3760
> KSPSetUp               1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04
> 2.2e+01 92 63 18 43  8  92 63 18 43  8     3
> PCSetUp                1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 8.0e+00 85  0  0  0  3  85  0  0  0  3     0
> PCApply                8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>
> ------------------------------------------------------------------------------------------------------------------------
> Memory usage is given in bytes:
> Object Type          Creations   Destructions     Memory  Descendants' Mem.
> Reports information only for process 0.
> --- Event Stage 0: Main Stage
>               Vector    50             50      6210712     0
>       Vector Scatter    15             15        15540     0
>            Index Set    32             32       465324     0
>               Matrix     6              6     21032760     0
>        Krylov Solver     1              1        18288     0
>       Preconditioner     1              1          936     0
>               Viewer     2              1          712     0
>
> ========================================================================================================================
> Average time to get PetscTime(): 9.53674e-08
> Average time for MPI_Barrier(): 5.38826e-06
> Average time for zero size MPI_Send(): 9.57648e-06
> #PETSc Option Table entries:
> -Dmax 0.01
> -LowerPML 0
> -Mx 72
> -My 60
> -Mz 4
> -Nc 3
> -Npmlx 0
> -Npmly 0
> -Npmlz 3
> -Nx 72
> -Ny 60
> -Nz 10
> -dD 0.01
> -epsfile eps3dhi.txt
> -fproffile fprof3dhi.txt
> -gamma 2.0
> -hx 0.0625
> -hy 0.061858957413174
> -hz 0.2
> -in0 pass
> -log_summary
> -norm 0.01
> -out0 above
> -pc_factor_mat_solver_package superlu_dist
> -pc_type lu
> -printnewton 1
> -wa 1.5
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure run at: Wed Apr 17 13:30:40 2013
> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
> --with-fortran-interfaces=1 --with-single-library=1
> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-scalapack=1
> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
> -----------------------------------------
> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
> Machine characteristics: Linux-2.6.27.48-0.12.1
> _1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
> Using PETSc directory:
> /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
> Using PETSc arch: REAL
> -----------------------------------------
> Using C compiler: cc  -fast  ${COPTFLAGS} ${CFLAGS}
> Using Fortran compiler: ftn  -fast   ${FOPTFLAGS} ${FFLAGS}
> -----------------------------------------
> Using include paths:
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
> -I/opt/fftw/3.3.0.0/x86_64/include
> -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
> -----------------------------------------
> Using C linker: cc
> Using Fortran linker: ftn
> Using libraries:
> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -lpetsc
> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib
> -lfftw3_mpi -lfftw3 -lHYPRE
> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
> -lportals -lalpslli -lalpsutil -lpthread
> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
> -lz -lz -ldl
> -----------------------------------------
>
>
>
>
> On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>
>>   Ok then use -log_summary and put the first solve in a separate stage
>> (see PetscLogStageRegister()) and send the results of a run back demonstrating
>> the slow first solve and we may be able to see what the story is.
>>
>>    Barry
>>
>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>
>> > wow that is news to me. I always assumed that this is normal.
>> >
>> > I'm pretty certain it's not the preallocation. I'm using MatCreateMPI,
>> and to my knowledge I wouldn't even be able to set the values without
>> crashing if I didn't preallocate. (If I'm not mistaken, the setting values
>> slowly without preallocating is only possible if you create the Mat using
>> MatCreate + MatSetup).
>> >
>> > Also, I'm certain that the time is taken on the first solve, not the
>> setting of values, because I use the matrix in a MatMult first to get the
>> RHS before solving, and the MatMult happens before the long first solve.
>> >
>> >
>> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>> wrote:
>> >
>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>> >
>> > > Hi, I'm writing an application that solves a sparse matrix many times
>> using Pastix. I notice that the first solve takes a very long time,
>> >
>> >   Is it the first "solve" or the first time you put values into that
>> matrix that "takes a long time"? If you are not properly preallocating the
>> matrix then the initial setting of values will be slow and waste memory.
>>  See
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
>> >
>> >   The symbolic factorization is usually much faster than a numeric
>> factorization so that is not the cause of the slow "first solve".
>> >
>> >    Barry
>> >
>> >
>> >
>> > > while the subsequent solves are very fast. I don't fully understand
>> what's going on behind the curtains, but I'm guessing it's because the very
>> first solve has to read in the non-zero structure for the LU factorization,
>> while the subsequent solves are faster because the nonzero structure
>> doesn't change.
>> > >
>> > > My question is, is there any way to save the information obtained
>> from the very first solve, so that the next time I run the application, the
>> very first solve can be fast too (provided that I still have the same
>> nonzero structure)?
>> >
>> >
>>
>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener