[petsc-users] reusing LU factorization?

David Liu daveliu at mit.edu
Wed Jan 29 12:53:30 CST 2014


Sure thing, here's the -log_summary output from mumps:

=======

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./SaltOut on a REAL named nid15905 with 12 processors, by Unknown Wed Jan 29 13:51:15 2014
Using Petsc Release Version 3.3.0, Patch 4, Fri Oct 26 10:46:51 CDT 2012
                         Max       Max/Min        Avg      Total
Time (sec):           2.674e+01      1.00000   2.674e+01
Objects:              1.150e+02      1.00000   1.150e+02
Flops:                2.184e+07      1.75464   1.327e+07  1.592e+08
Flops/sec:            8.168e+05      1.75464   4.961e+05  5.954e+06
MPI Messages:         5.365e+02      1.37212   4.358e+02  5.229e+03
MPI Message Lengths:  1.813e+07      1.39374   3.270e+04  1.710e+08
MPI Reductions:       2.980e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.6743e+01 100.0%  1.5922e+08 100.0%  5.229e+03 100.0%  3.270e+04      100.0%  2.970e+02  99.7%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView                1 1.0 1.7160e+00 9.1 0.00e+00 0.0 1.1e+01 1.7e+05 0.0e+00  4  0  0  1  0   4  0  0  1  0     0
VecMDot                5 1.0 2.0933e-03 1.7 4.75e+05 1.0 0.0e+00 0.0e+00 5.0e+00  0  4  0  0  2   0  4  0  0  2  2724
VecNorm               11 1.0 1.3332e-03 2.1 4.75e+05 1.0 0.0e+00 0.0e+00 1.1e+01  0  4  0  0  4   0  4  0  0  4  4277
VecScale              13 1.0 2.6584e-04 1.3 2.81e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 12676
VecCopy               18 1.0 2.1093e-03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                53 1.0 3.2377e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                6 1.0 6.5589e-04 1.2 2.59e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  4742
VecMAXPY               7 1.0 8.7738e-04 1.3 6.91e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  9454
VecAssemblyBegin      41 1.0 9.8640e-0110.4 0.00e+00 0.0 6.6e+01 4.9e+04 1.2e+02  3  0  1  2 41   3  0  1  2 41     0
VecAssemblyEnd        41 1.0 4.4966e-04 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult       1 1.0 2.2316e-04 1.2 2.16e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1162
VecScatterBegin       73 1.0 3.7682e-02 1.7 0.00e+00 0.0 4.2e+03 3.7e+04 7.0e+00  0  0 80 91  2   0  0 80 91  2     0
VecScatterEnd         66 1.0 5.7186e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize           7 1.0 1.0850e-03 2.4 4.54e+05 1.0 0.0e+00 0.0e+00 7.0e+00  0  3  0  0  2   0  3  0  0  2  5017
MatMult                9 1.0 1.1722e-01 1.2 1.96e+07 1.9 1.2e+03 8.7e+04 0.0e+00  0 83 23 61  0   0 83 23 61  0  1133
MatSolve               7 1.0 1.2316e+00 1.0 0.00e+00 0.0 1.0e+03 2.1e+04 1.1e+01  5  0 20 12  4   5  0 20 12  4     0
MatLUFactorSym         1 1.0 6.9763e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 7.0e+00 26  0  0  0  2  26  0  0  0  2     0
MatLUFactorNum         1 1.0 1.4693e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 55  0  0  0  0  55  0  0  0  0     0
MatAssemblyBegin       7 1.0 3.9971e-02 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
MatAssemblyEnd         7 1.0 2.1923e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04 8.0e+00  1  0  5  3  3   1  0  5  3  3     0
KSPGMRESOrthog         5 1.0 2.8822e-03 1.5 9.50e+05 1.0 0.0e+00 0.0e+00 5.0e+00  0  7  0  0  2   0  7  0  0  2  3957
KSPSetUp               1 1.0 8.2898e-04 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               2 1.0 2.2964e+01 1.0 1.26e+07 1.7 1.7e+03 4.7e+04 3.9e+01 86 59 32 46 13  86 59 32 46 13     4
PCSetUp                1 1.0 2.1672e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01 81  0  0  0  5  81  0  0  0  5     0
PCApply                7 1.0 1.2317e+00 1.0 0.00e+00 0.0 1.0e+03 2.1e+04 1.1e+01  5  0 20 12  4   5  0 20 12  4     0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
              Vector    53             53      8461496     0
      Vector Scatter    17             17        17196     0
           Index Set    35             35       574428     0
              Matrix     6              6     21034632     0
       Krylov Solver     1              1        18288     0
      Preconditioner     1              1          936     0
              Viewer     2              1          712     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 6.21796e-05
Average time for zero size MPI_Send(): 1.08282e-05
#PETSc Option Table entries:
-Dmax 0.01
-LowerPML 0
-Mx 72
-My 60
-Mz 4
-Nc 3
-Npmlx 0
-Npmly 0
-Npmlz 3
-Nx 72
-Ny 60
-Nz 10
-dD 0.01
-epsfile eps3dhi.txt
-fproffile fprof3dhi.txt
-gamma 2.0
-hx 0.0625
-hy 0.061858957413174
-hz 0.2
-in0 pass
-log_summary
-norm 0.01
-out0 temp
-pc_factor_mat_solver_package mumps
-pc_type lu
-printnewton 1
-wa 1.5
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Apr 17 13:30:40 2013
Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
--PETSC_ARCH=REAL --known-level1-dcache-assoc=2
--known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
--with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
--with-fortran-interfaces=1 --with-single-library=1
--with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
--known-mpi-shared-libraries=0 --with-clib-autodetect=0
--with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
-Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
--with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-scalapack=1
--with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
--with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
--with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
--with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
--with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
--download-metis=yes --with-parmetis=1 --download-parmetis=yes
--with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
--with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
--with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
-----------------------------------------
Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
Using PETSc directory: /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
Using PETSc arch: REAL
-----------------------------------------
Using C compiler: cc  -fast  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: ftn  -fast   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
-I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
-I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
-I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
-I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
-I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
-I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries:
-Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-lpetsc
-Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
-lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
-lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
-Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
-L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
-Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib
-L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
-L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
-L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
-L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
-L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
-L/opt/pgi/11.9.0/linux86-64/11.9/lib
-L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
-lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
-lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
-lportals -lalpslli -lalpsutil -lpthread
-Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
-lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
-lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
-lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
-lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
-lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
-lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
-lz -lz -ldl
-----------------------------------------


On Wed, Jan 29, 2014 at 12:30 PM, Hong Zhang <hzhang at mcs.anl.gov> wrote:

> David,
> The 1st solve calls the LU factorization once, which took 3.3686e+01 sec.
> The remaining solves do not call the LU factorization at all, and are thus fast.
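
(For reference, within a single run this reuse happens automatically as long as the same KSP and unchanged operator are used for every solve. Below is a minimal sketch, assuming the PETSc 3.3-era C API, with a hypothetical already-assembled matrix A and vectors b1, b2, x; error checking omitted:)

    #include <petscksp.h>

    /* A (Mat), b1, b2, x (Vec) are assumed created, assembled, and filled elsewhere */
    KSP ksp;
    PC  pc;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCLU);
    PCFactorSetMatSolverPackage(pc, MATSOLVERMUMPS);  /* or "superlu_dist" */
    KSPSetFromOptions(ksp);

    KSPSolve(ksp, b1, x);  /* first solve: symbolic + numeric factorization happen here */
    KSPSolve(ksp, b2, x);  /* later solves reuse the factors: forward/back substitution only */
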
>
>
>> MatSolve               8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>> MatLUFactorSym         1 1.0 4.4394e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatLUFactorNum         1 1.0 3.3686e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 85  0  0  0  0  85  0  0  0  0     0
>>
>
> For the petsc/SuperLU_DIST interface, MatLUFactorNum actually includes the time
> for MatLUFactorSym because of SuperLU_DIST's API design, i.e., the 3.3686e+01 sec
> includes the time spent on MatLUFactorSym.
> Can you send us '-log_summary' from mumps?
>
> Hong
>
>
>
>> MatAssemblyBegin       7 1.0 4.4827e-02 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 1.4e+01  0  0  0  0  5   0  0  0  0  5     0
>> MatAssemblyEnd         7 1.0 2.1890e-01 1.1 0.00e+00 0.0 2.6e+02 2.2e+04 8.0e+00  1  0  6  4  3   1  0  6  4  3     0
>> KSPGMRESOrthog         6 1.0 3.5851e-03 1.2 1.12e+06 1.0 0.0e+00 0.0e+00 6.0e+00  0  8  0  0  2   0  8  0  0  2  3760
>> KSPSetUp               1 1.0 8.0609e-04 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               2 1.0 3.6561e+01 1.0 1.51e+07 1.7 7.9e+02 8.7e+04 2.2e+01 92 63 18 43  8  92 63 18 43  8     3
>> PCSetUp                1 1.0 3.3688e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 85  0  0  0  3  85  0  0  0  3     0
>> PCApply                8 1.0 2.8005e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  7  0  0  0  0   7  0  0  0  0     0
>>
>> ------------------------------------------------------------------------------------------------------------------------
>> Memory usage is given in bytes:
>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>> Reports information only for process 0.
>> --- Event Stage 0: Main Stage
>>               Vector    50             50      6210712     0
>>       Vector Scatter    15             15        15540     0
>>            Index Set    32             32       465324     0
>>               Matrix     6              6     21032760     0
>>        Krylov Solver     1              1        18288     0
>>       Preconditioner     1              1          936     0
>>               Viewer     2              1          712     0
>>
>> ========================================================================================================================
>> Average time to get PetscTime(): 9.53674e-08
>> Average time for MPI_Barrier(): 5.38826e-06
>> Average time for zero size MPI_Send(): 9.57648e-06
>> #PETSc Option Table entries:
>> -Dmax 0.01
>> -LowerPML 0
>> -Mx 72
>> -My 60
>> -Mz 4
>> -Nc 3
>> -Npmlx 0
>> -Npmly 0
>> -Npmlz 3
>> -Nx 72
>> -Ny 60
>> -Nz 10
>> -dD 0.01
>> -epsfile eps3dhi.txt
>> -fproffile fprof3dhi.txt
>> -gamma 2.0
>> -hx 0.0625
>> -hy 0.061858957413174
>> -hz 0.2
>> -in0 pass
>> -log_summary
>> -norm 0.01
>> -out0 above
>> -pc_factor_mat_solver_package superlu_dist
>> -pc_type lu
>> -printnewton 1
>> -wa 1.5
>> #End of PETSc Option Table entries
>> Compiled without FORTRAN kernels
>> Compiled with full precision matrices (default)
>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
>> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>> Configure run at: Wed Apr 17 13:30:40 2013
>> Configure options: --prefix=/sw/xt/petsc/3.3/cnl3.1_pgi11.9.0
>> --PETSC_ARCH=REAL --known-level1-dcache-assoc=2
>> --known-level1-dcache-size=65536 --known-level1-dcache-linesize=64
>> --with-cc=cc --with-fc=ftn --with-cxx=CC --with-fortran
>> --with-fortran-interfaces=1 --with-single-library=1
>> --with-shared-libraries=0 --with-scalar-type=real --with-debugging=0
>> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
>> --with-fortranlib-autodetect=0 --known-memcmp-ok=1 --COPTFLAGS=-fast
>> -Mipa=fast --FOPTFLAGS=-fast -Mipa=fast
>> --with-blas-lapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>> --with-scalapack=1
>> --with-scalapack-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>> --with-scalapack-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>> --with-ptscotch=1 --download-ptscotch=yes --with-blacs=1
>> --with-blacs-lib=/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib/libsci_pgi.a
>> --with-blacs-include=/opt/xt-libsci/11.0.06/pgi/109/istanbul/include
>> --with-superlu_dist=1 --download-superlu_dist=yes --with-metis=1
>> --download-metis=yes --with-parmetis=1 --download-parmetis=yes
>> --with-mumps=1 --download-mumps=yes --with-hypre=1 --download-hypre
>> --with-fftw=1 --with-fftw-dir=/opt/fftw/3.3.0.0/x86_64 --with-hdf5=1
>> --with-hdf5-dir=/opt/cray/hdf5-parallel/1.8.6/pgi/109
>> -----------------------------------------
>> Libraries compiled on Wed Apr 17 13:30:40 2013 on krakenpf3
>> Machine characteristics: Linux-2.6.27.48-0.12.1_1.0301.5943-cray_ss_s-x86_64-with-SuSE-11-x86_64
>> Using PETSc directory: /nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4
>> Using PETSc arch: REAL
>> -----------------------------------------
>> Using C compiler: cc  -fast  ${COPTFLAGS} ${CFLAGS}
>> Using Fortran compiler: ftn  -fast   ${FOPTFLAGS} ${FFLAGS}
>> -----------------------------------------
>> Using include paths:
>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/include
>> -I/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/include
>> -I/opt/fftw/3.3.0.0/x86_64/include -I/opt/cray/hdf5-parallel/1.8.6/pgi/109/include
>> -I/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/include
>> -I/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/include
>> -I/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/include
>> -I/opt/xt-libsci/11.0.06/pgi/109/istanbul/include -I/usr/include/alps
>> -----------------------------------------
>> Using C linker: cc
>> Using Fortran linker: ftn
>> Using libraries:
>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>> -lpetsc
>> -Wl,-rpath,/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>> -L/nics/e/sw/xt-cle3.1/petsc/3.3/cnl3.1_pgi11.9.0/petsc-3.3-p4/REAL/lib
>> -lsuperlu_dist_3.1 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common
>> -lpord -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr
>> -Wl,-rpath,/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib
>> -L/opt/xt-libsci/11.0.06/pgi/109/istanbul/lib -lsci_pgi -lsci_pgi -lpthread
>> -Wl,-rpath,/opt/fftw/3.3.0.0/x86_64/lib -L/opt/fftw/3.3.0.0/x86_64/lib -lfftw3_mpi -lfftw3 -lHYPRE
>> -L/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64
>> -L/opt/cray/pmi/2.1.4-1.0000.8596.15.1.ss/lib64
>> -L/opt/cray/mpt/5.3.5/xt/seastar/mpich2-pgi/109/lib
>> -L/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib -L/usr/lib/alps
>> -L/opt/pgi/11.9.0/linux86-64/11.9/lib
>> -L/usr/lib64/gcc/x86_64-suse-linux/4.3 -ldl -L/opt/cray/atp/1.4.1/lib
>> -lAtpSigHandler -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi
>> -lsci_pgi_mp -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi
>> -lportals -lalpslli -lalpsutil -lpthread
>> -Wl,-rpath,/opt/pgi/11.9.0/linux86-64/11.9/lib -lzceh -lgcc_eh -lstdmpz
>> -lCmpz -lpgmp -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc
>> -lpgc -lsci_pgi -Wl,-rpath,/opt/cray/hdf5-parallel/1.8.6/pgi/109/lib
>> -lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -ldl -lAtpSigHandler
>> -lhdf5_hl_pgi_parallel -lhdf5_pgi_parallel -lz -lscicpp_pgi -lsci_pgi_mp
>> -lfftw3 -lfftw3f -lmpichcxx_pgi -lmpich_pgi -lmpl -lrt -lpmi -lportals
>> -lalpslli -lalpsutil -lpthread -lzceh -lgcc_eh -lstdmpz -lCmpz -lpgmp
>> -lpgf90 -lpgf90_rpm1 -lpgf902 -lpgf90rtl -lpgftnrtl -lnspgc -lpgc -lrt -lm
>> -lz -lz -ldl
>> -----------------------------------------
>>
>>
>>
>>
>> On Tue, Jan 28, 2014 at 6:34 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>>
>>>   Ok, then use -log_summary and put the first solve in a separate stage
>>> (see PetscLogStageRegister()), and send back the results of a run demonstrating
>>> the slow first solve, and we may be able to see what the story is.
>>>
>>>    Barry
>>>
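
(A minimal sketch of the stage logging Barry suggests, assuming ksp, b, b2, and x already exist; "First Solve" is just an illustrative stage name:)

    PetscLogStage stage;
    PetscLogStageRegister("First Solve", &stage);

    PetscLogStagePush(stage);
    KSPSolve(ksp, b, x);   /* the slow first solve gets its own section in -log_summary */
    PetscLogStagePop();

    KSPSolve(ksp, b2, x);  /* subsequent solves remain in the main stage */
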
>>> On Jan 28, 2014, at 5:23 PM, David Liu <daveliu at mit.edu> wrote:
>>>
>>> > Wow, that is news to me. I always assumed that this was normal.
>>> >
>>> > I'm pretty certain it's not the preallocation. I'm using MatCreateMPI,
>>> and to my knowledge I wouldn't even be able to set the values without
>>> crashing if I didn't preallocate. (If I'm not mistaken, setting values
>>> slowly without preallocating is only possible if you create the Mat using
>>> MatCreate + MatSetUp.)
>>> >
>>> > Also, I'm certain that the time is taken on the first solve, not the
>>> setting of values, because I use the matrix in a MatMult first to get the
>>> RHS before solving, and the MatMult happens before the long first solve.
>>> >
>>> >
>>> > On Tue, Jan 28, 2014 at 5:04 PM, Barry Smith <bsmith at mcs.anl.gov>
>>> wrote:
>>> >
>>> > On Jan 28, 2014, at 1:36 PM, David Liu <daveliu at mit.edu> wrote:
>>> >
>>> > > Hi, I'm writing an application that solves a sparse linear system many
>>> times using PaStiX. I notice that the first solve takes a very long time,
>>> >
>>> >   Is it the first "solve" or the first time you put values into that
>>> matrix that "takes a long time"? If you are not properly preallocating the
>>> matrix then the initial setting of values will be slow and waste memory.
>>>  See
>>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatXAIJSetPreallocation.html
>>> >
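
(For context, a minimal preallocation sketch for a parallel AIJ matrix, using MatMPIAIJSetPreallocation rather than the more general MatXAIJSetPreallocation linked above; the global size N and the per-row estimates are hypothetical:)

    Mat A;
    MatCreate(PETSC_COMM_WORLD, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
    MatSetType(A, MATMPIAIJ);
    /* reserve space: at most 7 nonzeros per row in the diagonal block and
       at most 3 in the off-diagonal block (illustrative numbers only) */
    MatMPIAIJSetPreallocation(A, 7, NULL, 3, NULL);
    /* ... MatSetValues() is now fast; finish with MatAssemblyBegin/End ... */
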
>>> >   The symbolic factorization is usually much faster than a numeric
>>> factorization so that is not the cause of the slow "first solve".
>>> >
>>> >    Barry
>>> >
>>> >
>>> >
>>> > > while the subsequent solves are very fast. I don't fully understand
>>> what's going on behind the curtains, but I'm guessing it's because the very
>>> first solve has to read in the non-zero structure for the LU factorization,
>>> while the subsequent solves are faster because the nonzero structure
>>> doesn't change.
>>> > >
>>> > > My question is, is there any way to save the information obtained
>>> from the very first solve, so that the next time I run the application, the
>>> very first solve can be fast too (provided that I still have the same
>>> nonzero structure)?
>>> >
>>> >
>>>
>>>
>>
>