[petsc-users] MatFDColorCreate takes really big portion of the total time

Bao Kai paeanball at gmail.com
Wed Jun 27 02:59:57 CDT 2012


Hi, Barry,

I changed the extension of the file and cleaned it up a little so I could attach it.

It is a parallel code.

Best Regards,
Kai


  Parallel or sequential?

  Barry

Please send the entire -log_summary output.

On Jun 26, 2012, at 2:22 PM, Bao Kai wrote:

> Hi, all,
>
> I use the SNES in petsc-3.2 to solve my problem. The problem is a 3-D
> finite difference problem on a structured grid.
>
> I use MatFDColorCreate to generate the Jacobian matrix. I just found
> that when the problem size is big, MatFDColorCreate takes a really
> long time. The following is a summary for a mesh of size 1000^3;
> 90% of the time is spent in MatFDColorCreate.
>
> MatGetOrdering         1 1.0 1.3502e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries        39 1.0 9.2822e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatFDColorCreate      10 1.0 3.2863e+03 1.0 0.00e+00 0.0 1.6e+07 5.0e+02 6.8e+02 90  0  0  0  3  90  0  0  0  3     0
> MatFDColorApply       20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01  1  0  5  5  0   1  0  5  5  0 42708
> MatFDColorFunc       560 1.0 9.5386e+00 1.3 0.00e+00 0.0 4.5e+08 2.0e+03 0.0e+00  0  0  5  5  0   0  0  5  5  0     0
>
> And the following is the code I use.
>
>     call DMGetMatrix(solv%da, MATAIJ, solv%jac, ierr)
>     call DMGetColoring(solv%da, IS_COLORING_GLOBAL, MATAIJ, iscoloring, ierr)
>     call MatFDColoringCreate(solv%jac, iscoloring, matfdcoloring, ierr)
>     call MatFDColoringSetFunction(matfdcoloring, FormFunction, equ, ierr)
>     call MatFDColoringSetFromOptions(matfdcoloring, ierr)
>     call SNESSetJacobian(solv%snes, solv%jac, solv%jac, SNESDefaultComputeJacobianColor, matfdcoloring, ierr)
>     call ISColoringDestroy(iscoloring, ierr)
>
> I am wondering if there is anything I can do to improve this problem.
>
> Thank you very much.
>
> Best Regards,
> Kai
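
The log below shows MatFDColorCreate with a count of 10, the same as SNESSolve, so the coloring is being rebuilt for every solve. For illustration only, a minimal sketch of doing the setup once and reusing it across solves, assuming the simulation advances through a simple time loop and the matrix nonzero pattern does not change between steps (the loop, nsteps, step, and solv%x are hypothetical names, not from the code above):

    ! One-time setup: the coloring depends only on the DMDA layout and stencil,
    ! so it can be built once before the time loop instead of once per solve.
    call DMGetMatrix(solv%da, MATAIJ, solv%jac, ierr)
    call DMGetColoring(solv%da, IS_COLORING_GLOBAL, MATAIJ, iscoloring, ierr)
    call MatFDColoringCreate(solv%jac, iscoloring, matfdcoloring, ierr)
    call MatFDColoringSetFunction(matfdcoloring, FormFunction, equ, ierr)
    call MatFDColoringSetFromOptions(matfdcoloring, ierr)
    call SNESSetJacobian(solv%snes, solv%jac, solv%jac, SNESDefaultComputeJacobianColor, matfdcoloring, ierr)
    call ISColoringDestroy(iscoloring, ierr)

    ! Time stepping (hypothetical loop): every SNESSolve reuses the same
    ! Jacobian matrix and coloring context created above.
    do step = 1, nsteps
       call SNESSolve(solv%snes, PETSC_NULL_OBJECT, solv%x, ierr)
    end do

    ! Tear down once at the end of the run.
    call MatFDColoringDestroy(matfdcoloring, ierr)
    call MatDestroy(solv%jac, ierr)

With this arrangement the coloring cost would be paid once per run rather than once per SNESSolve.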
-------------- next part --------------
 RST runs on 32768 processors.
Start the simulation of RST_advection on a 3D rectangular mesh of size 1000x1000x1000.
.................

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./rst on a arch-shah named ionode101 with 32768 processors, by Unknown Mon May 28 14:40:44 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011 

                         Max       Max/Min        Avg      Total 
Time (sec):           3.632e+03      1.00000   3.632e+03
Objects:              2.450e+02      1.00000   2.450e+02
Flops:                3.878e+10      1.11767   3.606e+10  1.182e+15
Flops/sec:            1.068e+07      1.11767   9.927e+06  3.253e+11
MPI Messages:         2.843e+05      3.71427   2.662e+05  8.723e+09
MPI Message Lengths:  5.701e+08      2.19550   1.977e+03  1.725e+13
MPI Reductions:       2.161e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.6323e+03 100.0%  1.1816e+15 100.0%  8.723e+09 100.0%  1.977e+03      100.0%  2.161e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier          10 1.0 3.0198e-02 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESSolve             10 1.0 3.4011e+02 1.0 3.88e+10 1.1 8.7e+09 2.0e+03 2.1e+04  9100100100 96   9100100100 96 3474169
SNESLineSearch        20 1.0 7.0355e-01 1.0 4.06e+07 1.1 3.2e+07 2.0e+03 8.0e+01  0  0  0  0  0   0  0  0  0  0 1759413
SNESFunctionEval      30 1.0 4.8486e-01 1.2 0.00e+00 0.0 2.4e+07 2.0e+03 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
SNESJacobianEval      20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01  1  0  5  5  0   1  0  5  5  0 42708
VecDot             10272 1.0 2.7129e+01 7.8 6.73e+08 1.1 0.0e+00 0.0e+00 1.0e+04  0  2  0  0 48   0  2  0  0 48 757257
VecDotNorm2         5126 1.0 2.4923e+01 8.2 6.72e+08 1.1 0.0e+00 0.0e+00 5.1e+03  0  2  0  0 24   0  2  0  0 24 822681
VecNorm             5206 1.0 2.7628e+00 2.3 3.41e+08 1.1 0.0e+00 0.0e+00 5.2e+03  0  1  0  0 24   0  1  0  0 24 3768681
VecCopy              640 1.0 1.1987e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             10332 1.0 8.4211e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY              540 1.0 1.5848e-01 1.7 3.54e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 6814569
VecAXPBYCZ         10252 1.0 1.0187e+01 1.3 1.34e+09 1.1 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0 4025498
VecWAXPY           10272 1.0 2.6825e+00 1.2 6.73e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 7651013
VecScatterBegin    10892 1.0 4.1604e+00 2.1 0.00e+00 0.0 8.7e+09 2.0e+03 0.0e+00  0  0100100  0   0  0100100  0     0
VecScatterEnd      10892 1.0 2.1351e+01 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecReduceArith        20 1.0 4.5640e-03 1.2 1.31e+06 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 8764008
VecReduceComm         10 1.0 3.6807e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0     0
MatMult            10272 1.0 1.4871e+02 1.3 1.82e+10 1.1 8.2e+09 2.0e+03 0.0e+00  4 47 94 94  0   4 47 94 94  0 3722481
MatSolve           10272 1.0 1.3996e+02 1.1 1.67e+10 1.1 0.0e+00 0.0e+00 0.0e+00  4 43  0  0  0   4 43  0  0  0 3641627
MatLUFactorNum        20 1.0 2.4788e+00 1.1 1.39e+08 1.1 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 1707587
MatILUFactorSym        1 1.0 2.2120e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      50 1.0 2.6742e+0015.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        50 1.0 8.5123e-01 1.1 0.00e+00 0.0 1.6e+07 5.0e+02 1.1e+02  0  0  0  0  1   0  0  0  0  1     0
MatGetRowIJ            1 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice      20 1.0 7.3212e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.3502e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        39 1.0 9.2822e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatFDColorCreate      10 1.0 3.2863e+03 1.0 0.00e+00 0.0 1.6e+07 5.0e+02 6.8e+02 90  0  0  0  3  90  0  0  0  3     0
MatFDColorApply       20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01  1  0  5  5  0   1  0  5  5  0 42708
MatFDColorFunc       560 1.0 9.5386e+00 1.3 0.00e+00 0.0 4.5e+08 2.0e+03 0.0e+00  0  0  5  5  0   0  0  5  5  0     0
KSPSetup              40 1.0 1.2743e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              20 1.0 3.1394e+02 1.0 3.87e+10 1.1 8.2e+09 2.0e+03 2.1e+04  9100 94 94 95   9100 94 94 95 3756190
PCSetUp               40 1.0 3.2385e+00 1.1 1.39e+08 1.1 0.0e+00 0.0e+00 1.0e+01  0  0  0  0  0   0  0  0  0  0 1307021
PCSetUpOnBlocks    10292 1.0 1.4383e+02 1.1 1.69e+10 1.1 0.0e+00 0.0e+00 3.0e+00  4 43  0  0  0   4 43  0  0  0 3573042
PCApply            10272 1.0 1.4149e+02 1.1 1.67e+10 1.1 0.0e+00 0.0e+00 0.0e+00  4 43  0  0  0   4 43  0  0  0 3602261
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

                SNES     1              1          812     0
    Distributed Mesh     1              1       316976     0
              Vector    87             87     14789660     0
      Vector Scatter    22             22        13992     0
           Index Set    76             76       565644     0
   IS L to G Mapping    11             11       160868     0
              Matrix    32             32    134526420     0
  Matrix FD Coloring    10             10     69912560     0
              Viewer     1              0            0     0
       Krylov Solver     2              2         1436     0
      Preconditioner     2              2         1100     0
========================================================================================================================
Average time to get PetscTime(): 3.38554e-06
Average time for MPI_Barrier(): 4.19617e-06
Average time for zero size MPI_Send(): 1.40555e-05
#PETSc Option Table entries:
-log_summary
-snes_monitor
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Sat Nov 26 18:15:55 2011
Configure options: --known-bits-per-byte=8 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=4 --known-sizeof-short=2 --known-sizeof-size_t=4 --known-sizeof-void-p=4 --with-batch=1 --with-blas-lapack-lib=" -L/home/aron/ksl/lapack/pdc/ppc450d/lib -L/home/aron/ksl/cblas/pdc/ppc450d/lib -L/bgsys/ibm_essl/sles10/prod/opt/ibmmath/lib -L/opt/ibmcmp/xlsmp/bg/1.7/lib -L/opt/ibmcmp/xlmass/bg/4.4/bglib -L/opt/ibmcmp/xlf/bg/11.1/bglib -llapack  -lcblas  -lesslbg  -lxlf90_r -lxlopt -lxlsmp -lxl -lxlfmath" --with-cc=mpixlc_r --with-cxx=mpixlcxx_r --with-debugging=0 --with-fc="mpixlf77_r -qnosave" --with-fortran-kernels=1 --with-is-color-value-type=short --with-shared-libraries=0 --with-x=0 -COPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" -CXXOPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" -FOPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" PETSC_ARCH=arch-shaheen
-----------------------------------------
Libraries compiled on Sat Nov 26 18:15:55 2011 on fen1 
Machine characteristics: Linux-2.6.16.60-0.74.7-ppc64-ppc64-with-SuSE-10-ppc
Using PETSc directory: /opt/share/petsc/3.2-p5/bgp
Using PETSc arch: arch-shaheen
-----------------------------------------

Using C compiler: mpixlc_r  -O3 -qarch=450d -qtune=450 -qmaxmem=-1  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpixlf77_r -qnosave  -O3 -qarch=450d -qtune=450 -qmaxmem=-1   ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/opt/share/petsc/3.2-p5/bgp/arch-shaheen/include -I/opt/share/petsc/3.2-p5/bgp/include -I/opt/share/petsc/3.2-p5/bgp/include -I/opt/share/petsc/3.2-p5/bgp/arch-shaheen/include -I/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/include -I/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/include
-----------------------------------------

Using C linker: mpixlc_r
Using Fortran linker: mpixlf77_r -qnosave
Using libraries: -Wl,-rpath,/opt/share/petsc/3.2-p5/bgp/arch-shaheen/lib -L/opt/share/petsc/3.2-p5/bgp/arch-shaheen/lib -lpetsc -lpthread -L/home/aron/ksl/lapack/pdc/ppc450d/lib -L/home/aron/ksl/cblas/pdc/ppc450d/lib -L/bgsys/ibm_essl/sles10/prod/opt/ibmmath/lib -L/opt/ibmcmp/xlsmp/bg/1.7/lib -L/opt/ibmcmp/xlmass/bg/4.4/bglib -L/opt/ibmcmp/xlf/bg/11.1/bglib -llapack -lcblas -lesslbg -lxlf90_r -lxlopt -lxlsmp -lxl -lxlfmath -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/lib -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/lib -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/runtime/SPI -L/opt/ibmcmp/xlsmp/bg/1.7/bglib -L/opt/ibmcmp/vac/bg/9.0/bglib -L/opt/ibmcmp/vacpp/bg/9.0/bglib -L/bgsys/drivers/ppcfloor/gnu-linux/lib/gcc/powerpc-bgp-linux/4.1.2 -L/bgsys/drivers/ppcfloor/gnu-linux/powerpc-bgp-linux/lib -Wl,-rpath,/opt/ibmcmp/lib/bg/bglib -ldl -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/lib -lmpich.cnk -lopa -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/lib -ldcmf.cnk -ldcmfcoll.cnk -lpthread -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/runtime/SPI -lSPI.cna -lrt -lxlopt -lxl -lgcc_eh -lxlf90_r -lxlomp_ser -lxlfmath -lm -ldl -lmpich.cnk -lopa -ldcmf.cnk -ldcmfcoll.cnk -lpthread -lSPI.cna -lrt -lxlopt -lxl -lgcc_eh -ldl 
-----------------------------------------

 Finish RST_CO2.
 Finish interpreting the file infile_RSTi.m by RST.
 after mpi_finalize: ierr =  0

