[petsc-users] MatFDColorCreate takes really big portion of the total time
Bao Kai
paeanball at gmail.com
Wed Jun 27 02:59:57 CDT 2012
Hi, Barry,
I changed the extension of the file and cleaned it up a little bit before attaching it.
It is a parallel code.
Best Regards,
Kai
Parallel or sequential? Please send the entire -log_summary output.

Barry
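(For reference, -log_summary is a runtime option rather than anything in the code; a typical invocation, with the launcher and process count purely illustrative, would look like

    mpiexec -n 32768 ./rst -snes_monitor -log_summary

and the summary is printed to stdout when the program calls PetscFinalize.)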
On Jun 26, 2012, at 2:22 PM, Bao Kai wrote:
> Hi, all,
>
> I use the SNES in petsc-3.2 to solve my problem. The problem is a 3-D
> finite-difference problem on a structured grid.
>
> I use MatFDColorCreate to generate the Jacobian matrix. I just found
> that when the problem size is big, MatFDColorCreate takes a really
> long time. The following is a summary excerpt for a mesh of size
> 1000^3; 90% of the time is spent in MatFDColorCreate.
>
> MatGetOrdering       1 1.0 1.3502e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatZeroEntries      39 1.0 9.2822e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> MatFDColorCreate    10 1.0 3.2863e+03 1.0 0.00e+00 0.0 1.6e+07 5.0e+02 6.8e+02 90  0  0  0  3  90  0  0  0  3     0
> MatFDColorApply     20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01  1  0  5  5  0   1  0  5  5  0 42708
> MatFDColorFunc     560 1.0 9.5386e+00 1.3 0.00e+00 0.0 4.5e+08 2.0e+03 0.0e+00  0  0  5  5  0   0  0  5  5  0     0
>
> And the following is the code I use.
>
> call DMGetMatrix(solv%da, MATAIJ, solv%jac, ierr)
> call DMGetColoring(solv%da, IS_COLORING_GLOBAL, MATAIJ, iscoloring, ierr)
> call MatFDColoringCreate(solv%jac, iscoloring, matfdcoloring, ierr)
> call MatFDColoringSetFunction(matfdcoloring, FormFunction, equ, ierr)
> call MatFDColoringSetFromOptions(matfdcoloring, ierr)
> call SNESSetJacobian(solv%snes, solv%jac, solv%jac, SNESDefaultComputeJacobianColor, matfdcoloring, ierr)
> call ISColoringDestroy(iscoloring, ierr)
>
> I am wondering if there is anything I can do to improve this situation.
>
> Thank you very much.
>
> Best Regards,
> Kai
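The event count in the excerpt above shows MatFDColorCreate executing 10 times, which suggests the coloring is being rebuilt for every nonlinear solve even though the DMDA, the stencil, and therefore the matrix nonzero pattern never change. A minimal sketch of one possible restructuring, reusing only the calls quoted above but hoisting the one-time setup out of the time loop (the loop bounds step/nsteps and the solution vector solv%x are illustrative placeholders, not names from the original code):

    ! One-time setup: build the matrix, the coloring, and the
    ! MatFDColoring object exactly once.
    call DMGetMatrix(solv%da, MATAIJ, solv%jac, ierr)
    call DMGetColoring(solv%da, IS_COLORING_GLOBAL, MATAIJ, iscoloring, ierr)
    call MatFDColoringCreate(solv%jac, iscoloring, matfdcoloring, ierr)
    call MatFDColoringSetFunction(matfdcoloring, FormFunction, equ, ierr)
    call MatFDColoringSetFromOptions(matfdcoloring, ierr)
    call SNESSetJacobian(solv%snes, solv%jac, solv%jac, &
                         SNESDefaultComputeJacobianColor, matfdcoloring, ierr)
    call ISColoringDestroy(iscoloring, ierr)

    ! Time loop: only the nonlinear solve repeats; the same coloring
    ! and MatFDColoring object are reused for every Jacobian.
    do step = 1, nsteps
       call SNESSolve(solv%snes, PETSC_NULL_OBJECT, solv%x, ierr)
    end do

    ! Free the coloring object once at the end of the run.
    call MatFDColoringDestroy(matfdcoloring, ierr)

This does not make any single MatFDColorCreate call cheaper, but if the ten recorded calls are indeed identical it removes nine of them, cutting the roughly 3286 seconds spent there to about a tenth.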
-------------- next part --------------
RST runs on 32768 processors.
Start the simulation of RST_advection on a 3D rectangular mesh of size 1000x1000x1000.
.................
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./rst on a arch-shah named ionode101 with 32768 processors, by Unknown Mon May 28 14:40:44 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011
Max Max/Min Avg Total
Time (sec): 3.632e+03 1.00000 3.632e+03
Objects: 2.450e+02 1.00000 2.450e+02
Flops: 3.878e+10 1.11767 3.606e+10 1.182e+15
Flops/sec: 1.068e+07 1.11767 9.927e+06 3.253e+11
MPI Messages: 2.843e+05 3.71427 2.662e+05 8.723e+09
MPI Message Lengths: 5.701e+08 2.19550 1.977e+03 1.725e+13
MPI Reductions: 2.161e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.6323e+03 100.0% 1.1816e+15 100.0% 8.723e+09 100.0% 1.977e+03 100.0% 2.161e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 10 1.0 3.0198e-02 5.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESSolve 10 1.0 3.4011e+02 1.0 3.88e+10 1.1 8.7e+09 2.0e+03 2.1e+04  9 100 100 100 96   9 100 100 100 96 3474169
SNESLineSearch 20 1.0 7.0355e-01 1.0 4.06e+07 1.1 3.2e+07 2.0e+03 8.0e+01 0 0 0 0 0 0 0 0 0 0 1759413
SNESFunctionEval 30 1.0 4.8486e-01 1.2 0.00e+00 0.0 2.4e+07 2.0e+03 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
SNESJacobianEval 20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01 1 0 5 5 0 1 0 5 5 0 42708
VecDot 10272 1.0 2.7129e+01 7.8 6.73e+08 1.1 0.0e+00 0.0e+00 1.0e+04 0 2 0 0 48 0 2 0 0 48 757257
VecDotNorm2 5126 1.0 2.4923e+01 8.2 6.72e+08 1.1 0.0e+00 0.0e+00 5.1e+03 0 2 0 0 24 0 2 0 0 24 822681
VecNorm 5206 1.0 2.7628e+00 2.3 3.41e+08 1.1 0.0e+00 0.0e+00 5.2e+03 0 1 0 0 24 0 1 0 0 24 3768681
VecCopy 640 1.0 1.1987e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 10332 1.0 8.4211e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 540 1.0 1.5848e-01 1.7 3.54e+07 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 6814569
VecAXPBYCZ 10252 1.0 1.0187e+01 1.3 1.34e+09 1.1 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 4025498
VecWAXPY 10272 1.0 2.6825e+00 1.2 6.73e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 7651013
VecScatterBegin 10892 1.0 4.1604e+00 2.1 0.00e+00 0.0 8.7e+09 2.0e+03 0.0e+00 0 0 100 100 0  0 0 100 100 0 0
VecScatterEnd 10892 1.0 2.1351e+01 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 20 1.0 4.5640e-03 1.2 1.31e+06 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 8764008
VecReduceComm 10 1.0 3.6807e-02 6.2 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatMult 10272 1.0 1.4871e+02 1.3 1.82e+10 1.1 8.2e+09 2.0e+03 0.0e+00 4 47 94 94 0 4 47 94 94 0 3722481
MatSolve 10272 1.0 1.3996e+02 1.1 1.67e+10 1.1 0.0e+00 0.0e+00 0.0e+00 4 43 0 0 0 4 43 0 0 0 3641627
MatLUFactorNum 20 1.0 2.4788e+00 1.1 1.39e+08 1.1 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1707587
MatILUFactorSym 1 1.0 2.2120e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 50 1.0 2.6742e+00 15.4 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 50 1.0 8.5123e-01 1.1 0.00e+00 0.0 1.6e+07 5.0e+02 1.1e+02 0 0 0 0 1 0 0 0 0 1 0
MatGetRowIJ 1 1.0 5.0068e-06 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 20 1.0 7.3212e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.3502e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 39 1.0 9.2822e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorCreate 10 1.0 3.2863e+03 1.0 0.00e+00 0.0 1.6e+07 5.0e+02 6.8e+02 90 0 0 0 3 90 0 0 0 3 0
MatFDColorApply 20 1.0 2.5288e+01 1.0 3.54e+07 1.1 4.6e+08 2.0e+03 8.0e+01 1 0 5 5 0 1 0 5 5 0 42708
MatFDColorFunc 560 1.0 9.5386e+00 1.3 0.00e+00 0.0 4.5e+08 2.0e+03 0.0e+00 0 0 5 5 0 0 0 5 5 0 0
KSPSetup 40 1.0 1.2743e-02 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 20 1.0 3.1394e+02 1.0 3.87e+10 1.1 8.2e+09 2.0e+03 2.1e+04  9 100 94 94 95   9 100 94 94 95 3756190
PCSetUp 40 1.0 3.2385e+00 1.1 1.39e+08 1.1 0.0e+00 0.0e+00 1.0e+01 0 0 0 0 0 0 0 0 0 0 1307021
PCSetUpOnBlocks 10292 1.0 1.4383e+02 1.1 1.69e+10 1.1 0.0e+00 0.0e+00 3.0e+00 4 43 0 0 0 4 43 0 0 0 3573042
PCApply 10272 1.0 1.4149e+02 1.1 1.67e+10 1.1 0.0e+00 0.0e+00 0.0e+00 4 43 0 0 0 4 43 0 0 0 3602261
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
SNES 1 1 812 0
Distributed Mesh 1 1 316976 0
Vector 87 87 14789660 0
Vector Scatter 22 22 13992 0
Index Set 76 76 565644 0
IS L to G Mapping 11 11 160868 0
Matrix 32 32 134526420 0
Matrix FD Coloring 10 10 69912560 0
Viewer 1 0 0 0
Krylov Solver 2 2 1436 0
Preconditioner 2 2 1100 0
========================================================================================================================
Average time to get PetscTime(): 3.38554e-06
Average time for MPI_Barrier(): 4.19617e-06
Average time for zero size MPI_Send(): 1.40555e-05
#PETSc Option Table entries:
-log_summary
-snes_monitor
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Sat Nov 26 18:15:55 2011
Configure options: --known-bits-per-byte=8 --known-level1-dcache-assoc=0 --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768 --known-memcmp-ok --known-mpi-long-double=1 --known-mpi-shared-libraries=0 --known-sizeof-MPI_Comm=4 --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8 --known-sizeof-float=4 --known-sizeof-int=4 --known-sizeof-long-long=8 --known-sizeof-long=4 --known-sizeof-short=2 --known-sizeof-size_t=4 --known-sizeof-void-p=4 --with-batch=1 --with-blas-lapack-lib=" -L/home/aron/ksl/lapack/pdc/ppc450d/lib -L/home/aron/ksl/cblas/pdc/ppc450d/lib -L/bgsys/ibm_essl/sles10/prod/opt/ibmmath/lib -L/opt/ibmcmp/xlsmp/bg/1.7/lib -L/opt/ibmcmp/xlmass/bg/4.4/bglib -L/opt/ibmcmp/xlf/bg/11.1/bglib -llapack -lcblas -lesslbg -lxlf90_r -lxlopt -lxlsmp -lxl -lxlfmath" --with-cc=mpixlc_r --with-cxx=mpixlcxx_r --with-debugging=0 --with-fc="mpixlf77_r -qnosave" --with-fortran-kernels=1 --with-is-color-value-type=short --with-shared-libraries=0 --with-x=0 -COPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" -CXXOPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" -FOPTFLAGS="-O3 -qarch=450d -qtune=450 -qmaxmem=-1" PETSC_ARCH=arch-shaheen
-----------------------------------------
Libraries compiled on Sat Nov 26 18:15:55 2011 on fen1
Machine characteristics: Linux-2.6.16.60-0.74.7-ppc64-ppc64-with-SuSE-10-ppc
Using PETSc directory: /opt/share/petsc/3.2-p5/bgp
Using PETSc arch: arch-shaheen
-----------------------------------------
Using C compiler: mpixlc_r -O3 -qarch=450d -qtune=450 -qmaxmem=-1 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpixlf77_r -qnosave -O3 -qarch=450d -qtune=450 -qmaxmem=-1 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/opt/share/petsc/3.2-p5/bgp/arch-shaheen/include -I/opt/share/petsc/3.2-p5/bgp/include -I/opt/share/petsc/3.2-p5/bgp/include -I/opt/share/petsc/3.2-p5/bgp/arch-shaheen/include -I/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/include -I/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/include
-----------------------------------------
Using C linker: mpixlc_r
Using Fortran linker: mpixlf77_r -qnosave
Using libraries: -Wl,-rpath,/opt/share/petsc/3.2-p5/bgp/arch-shaheen/lib -L/opt/share/petsc/3.2-p5/bgp/arch-shaheen/lib -lpetsc -lpthread -L/home/aron/ksl/lapack/pdc/ppc450d/lib -L/home/aron/ksl/cblas/pdc/ppc450d/lib -L/bgsys/ibm_essl/sles10/prod/opt/ibmmath/lib -L/opt/ibmcmp/xlsmp/bg/1.7/lib -L/opt/ibmcmp/xlmass/bg/4.4/bglib -L/opt/ibmcmp/xlf/bg/11.1/bglib -llapack -lcblas -lesslbg -lxlf90_r -lxlopt -lxlsmp -lxl -lxlfmath -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/lib -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/lib -L/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/runtime/SPI -L/opt/ibmcmp/xlsmp/bg/1.7/bglib -L/opt/ibmcmp/vac/bg/9.0/bglib -L/opt/ibmcmp/vacpp/bg/9.0/bglib -L/bgsys/drivers/ppcfloor/gnu-linux/lib/gcc/powerpc-bgp-linux/4.1.2 -L/bgsys/drivers/ppcfloor/gnu-linux/powerpc-bgp-linux/lib -Wl,-rpath,/opt/ibmcmp/lib/bg/bglib -ldl -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/default/lib -lmpich.cnk -lopa -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/comm/sys/lib -ldcmf.cnk -ldcmfcoll.cnk -lpthread -Wl,-rpath,/bgsys/drivers/V1R4M2_200_2010-100508P/ppc/runtime/SPI -lSPI.cna -lrt -lxlopt -lxl -lgcc_eh -lxlf90_r -lxlomp_ser -lxlfmath -lm -ldl -lmpich.cnk -lopa -ldcmf.cnk -ldcmfcoll.cnk -lpthread -lSPI.cna -lrt -lxlopt -lxl -lgcc_eh -ldl
-----------------------------------------
Finish RST_CO2.
Finish interpreting the file infile_RSTi.m by RST.
after mpi_finalize: ierr = 0