[petsc-users] Generation, refinement of the mesh (Sieve mesh) is very slow!
fdkong
fd.kong at siat.ac.cn
Mon Mar 28 01:33:42 CDT 2011
Thank you very much for your reply. The output of -log_summary is shown below:
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./linearElasticity on a linux-gnu named c0409 with 64 processors, by fdkong Sat Mar 26 12:44:53 2011
Using Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010
Max Max/Min Avg Total
Time (sec): 3.459e+02 1.00012 3.459e+02
Objects: 2.120e+02 1.00000 2.120e+02
Flops: 3.205e+08 1.40451 2.697e+08 1.726e+10
Flops/sec: 9.268e+05 1.40451 7.799e+05 4.991e+07
Memory: 3.065e+06 1.25099 1.721e+08
MPI Messages: 1.182e+04 3.60692 8.252e+03 5.281e+05
MPI Message Lengths: 1.066e+07 5.78633 4.086e+02 2.158e+08
MPI Reductions: 4.365e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 3.4587e+02 100.0% 1.7263e+10 100.0% 5.281e+05 100.0% 4.086e+02 100.0% 4.278e+03 98.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
##########################################################
# #
# WARNING!!! #
# #
# This code was compiled with a debugging option, #
# To get timing results run config/configure.py #
# using --with-debugging=no, the performance will #
# be generally two or three times faster. #
# #
##########################################################
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMDot 3195 1.0 8.1583e-01 2.7 2.23e+07 1.4 0.0e+00 0.0e+00 3.6e+02 0 7 0 0 8 0 7 0 0 8 1480
VecNorm 4335 1.0 1.2828e+00 2.0 1.13e+07 1.4 0.0e+00 0.0e+00 7.8e+02 0 4 0 0 18 0 4 0 0 18 475
VecScale 4192 1.0 2.3778e-02 1.3 5.47e+06 1.4 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 12415
VecCopy 995 1.0 5.0113e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 2986 1.0 6.1189e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 1140 1.0 8.8003e-03 1.3 2.78e+06 1.3 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 17310
VecAYPX 71 1.0 8.8720e-04 1.7 8.09e+04 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5286
VecWAXPY 2 1.0 2.3029e-05 1.7 2.28e+03 1.2 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 5736
VecMAXPY 4192 1.0 5.6833e-02 1.4 3.08e+07 1.4 0.0e+00 0.0e+00 0.0e+00 0 10 0 0 0 0 10 0 0 0 29299
VecAssemblyBegin 3 1.0 5.8301e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 3 1.0 1.3023e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 2279 1.0 5.6063e-02 3.6 0.00e+00 0.0 5.1e+05 3.7e+02 0.0e+00 0 0 96 88 0 0 0 96 88 0 0
VecScatterEnd 2279 1.0 4.8437e-0122.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 4118 1.0 8.3514e-01 1.9 1.61e+07 1.4 0.0e+00 0.0e+00 5.7e+02 0 5 0 0 13 0 5 0 0 13 1043
MatMult 3482 1.0 1.0133e+00 2.1 1.18e+08 1.4 2.1e+05 2.8e+02 0.0e+00 0 37 39 27 0 0 37 39 27 0 6271
MatMultAdd 71 1.0 9.5340e-02 3.9 1.04e+06 1.3 2.3e+04 1.8e+02 0.0e+00 0 0 4 2 0 0 0 4 2 0 611
MatMultTranspose 142 1.0 2.2453e-01 1.6 2.09e+06 1.3 4.6e+04 1.8e+02 2.8e+02 0 1 9 4 7 0 1 9 4 7 519
MatSolve 3550 1.0 5.7862e-01 1.4 1.26e+08 1.4 0.0e+00 0.0e+00 0.0e+00 0 39 0 0 0 0 39 0 0 0 11693
MatLUFactorNum 2 1.0 4.7321e-03 1.5 3.25e+05 1.5 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3655
MatILUFactorSym 2 1.0 1.1258e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 5 1.0 1.6813e-0120.3 0.00e+00 0.0 1.4e+03 2.3e+03 6.0e+00 0 0 0 2 0 0 0 0 2 0 0
MatAssemblyEnd 5 1.0 2.3137e-02 1.3 0.00e+00 0.0 1.9e+03 6.1e+01 2.8e+01 0 0 0 0 1 0 0 0 0 1 0
MatGetRowIJ 2 1.0 4.9662e-06 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 2 1.0 2.5637e-0132.3 0.00e+00 0.0 3.2e+03 2.1e+03 1.0e+01 0 0 1 3 0 0 0 1 3 0 0
MatGetOrdering 2 1.0 1.2449e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 2 1.0 9.9950e-03 1.1 0.00e+00 0.0 1.3e+03 1.8e+02 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 8.5980e-05 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MeshView 4 1.0 5.9615e+00 1.0 0.00e+00 0.0 1.8e+03 3.1e+03 0.0e+00 2 0 0 3 0 2 0 0 3 0 0
MeshGetGlobalScatter 3 1.0 4.1654e-02 1.2 0.00e+00 0.0 9.7e+02 6.0e+01 1.8e+01 0 0 0 0 0 0 0 0 0 0 0
MeshAssembleMatrix 1606 1.1 6.7121e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MeshUpdateOperator 2168 1.1 2.7389e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 0 0 0 0 0 0 0 0 0 0 0
SectionRealView 2 1.0 5.9061e-01199.5 0.00e+00 0.0 2.5e+02 4.1e+03 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 5 1.0 2.8859e-01 7.4 3.25e+05 1.5 5.8e+03 1.2e+03 4.6e+01 0 0 1 3 1 0 0 1 3 1 60
PCSetUpOnBlocks 284 1.0 8.3234e-03 1.3 3.25e+05 1.5 0.0e+00 0.0e+00 2.6e+01 0 0 0 0 1 0 0 0 0 1 2078
PCApply 71 1.0 4.8040e+00 1.0 3.13e+08 1.4 4.8e+05 3.8e+02 4.0e+03 1 97 91 84 92 1 97 91 84 94 3503
KSPGMRESOrthog 3195 1.0 8.5857e-01 2.5 4.46e+07 1.4 0.0e+00 0.0e+00 3.6e+02 0 14 0 0 8 0 14 0 0 8 2814
KSPSetup 6 1.0 2.9785e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 5.0004e+00 1.0 3.20e+08 1.4 5.1e+05 3.7e+02 4.2e+03 1100 96 87 95 1100 96 87 97 3449
MeshDestroy 5 1.0 3.1958e-011357.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DistributeMesh 1 1.0 4.5183e+00 1.1 0.00e+00 0.0 5.0e+02 9.5e+03 0.0e+00 1 0 0 2 0 1 0 0 2 0 0
PartitionCreate 2 1.0 3.5427e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PartitionClosure 2 1.0 1.2162e+0011594.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
DistributeCoords 2 1.0 8.2849e-01 2.8 0.00e+00 0.0 5.0e+02 3.0e+03 0.0e+00 0 0 0 1 0 0 0 0 1 0 0
DistributeLabels 2 1.0 1.6425e+00 1.0 0.00e+00 0.0 3.8e+02 6.9e+02 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
CreateOverlap 2 1.0 1.2166e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 1 0 0 0 0 1 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Viewer 4 4 1344 0
Index Set 29 29 89664 0
Vec 132 131 1098884 0
Vec Scatter 8 8 4320 0
Matrix 13 13 1315884 0
Mesh 5 5 1680 0
SectionReal 7 5 1320 0
Preconditioner 7 7 3132 0
Krylov Solver 7 7 88364 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.000211
Average time for zero size MPI_Send(): 1.4998e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Wed Mar 9 20:22:08 2011
Configure options: --with-clanguage=cxx --with-shared=1 --with-dynamic=1 --download-f-blas-lapack=1 --with-mpi-dir=/bwfs/software/ictce3.2/impi/3.2.0.011 --download-boost=1 --download-fiat=1 --download-generator=1 --download-triangle=1 --download-tetgen=1 --download-chaco=1 --download-parmetis=1 --download-zoltan=1 --with-sieve=1 --with-opt-sieve=1 --with-exodusii-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75 --with-netcdf-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1
-----------------------------------------
Libraries compiled on Wed Mar 9 20:22:27 CST 2011 on console
Machine characteristics: Linux console 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
Using PETSc directory: /bwfs/home/fdkong/petsc/petsc-3.1-p7
Using PETSc arch: linux-gnu-c-debug
-----------------------------------------
Using C compiler: /bwfs/software/ictce3.2/impi/3.2.0.011/bin/mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g -fPIC
Using Fortran compiler: /bwfs/software/ictce3.2/impi/3.2.0.011/bin/mpif90 -fPIC -Wall -Wno-unused-variable -g
-----------------------------------------
Using include paths: -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/include -I/export/ictce3.2/impi/3.2.0.011/include/gfortran/4.1.0 -I/export/ictce3.2/impi/3.2.0.011/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/include/sieve -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/Boost/ -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/include -I/bwfs/software/ictce3.2/impi/3.2.0.011/include
------------------------------------------
Using C linker: /bwfs/software/ictce3.2/impi/3.2.0.011/bin/mpicxx -Wall -Wwrite-strings -Wno-strict-aliasing -g
Using Fortran linker: /bwfs/software/ictce3.2/impi/3.2.0.011/bin/mpif90 -fPIC -Wall -Wno-unused-variable -g
Using libraries: -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -lzoltan -ltriangle -lX11 -lchaco -lparmetis -lmetis -ltetgen -lflapack -lfblas -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/lib -lexoIIv2for -lexoIIv2c -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/lib -lnetcdf -Wl,-rpath,/export/ictce3.2/impi/3.2.0.011/lib -L/export/ictce3.2/impi/3.2.0.011/lib -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32 -ldl -lmpi -lmpigf -lmpigi -lrt -lpthread -lgcc_s -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/-Xlinker -lmpi_dbg -lgfortran -lm -Wl,-rpath,/opt/intel/mpi-rt/3.2 -lm -lmpigc4 -lmpi_dbg -lstdc++ -lmpigc4 -lmpi_dbg -lstdc++ -ldl -lmpi -lmpigf -lmpigi -lrt -lpthread -lgcc_s -ldl
------------------------------------------
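A note on the WARNING block in the log above: this run used the debug build (PETSC_ARCH linux-gnu-c-debug). A sketch of rebuilding an optimized arch, assuming the same configure options reported above with --with-debugging=no added; the arch name linux-gnu-c-opt is only a placeholder:

  ./config/configure.py PETSC_ARCH=linux-gnu-c-opt --with-debugging=no \
      --with-clanguage=cxx --with-shared=1 --with-dynamic=1 --download-f-blas-lapack=1 \
      --with-mpi-dir=/bwfs/software/ictce3.2/impi/3.2.0.011 \
      --download-boost=1 --download-fiat=1 --download-generator=1 --download-triangle=1 \
      --download-tetgen=1 --download-chaco=1 --download-parmetis=1 --download-zoltan=1 \
      --with-sieve=1 --with-opt-sieve=1 \
      --with-exodusii-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75 \
      --with-netcdf-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1
  make PETSC_DIR=/bwfs/home/fdkong/petsc/petsc-3.1-p7 PETSC_ARCH=linux-gnu-c-opt all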
------------------
Fande Kong
ShenZhen Institutes of Advanced Technology
Chinese Academy of Sciences
------------------ Original ------------------
From: "knepley"<knepley at gmail.com>;
Date: Mon, Mar 28, 2011 02:19 PM
To: "PETSc users list"<petsc-users at mcs.anl.gov>;
Cc: "fdkong"<fd.kong at siat.ac.cn>;
Subject: Re: [petsc-users] Generation, refinement of the mesh (Sieve mesh) is very slow!
1) Always send the output of -log_summary when asking a performance question
2) There are implementations that are optimized for different things. It's possible to
   optimize mesh handling for a cells-vertices mesh, but not if you need edges and
   faces generated.
3) I am out of the country. I can look at the performance when I get back.

   Matt
On Mon, Mar 28, 2011 at 1:06 AM, fdkong <fd.kong at siat.ac.cn> wrote:
Hi everyone,

  I have developed my application based on the Sieve mesh object in PETSc, and now I have encountered some serious problems.

  1. The generation of the mesh takes a lot of time; it runs very slowly. The following code is used:
      double lower[2] = {-1.0, -1.0};
      double upper[2] = {1.0, 1.0};
      int    edges[2] = {256, 256};
      mB = ALE::MeshBuilder<ALE::Mesh>::createSquareBoundary(comm, lower, upper, edges, debug);
      ALE::ISieveConverter::convertMesh(*mB, *meshBd, renumbering, false);
      ierr = PetscPrintf(PETSC_COMM_WORLD, " End build convertMesh \n");CHKERRQ(ierr);
      ierr = MeshSetMesh(boundary, meshBd);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, " Begin build MeshGenerate \n");CHKERRQ(ierr);
      ierr = MeshGenerate(boundary, interpolate, &mesh);CHKERRQ(ierr);
  2. The refinement of the mesh is also very slow. The code:
      refinementLimit = 0.0001;
      if (refinementLimit > 0.0)
      {
        Mesh refinedMesh;
        ierr = MeshRefine(mesh, refinementLimit, interpolate, &refinedMesh);CHKERRQ(ierr);
        ierr = MeshDestroy(mesh);CHKERRQ(ierr);
        mesh = refinedMesh;
      }
  3. The distribution of the mesh is also very slow. The code:
      if (size > 1)
      {
        Mesh parallelMesh;
        //ierr = DistributeMeshnew(mesh, "chaco", &parallelMesh);CHKERRQ(ierr);
        ierr = DistributeMeshnew(mesh, "parmetis", &parallelMesh);CHKERRQ(ierr);
        ierr = MeshDestroy(mesh);CHKERRQ(ierr);
        mesh = parallelMesh;
      }
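  To see how much time each of these three steps takes on its own in -log_summary (rather than everything being lumped into the Main Stage), they can be wrapped in separate logging stages. A minimal sketch, assuming the variables from the snippets above and the PetscLogStageRegister/PetscLogStagePush/PetscLogStagePop routines mentioned in the profiling header of the log:

      PetscLogStage stageGen, stageRefine, stageDist;

      /* register one stage per mesh-setup phase so -log_summary reports them separately */
      ierr = PetscLogStageRegister("Mesh Generation",   &stageGen);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("Mesh Refinement",   &stageRefine);CHKERRQ(ierr);
      ierr = PetscLogStageRegister("Mesh Distribution", &stageDist);CHKERRQ(ierr);

      /* time the mesh generation */
      ierr = PetscLogStagePush(stageGen);CHKERRQ(ierr);
      ierr = MeshGenerate(boundary, interpolate, &mesh);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      /* time the refinement */
      ierr = PetscLogStagePush(stageRefine);CHKERRQ(ierr);
      ierr = MeshRefine(mesh, refinementLimit, interpolate, &refinedMesh);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);

      /* time the parallel distribution */
      ierr = PetscLogStagePush(stageDist);CHKERRQ(ierr);
      ierr = DistributeMeshnew(mesh, "parmetis", &parallelMesh);CHKERRQ(ierr);
      ierr = PetscLogStagePop();CHKERRQ(ierr);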
  Has anyone encountered similar problems? If anyone can help, thank you very much!

  I would also like to ask which parallel mesh package works well with PETSc when developing more complex problems.
------------------
Fande Kong
ShenZhen Institutes of Advanced Technology
Chinese Academy of Sciences
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener