[petsc-users] Generation, refinement of the mesh (Sieve mesh) is very slow!

fdkong fd.kong at siat.ac.cn
Mon Mar 28 01:33:42 CDT 2011

Thank you very much for your reply. The -log_summary is shown as follows:

***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./linearElasticity on a linux-gnu named c0409 with 64 processors, by fdkong Sat Mar 26 12:44:53 2011
Using Petsc Release Version 3.1.0, Patch 7, Mon Dec 20 14:26:37 CST 2010

                         Max       Max/Min        Avg      Total 
Time (sec):           3.459e+02      1.00012   3.459e+02
Objects:              2.120e+02      1.00000   2.120e+02
Flops:                3.205e+08      1.40451   2.697e+08  1.726e+10
Flops/sec:            9.268e+05      1.40451   7.799e+05  4.991e+07
Memory:               3.065e+06      1.25099              1.721e+08
MPI Messages:         1.182e+04      3.60692   8.252e+03  5.281e+05
MPI Message Lengths:  1.066e+07      5.78633   4.086e+02  2.158e+08
MPI Reductions:       4.365e+03      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 3.4587e+02 100.0%  1.7263e+10 100.0%  5.281e+05 100.0%  4.086e+02      100.0%  4.278e+03  98.0% 

See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)

      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run config/configure.py        #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #

Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

--- Event Stage 0: Main Stage

VecMDot             3195 1.0 8.1583e-01 2.7 2.23e+07 1.4 0.0e+00 0.0e+00 3.6e+02  0  7  0  0  8   0  7  0  0  8  1480
VecNorm             4335 1.0 1.2828e+00 2.0 1.13e+07 1.4 0.0e+00 0.0e+00 7.8e+02  0  4  0  0 18   0  4  0  0 18   475
VecScale            4192 1.0 2.3778e-02 1.3 5.47e+06 1.4 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 12415
VecCopy              995 1.0 5.0113e-03 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              2986 1.0 6.1189e-03 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY             1140 1.0 8.8003e-03 1.3 2.78e+06 1.3 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 17310
VecAYPX               71 1.0 8.8720e-04 1.7 8.09e+04 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5286
VecWAXPY               2 1.0 2.3029e-05 1.7 2.28e+03 1.2 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  5736
VecMAXPY            4192 1.0 5.6833e-02 1.4 3.08e+07 1.4 0.0e+00 0.0e+00 0.0e+00  0 10  0  0  0   0 10  0  0  0 29299
VecAssemblyBegin       3 1.0 5.8301e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         3 1.0 1.3023e-05 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     2279 1.0 5.6063e-02 3.6 0.00e+00 0.0 5.1e+05 3.7e+02 0.0e+00  0  0 96 88  0   0  0 96 88  0     0
VecScatterEnd       2279 1.0 4.8437e-0122.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecNormalize        4118 1.0 8.3514e-01 1.9 1.61e+07 1.4 0.0e+00 0.0e+00 5.7e+02  0  5  0  0 13   0  5  0  0 13  1043
MatMult             3482 1.0 1.0133e+00 2.1 1.18e+08 1.4 2.1e+05 2.8e+02 0.0e+00  0 37 39 27  0   0 37 39 27  0  6271
MatMultAdd            71 1.0 9.5340e-02 3.9 1.04e+06 1.3 2.3e+04 1.8e+02 0.0e+00  0  0  4  2  0   0  0  4  2  0   611
MatMultTranspose     142 1.0 2.2453e-01 1.6 2.09e+06 1.3 4.6e+04 1.8e+02 2.8e+02  0  1  9  4  7   0  1  9  4  7   519
MatSolve            3550 1.0 5.7862e-01 1.4 1.26e+08 1.4 0.0e+00 0.0e+00 0.0e+00  0 39  0  0  0   0 39  0  0  0 11693
MatLUFactorNum         2 1.0 4.7321e-03 1.5 3.25e+05 1.5 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  3655
MatILUFactorSym        2 1.0 1.1258e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       5 1.0 1.6813e-0120.3 0.00e+00 0.0 1.4e+03 2.3e+03 6.0e+00  0  0  0  2  0   0  0  0  2  0     0
MatAssemblyEnd         5 1.0 2.3137e-02 1.3 0.00e+00 0.0 1.9e+03 6.1e+01 2.8e+01  0  0  0  0  1   0  0  0  0  1     0
MatGetRowIJ            2 1.0 4.9662e-06 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       2 1.0 2.5637e-0132.3 0.00e+00 0.0 3.2e+03 2.1e+03 1.0e+01  0  0  1  3  0   0  0  1  3  0     0
MatGetOrdering         2 1.0 1.2449e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatIncreaseOvrlp       2 1.0 9.9950e-03 1.1 0.00e+00 0.0 1.3e+03 1.8e+02 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         2 1.0 8.5980e-05 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MeshView               4 1.0 5.9615e+00 1.0 0.00e+00 0.0 1.8e+03 3.1e+03 0.0e+00  2  0  0  3  0   2  0  0  3  0     0
MeshGetGlobalScatter       3 1.0 4.1654e-02 1.2 0.00e+00 0.0 9.7e+02 6.0e+01 1.8e+01  0  0  0  0  0   0  0  0  0  0     0
MeshAssembleMatrix    1606 1.1 6.7121e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MeshUpdateOperator    2168 1.1 2.7389e-01 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  0   0  0  0  0  0     0
SectionRealView        2 1.0 5.9061e-01199.5 0.00e+00 0.0 2.5e+02 4.1e+03 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                5 1.0 2.8859e-01 7.4 3.25e+05 1.5 5.8e+03 1.2e+03 4.6e+01  0  0  1  3  1   0  0  1  3  1    60
PCSetUpOnBlocks      284 1.0 8.3234e-03 1.3 3.25e+05 1.5 0.0e+00 0.0e+00 2.6e+01  0  0  0  0  1   0  0  0  0  1  2078
PCApply               71 1.0 4.8040e+00 1.0 3.13e+08 1.4 4.8e+05 3.8e+02 4.0e+03  1 97 91 84 92   1 97 91 84 94  3503
KSPGMRESOrthog      3195 1.0 8.5857e-01 2.5 4.46e+07 1.4 0.0e+00 0.0e+00 3.6e+02  0 14  0  0  8   0 14  0  0  8  2814
KSPSetup               6 1.0 2.9785e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 5.0004e+00 1.0 3.20e+08 1.4 5.1e+05 3.7e+02 4.2e+03  1100 96 87 95   1100 96 87 97  3449
MeshDestroy            5 1.0 3.1958e-011357.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DistributeMesh         1 1.0 4.5183e+00 1.1 0.00e+00 0.0 5.0e+02 9.5e+03 0.0e+00  1  0  0  2  0   1  0  0  2  0     0
PartitionCreate        2 1.0 3.5427e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PartitionClosure       2 1.0 1.2162e+0011594.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DistributeCoords       2 1.0 8.2849e-01 2.8 0.00e+00 0.0 5.0e+02 3.0e+03 0.0e+00  0  0  0  1  0   0  0  0  1  0     0
DistributeLabels       2 1.0 1.6425e+00 1.0 0.00e+00 0.0 3.8e+02 6.9e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateOverlap          2 1.0 1.2166e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  1  0   0  0  0  1  0     0

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     4              4         1344     0
           Index Set    29             29        89664     0
                 Vec   132            131      1098884     0
         Vec Scatter     8              8         4320     0
              Matrix    13             13      1315884     0
                Mesh     5              5         1680     0
         SectionReal     7              5         1320     0
      Preconditioner     7              7         3132     0
       Krylov Solver     7              7        88364     0
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.000211
Average time for zero size MPI_Send(): 1.4998e-05
#PETSc Option Table entries:
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 4 sizeof(PetscScalar) 8
Configure run at: Wed Mar  9 20:22:08 2011
Configure options: --with-clanguage=cxx --with-shared=1 --with-dynamic=1 --download-f-blas-lapack=1 --with-mpi-dir=/bwfs/software/ictce3.2/impi/ --download-boost=1 --download-fiat=1 --download-generator=1 --download-triangle=1 --download-tetgen=1 --download-chaco=1 --download-parmetis=1 --download-zoltan=1 --with-sieve=1 --with-opt-sieve=1 --with-exodusii-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75 --with-netcdf-dir=/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1
Libraries compiled on Wed Mar  9 20:22:27 CST 2011 on console 
Machine characteristics: Linux console 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux 
Using PETSc directory: /bwfs/home/fdkong/petsc/petsc-3.1-p7
Using PETSc arch: linux-gnu-c-debug
Using C compiler: /bwfs/software/ictce3.2/impi/ -Wall -Wwrite-strings -Wno-strict-aliasing -g   -fPIC   
Using Fortran compiler: /bwfs/software/ictce3.2/impi/ -fPIC -Wall -Wno-unused-variable -g    
Using include paths: -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/include -I/export/ictce3.2/impi/ -I/export/ictce3.2/impi/ -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/include/sieve -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/Boost/ -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/include -I/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/include -I/bwfs/software/ictce3.2/impi/  
Using C linker: /bwfs/software/ictce3.2/impi/ -Wall -Wwrite-strings -Wno-strict-aliasing -g 
Using Fortran linker: /bwfs/software/ictce3.2/impi/ -fPIC -Wall -Wno-unused-variable -g  
Using libraries: -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -lpetsc       -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/linux-gnu-c-debug/lib -lzoltan -ltriangle -lX11 -lchaco -lparmetis -lmetis -ltetgen -lflapack -lfblas -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/exodusii-4.75/lib -lexoIIv2for -lexoIIv2c -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/lib -L/bwfs/home/fdkong/petsc/petsc-3.1-p7/externalpackages/netcdf-4.1.1/lib -lnetcdf -Wl,-rpath,/export/ictce3.2/impi/ -L/export/ictce3.2/impi/ -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32 -L/usr/lib/gcc/x86_64-redhat-linux/4.1.2/32 -ldl -lmpi -lmpigf -lmpigi -lrt -lpthread -lgcc_s -Wl,-rpath,/bwfs/home/fdkong/petsc/petsc-3.1-p7/-Xlinker -lmpi_dbg -lgfortran -lm -Wl,-rpath,/opt/intel/mpi-rt/3.2 -lm -lmpigc4 -lmpi_dbg -lstdc++ -lmpigc4 -lmpi_dbg -lstdc++ -ldl -lmpi -lmpigf -lmpigi -lrt -lpthread -lgcc_s -ldl  
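
One way to read the summary above: the three largest events that report a nonzero %T (MeshView, KSPSolve, DistributeMesh) add up to roughly 15 s of the 345.9 s main stage, so most of the wall-clock time appears to fall outside any logged event. The sketch below only redoes that arithmetic in C++. The function names are hypothetical, and treating those three events as non-nested is an assumption, not something the log states.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum of the max times (in seconds) of a set of logged events.
// Assumption: the events passed in do not nest inside one another,
// otherwise their time would be counted twice.
double loggedSeconds(const std::vector<double>& eventTimes) {
    return std::accumulate(eventTimes.begin(), eventTimes.end(), 0.0);
}

// Wall-clock time of the stage that is not covered by those events.
double unloggedSeconds(double stageTime, const std::vector<double>& eventTimes) {
    assert(stageTime > 0.0);
    return stageTime - loggedSeconds(eventTimes);
}

// Fraction of the stage time that the listed events account for.
double loggedFraction(double stageTime, const std::vector<double>& eventTimes) {
    assert(stageTime > 0.0);
    return loggedSeconds(eventTimes) / stageTime;
}
```

With a stage time of 345.87 s and the event times 5.9615 s (MeshView), 5.0004 s (KSPSolve) and 4.5183 s (DistributeMesh), unloggedSeconds gives about 330 s, so these events cover only around 4.5% of the run. That would be consistent with mesh generation and refinement not being covered by any logged event, although the log alone cannot confirm where the remaining time goes.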

Fande Kong
ShenZhen Institutes of Advanced Technology
Chinese Academy of Sciences

------------------ Original ------------------
From:  "knepley"<knepley at gmail.com>;
Date:  Mon, Mar 28, 2011 02:19 PM
To:  "PETSc users list"<petsc-users at mcs.anl.gov>; 
Cc:  "fdkong"<fd.kong at siat.ac.cn>; 
Subject:  Re: [petsc-users] Generation, refinement of the mesh (Sieve mesh) is very slow!

1) Always send the output of -log_summary when asking a performance question.

2) There are implementations that are optimized for different things. It's possible to
   optimize mesh handling for a cells-vertices mesh, but not if you need edges and
   faces generated.

3) I am out of the country. I can look at the performance when I get back.

   Matt

On Mon, Mar 28, 2011 at 1:06 AM, fdkong <fd.kong at siat.ac.cn> wrote:
 Hi everyone,
   I have developed my application based on the Sieve mesh object in PETSc, and I have now run into some serious performance problems.
 1. Mesh generation takes a lot of time and runs very slowly. The following code is used:
      double lower[2] = {-1.0, -1.0};
      double upper[2] = {1.0, 1.0};
      int    edges[2] = {256, 256};
      mB = ALE::MeshBuilder<ALE::Mesh>::createSquareBoundary(comm, lower, upper, edges, debug);
      ALE::ISieveConverter::convertMesh(*mB, *meshBd, renumbering, false);
      ierr = PetscPrintf(PETSC_COMM_WORLD," End build convertMesh \n");CHKERRQ(ierr);
      ierr = MeshSetMesh(boundary, meshBd);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD," Begin build MeshGenerate \n");CHKERRQ(ierr);

      ierr = MeshGenerate(boundary, interpolate, &mesh);CHKERRQ(ierr);

 2. Refinement of the mesh is also very slow. The code:
      refinementLimit = 0.0001;
      if (refinementLimit > 0.0)
      {
        Mesh refinedMesh;

        ierr = MeshRefine(mesh, refinementLimit, interpolate, &refinedMesh);CHKERRQ(ierr);
        ierr = MeshDestroy(mesh);CHKERRQ(ierr);
        mesh = refinedMesh;
      }

 3. Distribution of the mesh is also very slow. The code:
      if (size > 1)
      {
        Mesh parallelMesh;

        //ierr = DistributeMeshnew(mesh, "chao", &parallelMesh);CHKERRQ(ierr);
        ierr = DistributeMeshnew(mesh, "parmetis", &parallelMesh);CHKERRQ(ierr);
        ierr = MeshDestroy(mesh);CHKERRQ(ierr);
        mesh = parallelMesh;
      }
   Has anyone encountered similar problems? If anyone can help, thank you very much!
   I would also like to ask which parallel mesh works well with PETSc when developing more complex applications.

 Fande Kong
ShenZhen Institutes of Advanced Technology
Chinese Academy of Sciences


What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
 -- Norbert Wiener