[petsc-users] Speedup problem when using OpenMP?

Danyang Su danyang.su at gmail.com
Thu Oct 31 18:54:21 CDT 2013


Hi All,

I have a question about the speedup of PETSc when using OpenMP. I get
good speedup when using MPI, but no speedup at all when using OpenMP.
The example is ex2f with m=100 and n=100. The machine has 16 processors
(32 threads) and the OS is Windows Server 2012. The log files for the
4- and 8-process MPI runs and the 4- and 8-thread OpenMP runs are attached.
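
For a quick comparison, the KSPSolve times reported in the attached logs
are approximately:
  MPI,    4 processes: 5.7e-02 s
  MPI,    8 processes: 2.3e-02 s
  OpenMP, 4 threads:   7.3e-02 s
  OpenMP, 8 threads:   7.2e-02 s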

The commands I used to run with 4 processors are as follows:

Run using MPI:
mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p4.log

Run using OpenMP:
Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
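
The corresponding 8-process and 8-thread runs (logs also attached) use
the same form, with 8 in place of 4:

Run using MPI:
mpiexec -n 8 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p8.log

Run using OpenMP:
Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 8 -m 100 -n 100 -log_summary log_100x100_openmp_p8.log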

The PETSc build used for this test is PETSc for Windows
(http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I don't think
the build is the problem, because the same behavior occurs when I use
PETSc-dev under Cygwin. I don't know whether this problem also exists
on Linux; would anybody help to test?

Thanks and regards,

Danyang
-------------- next part --------------
!
!  Description: Solves a linear system in parallel with KSP (Fortran code).
!               Also shows how to set a user-defined monitoring routine.
!
!
!/*T
!  Concepts: KSP^basic parallel example
!  Concepts: KSP^setting a user-defined monitoring routine
!  Processors: n
!T*/
!
! -----------------------------------------------------------------------

      program main
      implicit none

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                    Include files
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
!  This program uses CPP for preprocessing, as indicated by the use of
!  PETSc include files in the directory petsc/include/finclude.  This
!  convention allows the use of the #include statements that define
!  PETSc objects and variables.
!
!  Use of conventional Fortran include statements is also supported.
!  In that case, the PETSc include files are located in the directory
!  petsc/include/foldinclude.
!
!  Since one must be very careful to include each file no more than once
!  in a Fortran routine, application programmers must explicitly list
!  each file needed for the various PETSc components within their
!  program (unlike the C/C++ interface).
!
!  See the Fortran section of the PETSc users manual for details.
!
!  The following include statements are required for KSP Fortran programs:
!     petscsys.h    - base PETSc routines
!     petscvec.h    - vectors
!     petscmat.h    - matrices
!     petscpc.h     - preconditioners
!     petscksp.h    - Krylov subspace methods
!  Additional include statements may be needed if using additional
!  PETSc routines in a Fortran program, e.g.,
!     petscviewer.h - viewers
!     petscis.h     - index sets
!

#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscmat.h>
#include <finclude/petscpc.h>
#include <finclude/petscksp.h>

!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                   Variable declarations
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
!  Variables:
!     ksp      - linear solver (Krylov subspace method) context
!     pc       - preconditioner context
!     x, b, u  - approx solution, right-hand-side, exact solution vectors
!     A        - matrix that defines linear system
!     its      - iterations for convergence
!     norm     - norm of error in solution
!     rctx     - random number generator context
!
!  Note that vectors are declared as PETSc "Vec" objects.  These vectors
!  are mathematical objects that contain more than just an array of
!  double precision numbers. I.e., vectors in PETSc are not just
!        double precision x(*).
!  However, local vector data can be easily accessed via VecGetArray().
!  See the Fortran section of the PETSc users manual for details.
!
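!  As a minimal sketch (not used in this example; the names x_array and
!  i_x are chosen here only for illustration), the classic Fortran idiom
!  for accessing local vector data looks like:
!
!      PetscScalar x_array(1)
!      PetscOffset i_x
!      call VecGetArray(x,x_array,i_x,ierr)
!      !  local entry i is addressed as x_array(i_x + i)
!      call VecRestoreArray(x,x_array,i_x,ierr)
!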
      double precision  norm
      PetscInt  i,j,II,JJ,m,n,its
      PetscInt  Istart,Iend,ione
      PetscErrorCode ierr
      PetscMPIInt     rank,size
      PetscBool   flg
      PetscScalar v,one,neg_one
      Vec         x,b,u
      Mat         A
      KSP         ksp
      PetscRandom rctx

!  These variables are not currently used.
!      PC          pc
!      PCType      ptype
!      double precision tol


!  Note: Any user-defined Fortran routines (such as MyKSPMonitor)
!  MUST be declared as external.

      external MyKSPMonitor,MyKSPConverged

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                 Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      call PetscInitialize(Petsc_Null_Character,ierr)
      m = 3
      n = 3
      one  = 1.0
      neg_one = -1.0
      ione    = 1
      call PetscOptionsGetInt(Petsc_Null_Character,'-m',m,flg,ierr)
      call PetscOptionsGetInt(Petsc_Null_Character,'-n',n,flg,ierr)
      call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
      call MPI_Comm_size(Petsc_Comm_World,size,ierr)

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!      Compute the matrix and right-hand-side vector that define
!      the linear system, Ax = b.
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!  Create parallel matrix, specifying only its global dimensions.
!  When using MatCreate(), the matrix format can be specified at
!  runtime. Also, the parallel partitioning of the matrix is
!  determined by PETSc at runtime.

      call MatCreate(Petsc_Comm_World,A,ierr)
      call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n,ierr)
      call MatSetFromOptions(A,ierr)
      call MatSetUp(A,ierr)

!  Currently, all PETSc parallel matrix formats are partitioned by
!  contiguous chunks of rows across the processors.  Determine which
!  rows of the matrix are locally owned.

      call MatGetOwnershipRange(A,Istart,Iend,ierr)

!  Set matrix elements for the 2-D, five-point stencil in parallel.
!   - Each processor needs to insert only elements that it owns
!     locally (but any non-local elements will be sent to the
!     appropriate processor during matrix assembly).
!   - Always specify global row and columns of matrix entries.
!   - Note that MatSetValues() uses 0-based row and column numbers
!     in Fortran as well as in C.

!     Note: this uses the less common natural ordering that orders first
!     all the unknowns for x = h, then for x = 2h, etc.; hence you see
!     JJ = II +- n instead of JJ = II +- m as you might expect. The more
!     standard ordering would first do all variables for y = h, then y = 2h, etc.

      do 10, II=Istart,Iend-1
        v = -1.0
        i = II/n
        j = II - i*n
        if (i.gt.0) then
          JJ = II - n
          call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
        endif
        if (i.lt.m-1) then
          JJ = II + n
          call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
        endif
        if (j.gt.0) then
          JJ = II - 1
          call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
        endif
        if (j.lt.n-1) then
          JJ = II + 1
          call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
        endif
        v = 4.0
        call  MatSetValues(A,ione,II,ione,II,v,INSERT_VALUES,ierr)
 10   continue

!  Assemble matrix, using the 2-step process:
!       MatAssemblyBegin(), MatAssemblyEnd()
!  Computations can be done while messages are in transition,
!  by placing code between these two statements.

      call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
      call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)

!  Create parallel vectors.
!   - Here, the parallel partitioning of the vector is determined by
!     PETSc at runtime.  We could also specify the local dimensions
!     if desired -- or use the more general routine VecCreate().
!   - When solving a linear system, the vectors and matrices MUST
!     be partitioned accordingly.  PETSc automatically generates
!     appropriately partitioned matrices and vectors when MatCreate()
!     and VecCreate() are used with the same communicator.
!   - Note: We form 1 vector from scratch and then duplicate as needed.

      call VecCreateMPI(Petsc_Comm_World,PETSC_DECIDE,m*n,u,ierr)
      call VecSetFromOptions(u,ierr)
      call VecDuplicate(u,b,ierr)
      call VecDuplicate(b,x,ierr)

!  Set exact solution; then compute right-hand-side vector.
!  By default we use an exact solution vector with all elements
!  equal to 1.0.  Alternatively, the runtime option -random_exact_sol
!  forms a solution vector with random components.

      call PetscOptionsHasName(Petsc_Null_Character,                    &
     &             "-random_exact_sol",flg,ierr)
      if (flg) then
         call PetscRandomCreate(Petsc_Comm_World,rctx,ierr)
         call PetscRandomSetFromOptions(rctx,ierr)
         call VecSetRandom(u,rctx,ierr)
         call PetscRandomDestroy(rctx,ierr)
      else
         call VecSet(u,one,ierr)
      endif
      call MatMult(A,u,b,ierr)

!  View the exact solution vector if desired

      call PetscOptionsHasName(Petsc_Null_Character,                    &
     &             "-view_exact_sol",flg,ierr)
      if (flg) then
         call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
      endif

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!         Create the linear solver and set various options
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!  Create linear solver context

      call KSPCreate(Petsc_Comm_World,ksp,ierr)

!  Set operators. Here the matrix that defines the linear system
!  also serves as the preconditioning matrix.

      call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)

!  Set linear solver defaults for this problem (optional).
!   - By extracting the KSP and PC contexts from the KSP context,
!     we can then directly call any KSP and PC routines
!     to set various options.
!   - The following four statements are optional; all of these
!     parameters could alternatively be specified at runtime via
!     KSPSetFromOptions(). All of these defaults can be
!     overridden at runtime, as indicated below.

!     We comment out this section of code since the Jacobi
!     preconditioner is not a good general default.

!      call KSPGetPC(ksp,pc,ierr)
!      ptype = PCJACOBI
!      call PCSetType(pc,ptype,ierr)
!      tol = 1.e-7
!      call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION,
!     &     PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr)

!  Set user-defined monitoring routine if desired

      call PetscOptionsHasName(Petsc_Null_Character,'-my_ksp_monitor',  &
     &                    flg,ierr)
      if (flg) then
        call KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL_OBJECT,          &
     &                     PETSC_NULL_FUNCTION,ierr)
      endif


!  Set runtime options, e.g.,
!      -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
!  These options will override those specified above as long as
!  KSPSetFromOptions() is called _after_ any other customization
!  routines.
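!
!  For illustration only (these particular options were not used in the
!  attached runs), such a command line might look like:
!      mpiexec -n 4 Petsc-windows-ex2f.exe -ksp_type bcgs -pc_type jacobi -ksp_rtol 1.e-7 -ksp_monitor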

      call KSPSetFromOptions(ksp,ierr)

!  Set convergence test routine if desired

      call PetscOptionsHasName(Petsc_Null_Character,                    &
     &     '-my_ksp_convergence',flg,ierr)
      if (flg) then
        call KSPSetConvergenceTest(ksp,MyKSPConverged,                  &
     &          PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
      endif
!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                      Solve the linear system
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      call KSPSolve(ksp,b,x,ierr)

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                     Check solution and clean up
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!  Check the error
      call VecAXPY(x,neg_one,u,ierr)
      call VecNorm(x,NORM_2,norm,ierr)
      call KSPGetIterationNumber(ksp,its,ierr)
      if (rank .eq. 0) then
        if (norm .gt. 1.e-12) then
           write(6,100) norm,its
        else
           write(6,110) its
        endif
      endif
  100 format('Norm of error ',e11.4,' iterations ',i5)
  110 format('Norm of error < 1.e-12, iterations ',i5)

!  Free work space.  All PETSc objects should be destroyed when they
!  are no longer needed.

      call KSPDestroy(ksp,ierr)
      call VecDestroy(u,ierr)
      call VecDestroy(x,ierr)
      call VecDestroy(b,ierr)
      call MatDestroy(A,ierr)

!  Always call PetscFinalize() before exiting a program.  This routine
!    - finalizes the PETSc libraries as well as MPI
!    - provides summary and diagnostic information if certain runtime
!      options are chosen (e.g., -log_summary).  See PetscFinalize()
!      manpage for more information.

      call PetscFinalize(ierr)
      end

! --------------------------------------------------------------
!
!  MyKSPMonitor - This is a user-defined routine for monitoring
!  the KSP iterative solvers.
!
!  Input Parameters:
!    ksp   - iterative context
!    n     - iteration number
!    rnorm - 2-norm (preconditioned) residual value (may be estimated)
!    dummy - optional user-defined monitor context (unused here)
!
      subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)

      implicit none

#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscksp.h>

      KSP              ksp
      Vec              x
      PetscErrorCode ierr
      PetscInt n,dummy
      PetscMPIInt rank
      double precision rnorm

!  Build the solution vector

      call KSPBuildSolution(ksp,PETSC_NULL_OBJECT,x,ierr)

!  Write the solution vector and residual norm to stdout
!   - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
!     handles data from multiple processors so that the
!     output is not jumbled.

      call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
      if (rank .eq. 0) write(6,100) n
      call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
      if (rank .eq. 0) write(6,200) n,rnorm

 100  format('iteration ',i5,' solution vector:')
 200  format('iteration ',i5,' residual norm ',e11.4)
      ierr = 0
      end

! --------------------------------------------------------------
!
!  MyKSPConverged - This is a user-defined routine for testing
!  convergence of the KSP iterative solvers.
!
!  Input Parameters:
!    ksp   - iterative context
!    n     - iteration number
!    rnorm - 2-norm (preconditioned) residual value (may be estimated)
!    dummy - optional user-defined convergence context (unused here)
!
!  Output Parameter:
!    flag  - set to 1 to indicate convergence, 0 otherwise
!
      subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)

      implicit none

#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscksp.h>

      KSP              ksp
      PetscErrorCode ierr
      PetscInt n,dummy
      KSPConvergedReason flag
      double precision rnorm

!  Declare convergence once the (preconditioned) residual norm drops
!  below 0.05; otherwise continue iterating.
      if (rnorm .le. .05) then
        flag = 1
      else
        flag = 0
      endif
      ierr = 0

      end



-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 1 processor, by danyang Thu Oct 31 16:12:46 2013
With 4 threads per MPI_Comm
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           8.896e-002      1.00000   8.896e-002
Objects:              4.500e+001      1.00000   4.500e+001
Flops:                5.352e+007      1.00000   5.352e+007  5.352e+007
Flops/sec:            6.016e+008      1.00000   6.016e+008  6.016e+008
MPI Messages:         0.000e+000      0.00000   0.000e+000  0.000e+000
MPI Message Lengths:  0.000e+000      0.00000   0.000e+000  0.000e+000
MPI Reductions:       1.410e+002      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 8.8951e-002 100.0%  5.3519e+007 100.0%  0.000e+000   0.0%  0.000e+000        0.0%  1.400e+002  99.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               68 1.0 1.3164e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11  0  0  0  15 11  0  0  0   461
MatSolve              68 1.0 2.1107e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11  0  0  0  24 11  0  0  0   287
MatLUFactorNum         1 1.0 1.5468e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 0.0e+000  2  0  0  0  0   2  0  0  0  0    70
MatILUFactorSym        1 1.0 8.8292e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000  1  0  0  0  1   1  0  0  0  1     0
MatAssemblyBegin       1 1.0 1.1378e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 7.5264e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ            1 1.0 2.8444e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.9911e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
VecMDot               65 1.0 1.1031e-002 1.0 1.89e+007 1.0 0.0e+000 0.0e+000 6.5e+001 12 35  0  0 46  12 35  0  0 46  1713
VecNorm               69 1.0 4.4828e-003 1.0 1.38e+006 1.0 0.0e+000 0.0e+000 6.9e+001  5  3  0  0 49   5  3  0  0 49   308
VecScale              68 1.0 1.3096e-003 1.0 6.80e+005 1.0 0.0e+000 0.0e+000 0.0e+000  1  1  0  0  0   1  1  0  0  0   519
VecCopy                3 1.0 5.1769e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecSet                 5 1.0 6.5991e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                6 1.0 2.9753e-004 1.0 1.20e+005 1.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0   403
VecMAXPY              68 1.0 1.6623e-002 1.0 2.02e+007 1.0 0.0e+000 0.0e+000 0.0e+000 19 38  0  0  0  19 38  0  0  0  1215
VecNormalize          68 1.0 5.8857e-003 1.0 2.04e+006 1.0 0.0e+000 0.0e+000 6.8e+001  7  4  0  0 48   7  4  0  0 49   347
KSPGMRESOrthog        65 1.0 2.6701e-002 1.0 3.78e+007 1.0 0.0e+000 0.0e+000 6.5e+001 30 71  0  0 46  30 71  0  0 46  1416
KSPSetUp               1 1.0 3.7205e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 7.3189e-002 1.0 5.34e+007 1.0 0.0e+000 0.0e+000 1.4e+002 82100  0  0 96  82100  0  0 97   729
PCSetUp                1 1.0 2.6908e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 3.0e+000  3  0  0  0  2   3  0  0  0  2    40
PCApply               68 1.0 2.1146e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11  0  0  0  24 11  0  0  0   287
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2      1520524     0
              Vector    37             37      3017128     0
       Krylov Solver     1              1        18360     0
      Preconditioner     1              1          976     0
           Index Set     3              3        42280     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
#PETSc Option Table entries:
-log_summary log_100x100_openmp_p4.log
-m 100
-n 100
-threadcomm_nthreads 4
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct  2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct  2 16:35:54 2013 on NB-TT-113812 
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------

Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl  -MT -O3 -QxW -Qopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort  -MT -O3 -QxW -fpp -Qopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------

Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 1 processor, by danyang Thu Oct 31 16:12:57 2013
With 8 threads per MPI_Comm
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           8.634e-002      1.00000   8.634e-002
Objects:              4.500e+001      1.00000   4.500e+001
Flops:                5.352e+007      1.00000   5.352e+007  5.352e+007
Flops/sec:            6.198e+008      1.00000   6.198e+008  6.198e+008
MPI Messages:         0.000e+000      0.00000   0.000e+000  0.000e+000
MPI Message Lengths:  0.000e+000      0.00000   0.000e+000  0.000e+000
MPI Reductions:       1.410e+002      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 8.6338e-002 100.0%  5.3519e+007 100.0%  0.000e+000   0.0%  0.000e+000        0.0%  1.400e+002  99.3% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               68 1.0 1.3230e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11  0  0  0  15 11  0  0  0   458
MatSolve              68 1.0 2.0949e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11  0  0  0  24 11  0  0  0   290
MatLUFactorNum         1 1.0 1.5417e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 0.0e+000  2  0  0  0  0   2  0  0  0  0    70
MatILUFactorSym        1 1.0 9.4436e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000  1  0  0  0  1   1  0  0  0  1     0
MatAssemblyBegin       1 1.0 5.6889e-007 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 7.5776e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0     0
MatGetRowIJ            1 1.0 2.8444e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 1.7579e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
VecMDot               65 1.0 1.0993e-002 1.0 1.89e+007 1.0 0.0e+000 0.0e+000 6.5e+001 13 35  0  0 46  13 35  0  0 46  1719
VecNorm               69 1.0 3.6978e-003 1.0 1.38e+006 1.0 0.0e+000 0.0e+000 6.9e+001  4  3  0  0 49   4  3  0  0 49   373
VecScale              68 1.0 1.0667e-003 1.0 6.80e+005 1.0 0.0e+000 0.0e+000 0.0e+000  1  1  0  0  0   1  1  0  0  0   637
VecCopy                3 1.0 5.0631e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecSet                 5 1.0 6.2009e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                6 1.0 1.4108e-004 1.0 1.20e+005 1.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0   851
VecMAXPY              68 1.0 1.6730e-002 1.0 2.02e+007 1.0 0.0e+000 0.0e+000 0.0e+000 19 38  0  0  0  19 38  0  0  0  1207
VecNormalize          68 1.0 4.8583e-003 1.0 2.04e+006 1.0 0.0e+000 0.0e+000 6.8e+001  6  4  0  0 48   6  4  0  0 49   420
KSPGMRESOrthog        65 1.0 2.6769e-002 1.0 3.78e+007 1.0 0.0e+000 0.0e+000 6.5e+001 31 71  0  0 46  31 71  0  0 46  1412
KSPSetUp               1 1.0 3.2484e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 7.1967e-002 1.0 5.34e+007 1.0 0.0e+000 0.0e+000 1.4e+002 83100  0  0 96  83100  0  0 97   742
PCSetUp                1 1.0 2.7182e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 3.0e+000  3  0  0  0  2   3  0  0  0  2    40
PCApply               68 1.0 2.0985e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11  0  0  0  24 11  0  0  0   289
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     2              2      1520524     0
              Vector    37             37      3017128     0
       Krylov Solver     1              1        18360     0
      Preconditioner     1              1          976     0
           Index Set     3              3        42280     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.13778e-007
#PETSc Option Table entries:
-log_summary log_100x100_openmp_p8.log
-m 100
-n 100
-threadcomm_nthreads 8
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct  2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct  2 16:35:54 2013 on NB-TT-113812 
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------

Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl  -MT -O3 -QxW -Qopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort  -MT -O3 -QxW -fpp -Qopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------

Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 4 processors, by danyang Thu Oct 31 16:08:41 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           6.401e-002      1.01274   6.350e-002
Objects:              5.600e+001      1.00000   5.600e+001
Flops:                2.031e+007      1.00102   2.030e+007  8.120e+007
Flops/sec:            3.213e+008      1.01377   3.197e+008  1.279e+009
MPI Messages:         2.100e+002      2.00000   1.575e+002  6.300e+002
MPI Message Lengths:  1.656e+005      2.00000   7.886e+002  4.968e+005
MPI Reductions:       2.270e+002      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 6.3490e-002 100.0%  8.1202e+007 100.0%  6.300e+002 100.0%  7.886e+002      100.0%  2.260e+002  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              103 1.0 1.3790e-002 1.9 2.31e+006 1.0 6.2e+002 8.0e+002 0.0e+000 17 11 98100  0  17 11 98100  0   666
MatSolve             103 1.0 1.1147e-002 1.4 2.27e+006 1.0 0.0e+000 0.0e+000 0.0e+000 16 11  0  0  0  16 11  0  0  0   813
MatLUFactorNum         1 1.0 3.9652e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0   269
MatILUFactorSym        1 1.0 2.7420e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 1.1662e-004 1.3 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
MatAssemblyEnd         1 1.0 1.2538e-003 1.0 0.00e+000 0.0 1.2e+001 2.0e+002 9.0e+000  2  0  2  0  4   2  0  2  0  4     0
MatGetRowIJ            1 1.0 7.3956e-006 2.6 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 7.1680e-005 1.2 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
VecMDot               99 1.0 1.5917e-002 2.0 7.20e+006 1.0 0.0e+000 0.0e+000 9.9e+001 17 35  0  0 44  17 35  0  0 44  1809
VecNorm              104 1.0 9.6899e-003 4.3 5.20e+005 1.0 0.0e+000 0.0e+000 1.0e+002  8  3  0  0 46   8  3  0  0 46   215
VecScale             103 1.0 4.1813e-004 1.6 2.58e+005 1.0 0.0e+000 0.0e+000 0.0e+000  1  1  0  0  0   1  1  0  0  0  2463
VecCopy                4 1.0 4.2667e-005 1.5 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecSet               110 1.0 4.7957e-004 1.5 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0     0
VecAXPY                8 1.0 9.3008e-003 1.5 4.00e+004 1.0 0.0e+000 0.0e+000 0.0e+000 11  0  0  0  0  11  0  0  0  0    17
VecMAXPY             103 1.0 1.0259e-002 1.6 7.70e+006 1.0 0.0e+000 0.0e+000 0.0e+000 14 38  0  0  0  14 38  0  0  0  3000
VecScatterBegin      103 1.0 6.1099e-004 1.7 0.00e+000 0.0 6.2e+002 8.0e+002 0.0e+000  1  0 98100  0   1  0 98100  0     0
VecScatterEnd        103 1.0 3.1807e-003 7.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  2  0  0  0  0   2  0  0  0  0     0
VecNormalize         103 1.0 9.9965e-003 3.6 7.73e+005 1.0 0.0e+000 0.0e+000 1.0e+002  9  4  0  0 45   9  4  0  0 46   309
KSPGMRESOrthog        99 1.0 2.1869e-002 1.2 1.44e+007 1.0 0.0e+000 0.0e+000 9.9e+001 30 71  0  0 44  30 71  0  0 44  2634
KSPSetUp               2 1.0 1.4677e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 5.7238e-002 1.0 2.03e+007 1.0 6.1e+002 8.0e+002 2.1e+002 90100 97 99 91  90100 97 99 92  1416
PCSetUp                2 1.0 9.2729e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 5.0e+000  1  0  0  0  2   1  0  0  0  2   115
PCSetUpOnBlocks        1 1.0 7.8507e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 3.0e+000  1  0  0  0  1   1  0  0  0  1   136
PCApply              103 1.0 1.3624e-002 1.4 2.27e+006 1.0 0.0e+000 0.0e+000 0.0e+000 19 11  0  0  0  19 11  0  0  0   665
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     4              4       577836     0
              Vector    41             41       803912     0
      Vector Scatter     1              1         1052     0
           Index Set     5              5        14192     0
       Krylov Solver     2              2        19504     0
      Preconditioner     2              2         1848     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
Average time for MPI_Barrier(): 2.38933e-006
Average time for zero size MPI_Send(): 2.13333e-006
#PETSc Option Table entries:
-log_summary log_100x100_mpi_p4.log
-m 100
-n 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct  2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct  2 16:35:54 2013 on NB-TT-113812 
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------

Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl  -MT -O3 -QxW -Qopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort  -MT -O3 -QxW -fpp -Qopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------

Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib 
-----------------------------------------

-------------- next part --------------
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 8 processors, by danyang Thu Oct 31 16:08:44 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.930e-002      1.03041   2.877e-002
Objects:              5.600e+001      1.00000   5.600e+001
Flops:                1.090e+007      1.00204   1.089e+007  8.715e+007
Flops/sec:            3.833e+008      1.03251   3.787e+008  3.030e+009
MPI Messages:         2.260e+002      2.00000   1.978e+002  1.582e+003
MPI Message Lengths:  1.784e+005      2.00000   7.894e+002  1.249e+006
MPI Reductions:       2.430e+002      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 2.8748e-002  99.9%  8.7151e+007 100.0%  1.582e+003 100.0%  7.894e+002      100.0%  2.420e+002  99.6% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult              111 1.0 4.7565e-003 1.1 1.24e+006 1.0 1.6e+003 8.0e+002 0.0e+000 16 11 98100  0  16 11 98100  0  2082
MatSolve             111 1.0 4.3031e-003 1.0 1.20e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11  0  0  0  15 11  0  0  0  2228
MatLUFactorNum         1 1.0 2.0708e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0   501
MatILUFactorSym        1 1.0 1.5815e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000  1  0  0  0  0   1  0  0  0  0     0
MatAssemblyBegin       1 1.0 1.4336e-004 1.3 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
MatAssemblyEnd         1 1.0 1.7192e-003 1.0 0.00e+000 0.0 2.8e+001 2.0e+002 9.0e+000  6  0  2  0  4   6  0  2  0  4     0
MatGetRowIJ            1 1.0 4.5511e-006 2.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 4.7787e-005 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000  0  0  0  0  1   0  0  0  0  1     0
VecMDot              107 1.0 4.4311e-003 1.2 3.87e+006 1.0 0.0e+000 0.0e+000 1.1e+002 14 36  0  0 44  14 36  0  0 44  6984
VecNorm              112 1.0 3.2262e-003 1.0 2.80e+005 1.0 0.0e+000 0.0e+000 1.1e+002 11  3  0  0 46  11  3  0  0 46   694
VecScale             111 1.0 2.1390e-004 1.2 1.39e+005 1.0 0.0e+000 0.0e+000 0.0e+000  1  1  0  0  0   1  1  0  0  0  5189
VecCopy                4 1.0 4.4942e-005 3.2 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
VecSet               118 1.0 2.0594e-004 1.2 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  1  0  0  0  0   1  0  0  0  0     0
VecAXPY                8 1.0 6.4853e-005 1.8 2.00e+004 1.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0  2467
VecMAXPY             111 1.0 3.4446e-003 1.0 4.14e+006 1.0 0.0e+000 0.0e+000 0.0e+000 12 38  0  0  0  12 38  0  0  0  9609
VecScatterBegin      111 1.0 5.5694e-004 1.7 0.00e+000 0.0 1.6e+003 8.0e+002 0.0e+000  2  0 98100  0   2  0 98100  0     0
VecScatterEnd        111 1.0 5.4556e-004 1.7 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  2  0  0  0  0   2  0  0  0  0     0
VecNormalize         111 1.0 3.5487e-003 1.0 4.16e+005 1.0 0.0e+000 0.0e+000 1.1e+002 12  4  0  0 46  12  4  0  0 46   938
KSPGMRESOrthog       107 1.0 7.7466e-003 1.1 7.74e+006 1.0 0.0e+000 0.0e+000 1.1e+002 26 71  0  0 44  26 71  0  0 44  7992
KSPSetUp               2 1.0 1.7180e-004 1.6 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 2.3104e-002 1.0 1.09e+007 1.0 1.5e+003 8.0e+002 2.2e+002 80100 97 99 92  80100 97 99 92  3766
PCSetUp                2 1.0 6.2976e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 5.0e+000  2  0  0  0  2   2  0  0  0  2   165
PCSetUpOnBlocks        1 1.0 4.5852e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 3.0e+000  2  0  0  0  1   2  0  0  0  1   226
PCApply              111 1.0 5.9147e-003 1.0 1.20e+006 1.0 0.0e+000 0.0e+000 0.0e+000 21 11  0  0  0  21 11  0  0  0  1621
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     4              4       293124     0
              Vector    41             41       433912     0
      Vector Scatter     1              1         1052     0
           Index Set     5              5         9192     0
       Krylov Solver     2              2        19504     0
      Preconditioner     2              2         1848     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
Average time for MPI_Barrier(): 5.00622e-006
Average time for zero size MPI_Send(): 2.27556e-006
#PETSc Option Table entries:
-log_summary log_100x100_mpi_p8.log
-m 100
-n 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct  2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct  2 16:35:54 2013 on NB-TT-113812 
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------

Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl  -MT -O3 -QxW -Qopenmp  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort  -MT -O3 -QxW -fpp -Qopenmp  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------

Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib 
-----------------------------------------


