[petsc-users] Speedup problem when using OpenMP?
Danyang Su
danyang.su at gmail.com
Thu Oct 31 18:54:21 CDT 2013
Hi All,
I have a question about the speedup of PETSc when using OpenMP. I get
good speedup when using MPI, but essentially no speedup when using OpenMP:
in the attached logs, KSPSolve takes about 0.073 s with 4 OpenMP threads
and 0.072 s with 8 threads, compared with 0.057 s on 4 MPI processes and
0.023 s on 8 MPI processes.
The example is ex2f with m=100 and n=100. The machine has 16 cores
(32 hardware threads) and runs Windows Server 2012. The log files for the
4- and 8-way runs are attached.
The commands I used for the 4-way runs are as follows:
Run using MPI:
mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p4.log
Run using OpenMP:
Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
The PETSc build used for this test is PETSc for Windows
(http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I don't think the
build itself is the problem, because the same behaviour occurs when I use
petsc-dev under Cygwin. I don't know whether the problem also exists on
Linux; would anybody be willing to test? Equivalent Linux commands are
sketched below.
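For reference, the comparison could presumably be reproduced on Linux with
commands along these lines (a sketch only, assuming ex2f has been built in
src/ksp/ksp/examples/tutorials against a PETSc configured with --with-openmp;
the executable name and paths are my assumptions):
mpiexec -n 4 ./ex2f -m 100 -n 100 -log_summary log_100x100_mpi_p4.log
./ex2f -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log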
Thanks and regards,
Danyang
-------------- next part --------------
!
! Description: Solves a linear system in parallel with KSP (Fortran code).
! Also shows how to set a user-defined monitoring routine.
!
!
!/*T
! Concepts: KSP^basic parallel example
! Concepts: KSP^setting a user-defined monitoring routine
! Processors: n
!T*/
!
! -----------------------------------------------------------------------
program main
implicit none
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Include files
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! This program uses CPP for preprocessing, as indicated by the use of
! PETSc include files in the directory petsc/include/finclude. This
! convention enables use of the CPP preprocessor, which allows the use
! of the #include statements that define PETSc objects and variables.
!
! Use of the conventional Fortran include statements is also supported.
! In this case, the PETSc include files are located in the directory
! petsc/include/foldinclude.
!
! Since one must be very careful to include each file no more than once
! in a Fortran routine, application programmers must explicitly list
! each file needed for the various PETSc components within their
! program (unlike the C/C++ interface).
!
! See the Fortran section of the PETSc users manual for details.
!
! The following include statements are required for KSP Fortran programs:
! petscsys.h - base PETSc routines
! petscvec.h - vectors
! petscmat.h - matrices
! petscpc.h - preconditioners
! petscksp.h - Krylov subspace methods
! Additional include statements may be needed if using additional
! PETSc routines in a Fortran program, e.g.,
! petscviewer.h - viewers
! petscis.h - index sets
!
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscmat.h>
#include <finclude/petscpc.h>
#include <finclude/petscksp.h>
!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Variable declarations
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
! Variables:
! ksp - Krylov subspace method (linear solver) context
! pc - preconditioner context
! x, b, u - approx solution, right-hand-side, exact solution vectors
! A - matrix that defines linear system
! its - iterations for convergence
! norm - norm of error in solution
! rctx - random number generator context
!
! Note that vectors are declared as PETSc "Vec" objects. These vectors
! are mathematical objects that contain more than just an array of
! double precision numbers. I.e., vectors in PETSc are not just
! double precision x(*).
! However, local vector data can be easily accessed via VecGetArray(),
! as sketched below.
! See the Fortran section of the PETSc users manual for details.
!
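! As an illustration only (not used in this example; a minimal sketch based
! on the traditional PETSc Fortran interface), VecGetArray() is normally
! paired with a PetscOffset index:
!
!      PetscScalar x_array(1)
!      PetscOffset i_x
!      call VecGetArray(x,x_array,i_x,ierr)
!      ! ... access local element i as x_array(i_x+i) ...
!      call VecRestoreArray(x,x_array,i_x,ierr)
!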
double precision norm
PetscInt i,j,II,JJ,m,n,its
PetscInt Istart,Iend,ione
PetscErrorCode ierr
PetscMPIInt rank,size
PetscBool flg
PetscScalar v,one,neg_one
Vec x,b,u
Mat A
KSP ksp
PetscRandom rctx
! These variables are not currently used.
! PC pc
! PCType ptype
! double precision tol
! Note: Any user-defined Fortran routines (such as MyKSPMonitor)
! MUST be declared as external.
external MyKSPMonitor,MyKSPConverged
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Beginning of program
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
call PetscInitialize(Petsc_Null_Character,ierr)
m = 3
n = 3
one = 1.0
neg_one = -1.0
ione = 1
call PetscOptionsGetInt(Petsc_Null_Character,'-m',m,flg,ierr)
call PetscOptionsGetInt(Petsc_Null_Character,'-n',n,flg,ierr)
call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
call MPI_Comm_size(Petsc_Comm_World,size,ierr)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Compute the matrix and right-hand-side vector that define
! the linear system, Ax = b.
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Create parallel matrix, specifying only its global dimensions.
! When using MatCreate(), the matrix format can be specified at
! runtime. Also, the parallel partitioning of the matrix is
! determined by PETSc at runtime.
call MatCreate(Petsc_Comm_World,A,ierr)
call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,m*n,m*n,ierr)
call MatSetFromOptions(A,ierr)
call MatSetUp(A,ierr)
! Currently, all PETSc parallel matrix formats are partitioned by
! contiguous chunks of rows across the processors. Determine which
! rows of the matrix are locally owned.
call MatGetOwnershipRange(A,Istart,Iend,ierr)
! Set matrix elements for the 2-D, five-point stencil in parallel.
! - Each processor needs to insert only elements that it owns
! locally (but any non-local elements will be sent to the
! appropriate processor during matrix assembly).
! - Always specify global rows and columns of matrix entries.
! - Note that MatSetValues() uses 0-based row and column numbers
! in Fortran as well as in C.
! Note: this uses the less common natural ordering that orders first
! all the unknowns for x = h, then for x = 2h, etc.; hence you see JJ = II +- n
! instead of JJ = II +- m as you might expect. The more standard ordering
! would first do all variables for y = h, then y = 2h, etc.
do 10, II=Istart,Iend-1
v = -1.0
i = II/n
j = II - i*n
if (i.gt.0) then
JJ = II - n
call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
endif
if (i.lt.m-1) then
JJ = II + n
call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
endif
if (j.gt.0) then
JJ = II - 1
call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
endif
if (j.lt.n-1) then
JJ = II + 1
call MatSetValues(A,ione,II,ione,JJ,v,INSERT_VALUES,ierr)
endif
v = 4.0
call MatSetValues(A,ione,II,ione,II,v,INSERT_VALUES,ierr)
10 continue
! Assemble matrix, using the 2-step process:
! MatAssemblyBegin(), MatAssemblyEnd()
! Computations can be done while messages are in transition,
! by placing code between these two statements.
call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)
! Create parallel vectors.
! - Here, the parallel partitioning of the vector is determined by
! PETSc at runtime. We could also specify the local dimensions
! if desired -- or use the more general routine VecCreate().
! - When solving a linear system, the vectors and matrices MUST
! be partitioned accordingly. PETSc automatically generates
! appropriately partitioned matrices and vectors when MatCreate()
! and VecCreate() are used with the same communicator.
! - Note: We form 1 vector from scratch and then duplicate as needed.
call VecCreateMPI(Petsc_Comm_World,PETSC_DECIDE,m*n,u,ierr)
call VecSetFromOptions(u,ierr)
call VecDuplicate(u,b,ierr)
call VecDuplicate(b,x,ierr)
! Set exact solution; then compute right-hand-side vector.
! By default we use an exact solution vector with all elements equal
! to 1.0; alternatively, the runtime option -random_exact_sol forms a
! solution vector with random components.
call PetscOptionsHasName(Petsc_Null_Character, &
& "-random_exact_sol",flg,ierr)
if (flg) then
call PetscRandomCreate(Petsc_Comm_World,rctx,ierr)
call PetscRandomSetFromOptions(rctx,ierr)
call VecSetRandom(u,rctx,ierr)
call PetscRandomDestroy(rctx,ierr)
else
call VecSet(u,one,ierr)
endif
call MatMult(A,u,b,ierr)
! View the exact solution vector if desired
call PetscOptionsHasName(Petsc_Null_Character, &
& "-view_exact_sol",flg,ierr)
if (flg) then
call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
endif
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Create the linear solver and set various options
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Create linear solver context
call KSPCreate(Petsc_Comm_World,ksp,ierr)
! Set operators. Here the matrix that defines the linear system
! also serves as the preconditioning matrix.
call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr)
! Set linear solver defaults for this problem (optional).
! - By extracting the KSP and PC contexts from the KSP context,
! we can then directly call any KSP and PC routines
! to set various options.
! - The following four statements are optional; all of these
! parameters could alternatively be specified at runtime via
! KSPSetFromOptions(). All of these defaults can be
! overridden at runtime, as indicated below.
! We comment out this section of code since the Jacobi
! preconditioner is not a good general default.
! call KSPGetPC(ksp,pc,ierr)
! ptype = PCJACOBI
! call PCSetType(pc,ptype,ierr)
! tol = 1.e-7
! call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION,
! & PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr)
! Set user-defined monitoring routine if desired
call PetscOptionsHasName(Petsc_Null_Character,'-my_ksp_monitor', &
& flg,ierr)
if (flg) then
call KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL_OBJECT, &
& PETSC_NULL_FUNCTION,ierr)
endif
! Set runtime options, e.g.,
! -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
! These options will override those specified above as long as
! KSPSetFromOptions() is called _after_ any other customization
! routines.
call KSPSetFromOptions(ksp,ierr)
! Set convergence test routine if desired
call PetscOptionsHasName(Petsc_Null_Character, &
& '-my_ksp_convergence',flg,ierr)
if (flg) then
call KSPSetConvergenceTest(ksp,MyKSPConverged, &
& PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
endif
!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Solve the linear system
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
call KSPSolve(ksp,b,x,ierr)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Check solution and clean up
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
! Check the error
call VecAXPY(x,neg_one,u,ierr)
call VecNorm(x,NORM_2,norm,ierr)
call KSPGetIterationNumber(ksp,its,ierr)
if (rank .eq. 0) then
if (norm .gt. 1.e-12) then
write(6,100) norm,its
else
write(6,110) its
endif
endif
100 format('Norm of error ',e11.4,' iterations ',i5)
110 format('Norm of error < 1.e-12,iterations ',i5)
! Free work space. All PETSc objects should be destroyed when they
! are no longer needed.
call KSPDestroy(ksp,ierr)
call VecDestroy(u,ierr)
call VecDestroy(x,ierr)
call VecDestroy(b,ierr)
call MatDestroy(A,ierr)
! Always call PetscFinalize() before exiting a program. This routine
! - finalizes the PETSc libraries as well as MPI
! - provides summary and diagnostic information if certain runtime
! options are chosen (e.g., -log_summary). See PetscFinalize()
! manpage for more information.
call PetscFinalize(ierr)
end
! --------------------------------------------------------------
!
! MyKSPMonitor - This is a user-defined routine for monitoring
! the KSP iterative solvers.
!
! Input Parameters:
! ksp - iterative context
! n - iteration number
! rnorm - 2-norm (preconditioned) residual value (may be estimated)
! dummy - optional user-defined monitor context (unused here)
!
subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscksp.h>
KSP ksp
Vec x
PetscErrorCode ierr
PetscInt n,dummy
PetscMPIInt rank
double precision rnorm
! Build the solution vector
call KSPBuildSolution(ksp,PETSC_NULL_OBJECT,x,ierr)
! Write the solution vector and residual norm to stdout
! - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
! handles data from multiple processors so that the
! output is not jumbled.
call MPI_Comm_rank(Petsc_Comm_World,rank,ierr)
if (rank .eq. 0) write(6,100) n
call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
if (rank .eq. 0) write(6,200) n,rnorm
100 format('iteration ',i5,' solution vector:')
200 format('iteration ',i5,' residual norm ',e11.4)
ierr = 0
end
! --------------------------------------------------------------
!
! MyKSPConverged - This is a user-defined routine for testing
! convergence of the KSP iterative solvers.
!
! Input Parameters:
! ksp - iterative context
! n - iteration number
! rnorm - 2-norm (preconditioned) residual value (may be estimated)
! dummy - optional user-defined convergence context (unused here)
!
subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
implicit none
#include <finclude/petscsys.h>
#include <finclude/petscvec.h>
#include <finclude/petscksp.h>
KSP ksp
PetscErrorCode ierr
PetscInt n,dummy
KSPConvergedReason flag
double precision rnorm
if (rnorm .le. .05) then
flag = 1
else
flag = 0
endif
ierr = 0
end
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 1 processor, by danyang Thu Oct 31 16:12:46 2013
With 4 threads per MPI_Comm
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 8.896e-002 1.00000 8.896e-002
Objects: 4.500e+001 1.00000 4.500e+001
Flops: 5.352e+007 1.00000 5.352e+007 5.352e+007
Flops/sec: 6.016e+008 1.00000 6.016e+008 6.016e+008
MPI Messages: 0.000e+000 0.00000 0.000e+000 0.000e+000
MPI Message Lengths: 0.000e+000 0.00000 0.000e+000 0.000e+000
MPI Reductions: 1.410e+002 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 8.8951e-002 100.0% 5.3519e+007 100.0% 0.000e+000 0.0% 0.000e+000 0.0% 1.400e+002 99.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 68 1.0 1.3164e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11 0 0 0 15 11 0 0 0 461
MatSolve 68 1.0 2.1107e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11 0 0 0 24 11 0 0 0 287
MatLUFactorNum 1 1.0 1.5468e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 0.0e+000 2 0 0 0 0 2 0 0 0 0 70
MatILUFactorSym 1 1.0 8.8292e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000 1 0 0 0 1 1 0 0 0 1 0
MatAssemblyBegin 1 1.0 1.1378e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 7.5264e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 1 1.0 2.8444e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.9911e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
VecMDot 65 1.0 1.1031e-002 1.0 1.89e+007 1.0 0.0e+000 0.0e+000 6.5e+001 12 35 0 0 46 12 35 0 0 46 1713
VecNorm 69 1.0 4.4828e-003 1.0 1.38e+006 1.0 0.0e+000 0.0e+000 6.9e+001 5 3 0 0 49 5 3 0 0 49 308
VecScale 68 1.0 1.3096e-003 1.0 6.80e+005 1.0 0.0e+000 0.0e+000 0.0e+000 1 1 0 0 0 1 1 0 0 0 519
VecCopy 3 1.0 5.1769e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecSet 5 1.0 6.5991e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6 1.0 2.9753e-004 1.0 1.20e+005 1.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 403
VecMAXPY 68 1.0 1.6623e-002 1.0 2.02e+007 1.0 0.0e+000 0.0e+000 0.0e+000 19 38 0 0 0 19 38 0 0 0 1215
VecNormalize 68 1.0 5.8857e-003 1.0 2.04e+006 1.0 0.0e+000 0.0e+000 6.8e+001 7 4 0 0 48 7 4 0 0 49 347
KSPGMRESOrthog 65 1.0 2.6701e-002 1.0 3.78e+007 1.0 0.0e+000 0.0e+000 6.5e+001 30 71 0 0 46 30 71 0 0 46 1416
KSPSetUp 1 1.0 3.7205e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 7.3189e-002 1.0 5.34e+007 1.0 0.0e+000 0.0e+000 1.4e+002 82100 0 0 96 82100 0 0 97 729
PCSetUp 1 1.0 2.6908e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 3.0e+000 3 0 0 0 2 3 0 0 0 2 40
PCApply 68 1.0 2.1146e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11 0 0 0 24 11 0 0 0 287
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 1520524 0
Vector 37 37 3017128 0
Krylov Solver 1 1 18360 0
Preconditioner 1 1 976 0
Index Set 3 3 42280 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
#PETSc Option Table entries:
-log_summary log_100x100_openmp_p4.log
-m 100
-n 100
-threadcomm_nthreads 4
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct 2 16:35:54 2013 on NB-TT-113812
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------
Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl -MT -O3 -QxW -Qopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort -MT -O3 -QxW -fpp -Qopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------
Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 1 processor, by danyang Thu Oct 31 16:12:57 2013
With 8 threads per MPI_Comm
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 8.634e-002 1.00000 8.634e-002
Objects: 4.500e+001 1.00000 4.500e+001
Flops: 5.352e+007 1.00000 5.352e+007 5.352e+007
Flops/sec: 6.198e+008 1.00000 6.198e+008 6.198e+008
MPI Messages: 0.000e+000 0.00000 0.000e+000 0.000e+000
MPI Message Lengths: 0.000e+000 0.00000 0.000e+000 0.000e+000
MPI Reductions: 1.410e+002 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 8.6338e-002 100.0% 5.3519e+007 100.0% 0.000e+000 0.0% 0.000e+000 0.0% 1.400e+002 99.3%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 68 1.0 1.3230e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11 0 0 0 15 11 0 0 0 458
MatSolve 68 1.0 2.0949e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11 0 0 0 24 11 0 0 0 290
MatLUFactorNum 1 1.0 1.5417e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 0.0e+000 2 0 0 0 0 2 0 0 0 0 70
MatILUFactorSym 1 1.0 9.4436e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000 1 0 0 0 1 1 0 0 0 1 0
MatAssemblyBegin 1 1.0 5.6889e-007 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 7.5776e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 0
MatGetRowIJ 1 1.0 2.8444e-006 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 1.7579e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
VecMDot 65 1.0 1.0993e-002 1.0 1.89e+007 1.0 0.0e+000 0.0e+000 6.5e+001 13 35 0 0 46 13 35 0 0 46 1719
VecNorm 69 1.0 3.6978e-003 1.0 1.38e+006 1.0 0.0e+000 0.0e+000 6.9e+001 4 3 0 0 49 4 3 0 0 49 373
VecScale 68 1.0 1.0667e-003 1.0 6.80e+005 1.0 0.0e+000 0.0e+000 0.0e+000 1 1 0 0 0 1 1 0 0 0 637
VecCopy 3 1.0 5.0631e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecSet 5 1.0 6.2009e-005 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 6 1.0 1.4108e-004 1.0 1.20e+005 1.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 851
VecMAXPY 68 1.0 1.6730e-002 1.0 2.02e+007 1.0 0.0e+000 0.0e+000 0.0e+000 19 38 0 0 0 19 38 0 0 0 1207
VecNormalize 68 1.0 4.8583e-003 1.0 2.04e+006 1.0 0.0e+000 0.0e+000 6.8e+001 6 4 0 0 48 6 4 0 0 49 420
KSPGMRESOrthog 65 1.0 2.6769e-002 1.0 3.78e+007 1.0 0.0e+000 0.0e+000 6.5e+001 31 71 0 0 46 31 71 0 0 46 1412
KSPSetUp 1 1.0 3.2484e-004 1.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 7.1967e-002 1.0 5.34e+007 1.0 0.0e+000 0.0e+000 1.4e+002 83100 0 0 96 83100 0 0 97 742
PCSetUp 1 1.0 2.7182e-003 1.0 1.09e+005 1.0 0.0e+000 0.0e+000 3.0e+000 3 0 0 0 2 3 0 0 0 2 40
PCApply 68 1.0 2.0985e-002 1.0 6.07e+006 1.0 0.0e+000 0.0e+000 0.0e+000 24 11 0 0 0 24 11 0 0 0 289
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 2 2 1520524 0
Vector 37 37 3017128 0
Krylov Solver 1 1 18360 0
Preconditioner 1 1 976 0
Index Set 3 3 42280 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.13778e-007
#PETSc Option Table entries:
-log_summary log_100x100_openmp_p8.log
-m 100
-n 100
-threadcomm_nthreads 8
-threadcomm_type openmp
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct 2 16:35:54 2013 on NB-TT-113812
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------
Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl -MT -O3 -QxW -Qopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort -MT -O3 -QxW -fpp -Qopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------
Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 4 processors, by danyang Thu Oct 31 16:08:41 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 6.401e-002 1.01274 6.350e-002
Objects: 5.600e+001 1.00000 5.600e+001
Flops: 2.031e+007 1.00102 2.030e+007 8.120e+007
Flops/sec: 3.213e+008 1.01377 3.197e+008 1.279e+009
MPI Messages: 2.100e+002 2.00000 1.575e+002 6.300e+002
MPI Message Lengths: 1.656e+005 2.00000 7.886e+002 4.968e+005
MPI Reductions: 2.270e+002 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 6.3490e-002 100.0% 8.1202e+007 100.0% 6.300e+002 100.0% 7.886e+002 100.0% 2.260e+002 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 103 1.0 1.3790e-002 1.9 2.31e+006 1.0 6.2e+002 8.0e+002 0.0e+000 17 11 98100 0 17 11 98100 0 666
MatSolve 103 1.0 1.1147e-002 1.4 2.27e+006 1.0 0.0e+000 0.0e+000 0.0e+000 16 11 0 0 0 16 11 0 0 0 813
MatLUFactorNum 1 1.0 3.9652e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 269
MatILUFactorSym 1 1.0 2.7420e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 1 1.0 1.1662e-004 1.3 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
MatAssemblyEnd 1 1.0 1.2538e-003 1.0 0.00e+000 0.0 1.2e+001 2.0e+002 9.0e+000 2 0 2 0 4 2 0 2 0 4 0
MatGetRowIJ 1 1.0 7.3956e-006 2.6 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 7.1680e-005 1.2 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
VecMDot 99 1.0 1.5917e-002 2.0 7.20e+006 1.0 0.0e+000 0.0e+000 9.9e+001 17 35 0 0 44 17 35 0 0 44 1809
VecNorm 104 1.0 9.6899e-003 4.3 5.20e+005 1.0 0.0e+000 0.0e+000 1.0e+002 8 3 0 0 46 8 3 0 0 46 215
VecScale 103 1.0 4.1813e-004 1.6 2.58e+005 1.0 0.0e+000 0.0e+000 0.0e+000 1 1 0 0 0 1 1 0 0 0 2463
VecCopy 4 1.0 4.2667e-005 1.5 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecSet 110 1.0 4.7957e-004 1.5 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 8 1.0 9.3008e-003 1.5 4.00e+004 1.0 0.0e+000 0.0e+000 0.0e+000 11 0 0 0 0 11 0 0 0 0 17
VecMAXPY 103 1.0 1.0259e-002 1.6 7.70e+006 1.0 0.0e+000 0.0e+000 0.0e+000 14 38 0 0 0 14 38 0 0 0 3000
VecScatterBegin 103 1.0 6.1099e-004 1.7 0.00e+000 0.0 6.2e+002 8.0e+002 0.0e+000 1 0 98100 0 1 0 98100 0 0
VecScatterEnd 103 1.0 3.1807e-003 7.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 2 0 0 0 0 2 0 0 0 0 0
VecNormalize 103 1.0 9.9965e-003 3.6 7.73e+005 1.0 0.0e+000 0.0e+000 1.0e+002 9 4 0 0 45 9 4 0 0 46 309
KSPGMRESOrthog 99 1.0 2.1869e-002 1.2 1.44e+007 1.0 0.0e+000 0.0e+000 9.9e+001 30 71 0 0 44 30 71 0 0 44 2634
KSPSetUp 2 1.0 1.4677e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 5.7238e-002 1.0 2.03e+007 1.0 6.1e+002 8.0e+002 2.1e+002 90100 97 99 91 90100 97 99 92 1416
PCSetUp 2 1.0 9.2729e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 5.0e+000 1 0 0 0 2 1 0 0 0 2 115
PCSetUpOnBlocks 1 1.0 7.8507e-004 1.0 2.66e+004 1.0 0.0e+000 0.0e+000 3.0e+000 1 0 0 0 1 1 0 0 0 1 136
PCApply 103 1.0 1.3624e-002 1.4 2.27e+006 1.0 0.0e+000 0.0e+000 0.0e+000 19 11 0 0 0 19 11 0 0 0 665
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 577836 0
Vector 41 41 803912 0
Vector Scatter 1 1 1052 0
Index Set 5 5 14192 0
Krylov Solver 2 2 19504 0
Preconditioner 2 2 1848 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
Average time for MPI_Barrier(): 2.38933e-006
Average time for zero size MPI_Send(): 2.13333e-006
#PETSc Option Table entries:
-log_summary log_100x100_mpi_p4.log
-m 100
-n 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct 2 16:35:54 2013 on NB-TT-113812
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------
Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl -MT -O3 -QxW -Qopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort -MT -O3 -QxW -fpp -Qopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------
Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------
-------------- next part --------------
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
Petsc-windows-ex2f.exe on a arch-mswin-c-opt named STARGAZER2012 with 8 processors, by danyang Thu Oct 31 16:08:44 2013
Using Petsc Release Version 3.4.2, Jul, 02, 2013
Max Max/Min Avg Total
Time (sec): 2.930e-002 1.03041 2.877e-002
Objects: 5.600e+001 1.00000 5.600e+001
Flops: 1.090e+007 1.00204 1.089e+007 8.715e+007
Flops/sec: 3.833e+008 1.03251 3.787e+008 3.030e+009
MPI Messages: 2.260e+002 2.00000 1.978e+002 1.582e+003
MPI Message Lengths: 1.784e+005 2.00000 7.894e+002 1.249e+006
MPI Reductions: 2.430e+002 1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 2.8748e-002 99.9% 8.7151e+007 100.0% 1.582e+003 100.0% 7.894e+002 100.0% 2.420e+002 99.6%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 111 1.0 4.7565e-003 1.1 1.24e+006 1.0 1.6e+003 8.0e+002 0.0e+000 16 11 98100 0 16 11 98100 0 2082
MatSolve 111 1.0 4.3031e-003 1.0 1.20e+006 1.0 0.0e+000 0.0e+000 0.0e+000 15 11 0 0 0 15 11 0 0 0 2228
MatLUFactorNum 1 1.0 2.0708e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 501
MatILUFactorSym 1 1.0 1.5815e-004 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 1.0e+000 1 0 0 0 0 1 0 0 0 0 0
MatAssemblyBegin 1 1.0 1.4336e-004 1.3 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
MatAssemblyEnd 1 1.0 1.7192e-003 1.0 0.00e+000 0.0 2.8e+001 2.0e+002 9.0e+000 6 0 2 0 4 6 0 2 0 4 0
MatGetRowIJ 1 1.0 4.5511e-006 2.0 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 4.7787e-005 1.1 0.00e+000 0.0 0.0e+000 0.0e+000 2.0e+000 0 0 0 0 1 0 0 0 0 1 0
VecMDot 107 1.0 4.4311e-003 1.2 3.87e+006 1.0 0.0e+000 0.0e+000 1.1e+002 14 36 0 0 44 14 36 0 0 44 6984
VecNorm 112 1.0 3.2262e-003 1.0 2.80e+005 1.0 0.0e+000 0.0e+000 1.1e+002 11 3 0 0 46 11 3 0 0 46 694
VecScale 111 1.0 2.1390e-004 1.2 1.39e+005 1.0 0.0e+000 0.0e+000 0.0e+000 1 1 0 0 0 1 1 0 0 0 5189
VecCopy 4 1.0 4.4942e-005 3.2 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
VecSet 118 1.0 2.0594e-004 1.2 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 8 1.0 6.4853e-005 1.8 2.00e+004 1.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 2467
VecMAXPY 111 1.0 3.4446e-003 1.0 4.14e+006 1.0 0.0e+000 0.0e+000 0.0e+000 12 38 0 0 0 12 38 0 0 0 9609
VecScatterBegin 111 1.0 5.5694e-004 1.7 0.00e+000 0.0 1.6e+003 8.0e+002 0.0e+000 2 0 98100 0 2 0 98100 0 0
VecScatterEnd 111 1.0 5.4556e-004 1.7 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 2 0 0 0 0 2 0 0 0 0 0
VecNormalize 111 1.0 3.5487e-003 1.0 4.16e+005 1.0 0.0e+000 0.0e+000 1.1e+002 12 4 0 0 46 12 4 0 0 46 938
KSPGMRESOrthog 107 1.0 7.7466e-003 1.1 7.74e+006 1.0 0.0e+000 0.0e+000 1.1e+002 26 71 0 0 44 26 71 0 0 44 7992
KSPSetUp 2 1.0 1.7180e-004 1.6 0.00e+000 0.0 0.0e+000 0.0e+000 0.0e+000 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 1 1.0 2.3104e-002 1.0 1.09e+007 1.0 1.5e+003 8.0e+002 2.2e+002 80100 97 99 92 80100 97 99 92 3766
PCSetUp 2 1.0 6.2976e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 5.0e+000 2 0 0 0 2 2 0 0 0 2 165
PCSetUpOnBlocks 1 1.0 4.5852e-004 1.1 1.30e+004 1.0 0.0e+000 0.0e+000 3.0e+000 2 0 0 0 1 2 0 0 0 1 226
PCApply 111 1.0 5.9147e-003 1.0 1.20e+006 1.0 0.0e+000 0.0e+000 0.0e+000 21 11 0 0 0 21 11 0 0 0 1621
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 4 4 293124 0
Vector 41 41 433912 0
Vector Scatter 1 1 1052 0
Index Set 5 5 9192 0
Krylov Solver 2 2 19504 0
Preconditioner 2 2 1848 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 5.68889e-008
Average time for MPI_Barrier(): 5.00622e-006
Average time for zero size MPI_Send(): 2.27556e-006
#PETSc Option Table entries:
-log_summary log_100x100_mpi_p8.log
-m 100
-n 100
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 4 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Wed Oct 2 16:35:54 2013
Configure options: --with-cc="win32fe icl" --with-cxx="win32fe icl" --with-fc="win32fe ifort" --with-blas-lapack-dir=/cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64 --with-mpi-include=/cygdrive/c/MSMPI/Inc -with-mpi-lib="[/cygdrive/C/MSMPI/Lib/amd64/msmpi.lib,/cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib]" --with-openmp --with-shared-libraries --with-debugging=no --useThreads=0
-----------------------------------------
Libraries compiled on Wed Oct 2 16:35:54 2013 on NB-TT-113812
Machine characteristics: CYGWIN_NT-6.1-WOW64-1.7.25-0.270-5-3-i686-32bit
Using PETSc directory: /cygdrive/d/WorkDir/petsc-3.4.2
Using PETSc arch: arch-mswin-c-opt
-----------------------------------------
Using C compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl -MT -O3 -QxW -Qopenmp ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort -MT -O3 -QxW -fpp -Qopenmp ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/include -I/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/include -I/cygdrive/c/MSMPI/Inc
-----------------------------------------
Using C linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe icl
Using Fortran linker: /cygdrive/d/WorkDir/petsc-3.4.2/bin/win32fe/win32fe ifort
Using libraries: -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -L/cygdrive/d/WorkDir/petsc-3.4.2/arch-mswin-c-opt/lib -lpetsc /cygdrive/d/HardLinks/PETSc/Intel2013/mkl/lib/intel64/mkl_intel_lp64_dll.lib mkl_intel_thread_dll.lib mkl_core_dll.lib libiomp5md.lib /cygdrive/C/MSMPI/Lib/amd64/msmpi.lib /cygdrive/C/MSMPI/Lib/amd64/msmpifec.lib Gdi32.lib User32.lib Advapi32.lib Kernel32.lib Ws2_32.lib
-----------------------------------------