<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi,<br>
<br>
I've finally managed to test ex2f on the servers. This time the job was
submitted so that I'm assigned
1*atlas3-c01,1*atlas3-c02,1*atlas3-c03,1*atlas3-c04 instead of
2*atlas3-c01,2*atlas3-c02 for a 4-processor run. It seems to be better,
although I was told that 2*atlas3-c02 doesn't actually mean using 2
cores on the same processor. The results are below; I think they are
better now, although the timings still vary from run to run. By the
way, I'm not able to try the latest MPICH2 because I don't have
administrator rights; I was told that some special configuration is required.<br>
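<br>
For reference, the kind of scheduler request I mean is sketched below; span[ptile=1]
is the standard LSF way of asking for one slot per node, but the queue name is only a
placeholder and the launcher line is whatever the site actually provides.<br>
<br>
# hypothetical LSF submission: 4 slots spread one per node ("myqueue" is a placeholder)<br>
bsub -q myqueue -n 4 -R "span[ptile=1]" mpirun -np 4 ./a.out -log_summary<br>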
<br>
Also, should there be any difference in speed between building with
MPIUNI and ifort versus MPI and mpif90? I tried it with ex2f (below)
and there is only a small difference. If there were a large difference
(MPI being slower), would that mean something is wrong in the code?<br>
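<br>
For reference, the comparison I have in mind is simply building ex2f against each
PETSC_ARCH and looking at the KSPSolve line in the -log_summary output, roughly as
below. This is only a sketch: it assumes the standard PETSc example makefile, and
atlas3 / atlas3-mp are the MPIUNI and MPI arch names that appear in the logs.<br>
<br>
# hypothetical comparison of the two builds via the KSPSolve timing<br>
make PETSC_ARCH=atlas3 ex2f<br>
./ex2f -log_summary | grep KSPSolve<br>
make PETSC_ARCH=atlas3-mp ex2f<br>
mpirun -np 1 ./ex2f -log_summary | grep KSPSolve<br>
<br>
In the two single-processor logs below, KSPSolve comes out at about 231 s for the
MPIUNI build and 199 s for the MPI build.<br>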
<br>
Thank you very much.<br>
<br>
<b>for 1 processor (compiled with MPIUNI and ifort)</b><br>
<br>
Norm of error 0.6935E+01 iterations 1818<br>
************************************************************************************************************************<br>
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***<br>
************************************************************************************************************************<br>
<br>
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------<br>
<br>
./a.out on a atlas3 named atlas2-c11 with 1 processor, by g0306332 Sat
Apr 19 10:28:13 2008<br>
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST
2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b<br>
<br>
Max Max/Min Avg Total<br>
Time (sec): 2.317e+02 1.00000 2.317e+02<br>
Objects: 4.400e+01 1.00000 4.400e+01<br>
Flops: 9.958e+10 1.00000 9.958e+10 9.958e+10<br>
Flops/sec: 4.298e+08 1.00000 4.298e+08 4.298e+08<br>
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00<br>
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00<br>
MPI Reductions: 3.701e+03 1.00000<br>
<br>
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)<br>
e.g., VecAXPY() for real vectors of length
N --> 2N flops<br>
and VecAXPY() for complex vectors of length
N --> 8N flops<br>
<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total<br>
0: Main Stage: 2.3171e+02 100.0% 9.9582e+10 100.0% 0.000e+00
0.0% 0.000e+00 0.0% 3.701e+03 100.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
See the 'Profiling' chapter of the users' manual for details on
interpreting output.<br>
Phase summary info: <br>
Count: number of times phase was executed<br>
Time and Flops/sec: Max - maximum over all processors<br>
Ratio - ratio of maximum to minimum over all
processors<br>
Mess: number of messages sent<br>
Avg. len: average message length<br>
Reduct: number of global reductions<br>
Global: entire computation<br>
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().<br>
%T - percent time in this phase %F - percent flops in
this phase<br>
%M - percent messages in this phase %L - percent message
lengths in this phase<br>
%R - percent reductions in this phase<br>
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
<br>
##########################################################<br>
# #<br>
# WARNING!!! #<br>
# #<br>
# This code was run without the PreLoadBegin() #<br>
# macros. To get timing results we always recommend #<br>
# preloading. otherwise timing numbers may be #<br>
# meaningless. #<br>
##########################################################<br>
<br>
<br>
<br>
<br>
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total<br>
Max Ratio Max Ratio Max Ratio Mess Avg
len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
MatMult 1879 1.0 2.8137e+01 1.0 3.84e+08 1.0 0.0e+00
0.0e+00 0.0e+00 12 11 0 0 0 12 11 0 0 0 384<br>
MatSolve 1879 1.0 5.4371e+01 1.0 1.99e+08 1.0 0.0e+00
0.0e+00 0.0e+00 23 11 0 0 0 23 11 0 0 0 199<br>
MatLUFactorNum 1 1.0 9.2121e-02 1.0 6.24e+07 1.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 62<br>
MatILUFactorSym 1 1.0 7.3340e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyBegin 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyEnd 1 1.0 5.5443e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetRowIJ 1 1.0 2.8610e-06 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetOrdering 1 1.0 1.6465e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecMDot 1818 1.0 6.0178e+01 1.0 5.97e+08 1.0 0.0e+00
0.0e+00 1.8e+03 26 36 0 0 49 26 36 0 0 49 597<br>
VecNorm 1880 1.0 4.1541e+00 1.0 5.79e+08 1.0 0.0e+00
0.0e+00 1.9e+03 2 2 0 0 51 2 2 0 0 51 579<br>
VecScale 1879 1.0 4.8439e+00 1.0 2.48e+08 1.0 0.0e+00
0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 248<br>
VecCopy 61 1.0 1.7232e-01 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecSet 63 1.0 8.0270e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecAXPY 122 1.0 5.6893e-01 1.0 2.74e+08 1.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274<br>
VecMAXPY 1879 1.0 7.8124e+01 1.0 4.90e+08 1.0 0.0e+00
0.0e+00 0.0e+00 34 38 0 0 0 34 38 0 0 0 490<br>
VecNormalize 1879 1.0 9.0043e+00 1.0 4.01e+08 1.0 0.0e+00
0.0e+00 1.9e+03 4 4 0 0 51 4 4 0 0 51 401<br>
KSPGMRESOrthog 1818 1.0 1.3358e+02 1.0 5.38e+08 1.0 0.0e+00
0.0e+00 1.8e+03 58 72 0 0 49 58 72 0 0 49 538<br>
KSPSetup 1 1.0 3.0222e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<b>KSPSolve 1 1.0 2.3103e+02 1.0 4.31e+08 1.0 0.0e+00
0.0e+00 3.7e+03100100 0 0100 100100 0 0100 431</b><br>
PCSetUp 1 1.0 1.8197e-01 1.0 3.16e+07 1.0 0.0e+00
0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 32<br>
PCApply 1879 1.0 5.4377e+01 1.0 1.99e+08 1.0 0.0e+00
0.0e+00 0.0e+00 23 11 0 0 0 23 11 0 0 0 199<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
Memory usage is given in bytes:<br>
<br>
Object Type Creations Destructions Memory Descendants'
Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
Matrix 2 2 97241612 0<br>
Index Set 3 3 7681032 0<br>
Vec 37 37 184348408 0<br>
Krylov Solver 1 1 17216 0<br>
Preconditioner 1 1 168 0<br>
========================================================================================================================<br>
Average time to get PetscTime(): 2.86102e-07<br>
OptionTable: -log_summary<br>
Compiled without FORTRAN kernels<br>
Compiled with full precision matrices (default)<br>
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8<br>
Configure run at: Wed Jan 9 14:33:02 2008<br>
Configure options: --with-cc=icc --with-fc=ifort --with-x=0
--with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared
--with-mpi-dir=/lsftmp/g0306332/mpich2/ --with-debugging=0
--with-hypre-dir=/home/enduser/g0306332/lib/hypre_shared<br>
-----------------------------------------<br>
Libraries compiled on Wed Jan 9 14:33:36 SGT 2008 on atlas3-c01<br>
Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed Jul
12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux<br>
Using PETSc directory: /home/enduser/g0306332/petsc-2.3.3-p8<br>
Using PETSc arch: atlas3<br>
-----------------------------------------<br>
Using C compiler: icc -fPIC -O<br>
<br>
<b>for 1 processor (compiled with MPI and mpif90)</b><br>
<br>
Norm of error 0.6935E+01 iterations 1818<br>
************************************************************************************************************************<br>
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***<br>
************************************************************************************************************************<br>
<br>
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------<br>
<br>
./a.out on a atlas3-mp named atlas3-c35 with 1 processor, by g0306332
Sat Apr 19 12:06:10 2008<br>
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST
2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b<br>
<br>
Max Max/Min Avg Total<br>
Time (sec): 1.994e+02 1.00000 1.994e+02<br>
Objects: 4.400e+01 1.00000 4.400e+01<br>
Flops: 9.958e+10 1.00000 9.958e+10 9.958e+10<br>
Flops/sec: 4.994e+08 1.00000 4.994e+08 4.994e+08<br>
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00<br>
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00<br>
MPI Reductions: 3.701e+03 1.00000<br>
<br>
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)<br>
e.g., VecAXPY() for real vectors of length
N --> 2N flops<br>
and VecAXPY() for complex vectors of length
N --> 8N flops<br>
<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total<br>
0: Main Stage: 1.9941e+02 100.0% 9.9582e+10 100.0% 0.000e+00
0.0% 0.000e+00 0.0% 3.701e+03 100.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
See the 'Profiling' chapter of the users' manual for details on
interpreting output.<br>
Phase summary info:<br>
Count: number of times phase was executed<br>
Time and Flops/sec: Max - maximum over all processors<br>
Ratio - ratio of maximum to minimum over all
processors<br>
Mess: number of messages sent<br>
Avg. len: average message length<br>
Reduct: number of global reductions<br>
Global: entire computation<br>
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().<br>
%T - percent time in this phase %F - percent flops in
this phase<br>
%M - percent messages in this phase %L - percent message
lengths in this phase<br>
%R - percent reductions in this phase<br>
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
<br>
##########################################################<br>
# #<br>
# WARNING!!! #<br>
# #<br>
# This code was run without the PreLoadBegin() #<br>
# macros. To get timing results we always recommend #<br>
# preloading. otherwise timing numbers may be #<br>
# meaningless. #<br>
##########################################################<br>
<br>
<br>
<br>
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total<br>
Max Ratio Max Ratio Max Ratio Mess Avg
len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
MatMult 1879 1.0 2.5570e+01 1.0 4.23e+08 1.0 0.0e+00
0.0e+00 0.0e+00 13 11 0 0 0 13 11 0 0 0 423<br>
MatSolve 1879 1.0 4.9718e+01 1.0 2.17e+08 1.0 0.0e+00
0.0e+00 0.0e+00 25 11 0 0 0 25 11 0 0 0 217<br>
MatLUFactorNum 1 1.0 6.2375e-02 1.0 9.22e+07 1.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 92<br>
MatILUFactorSym 1 1.0 5.7791e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyBegin 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyEnd 1 1.0 4.0974e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetRowIJ 1 1.0 1.9073e-06 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetOrdering 1 1.0 1.1152e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecMDot 1818 1.0 5.4006e+01 1.0 6.65e+08 1.0 0.0e+00
0.0e+00 1.8e+03 27 36 0 0 49 27 36 0 0 49 665<br>
VecNorm 1880 1.0 3.1264e+00 1.0 7.70e+08 1.0 0.0e+00
0.0e+00 1.9e+03 2 2 0 0 51 2 2 0 0 51 770<br>
VecScale 1879 1.0 2.2186e+00 1.0 5.42e+08 1.0 0.0e+00
0.0e+00 0.0e+00 1 1 0 0 0 1 1 0 0 0 542<br>
VecCopy 61 1.0 1.5514e-01 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecSet 63 1.0 9.5223e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecAXPY 122 1.0 3.7005e-01 1.0 4.22e+08 1.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 422<br>
VecMAXPY 1879 1.0 6.3324e+01 1.0 6.04e+08 1.0 0.0e+00
0.0e+00 0.0e+00 32 38 0 0 0 32 38 0 0 0 604<br>
VecNormalize 1879 1.0 5.3485e+00 1.0 6.75e+08 1.0 0.0e+00
0.0e+00 1.9e+03 3 4 0 0 51 3 4 0 0 51 675<br>
KSPGMRESOrthog 1818 1.0 1.1345e+02 1.0 6.33e+08 1.0 0.0e+00
0.0e+00 1.8e+03 57 72 0 0 49 57 72 0 0 49 633<br>
KSPSetup 1 1.0 2.1831e-02 1.0 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<b>KSPSolve 1 1.0 1.9887e+02 1.0 5.01e+08 1.0 0.0e+00
0.0e+00 3.7e+03100100 0 0100 100100 0 0100 501</b><br>
PCSetUp 1 1.0 1.3134e-01 1.0 4.38e+07 1.0 0.0e+00
0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 44<br>
PCApply 1879 1.0 4.9722e+01 1.0 2.17e+08 1.0 0.0e+00
0.0e+00 0.0e+00 25 11 0 0 0 25 11 0 0 0 217<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
Memory usage is given in bytes:<br>
<br>
Object Type Creations Destructions Memory Descendants'
Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
Matrix 2 2 97241612 0<br>
Index Set 3 3 7681032 0<br>
Vec 37 37 184348408 0<br>
Krylov Solver 1 1 17216 0<br>
Preconditioner 1 1 168 0<br>
========================================================================================================================<br>
Average time to get PetscTime(): 2.14577e-07<br>
OptionTable: -log_summary<br>
Compiled without FORTRAN kernels<br>
Compiled with full precision matrices (default)<br>
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8<br>
Configure run at: Tue Jan 8 22:22:08 2008<br>
Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8
--sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8
--sizeof_float=4 --sizeof_double=8 --bits_per_byte=8
--sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel
--with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre
--with-debugging=0 --with-batch=1 --with-mpi-shared=0
--with-mpi-include=/usr/local/topspin/mpi/mpich/include
--with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a
--with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun
--with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0<br>
-----------------------------------------<br>
Libraries compiled on Tue Jan 8 22:34:13 SGT 2008 on atlas3-c01<br>
Machine characteristics: Linux atlas3-c01 2.6.9-42.ELsmp #1 SMP Wed Jul
12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux<br>
Using PETSc directory: /nfs/home/enduser/g0306332/petsc-2.3.3-p8<br>
<br>
<br>
<br>
<b>for 4 processors:</b><br>
<br>
Norm of error 0.6563E+01 iterations 1224<br>
************************************************************************************************************************<br>
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***<br>
************************************************************************************************************************<br>
<br>
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------<br>
<br>
./a.out on a atlas3-mp named atlas3-c07 with 4 processors, by g0306332
Sat Apr 19 10:47:39 2008<br>
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST
2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b<br>
<br>
Max Max/Min Avg Total<br>
Time (sec): 5.816e+01 1.00339 5.802e+01<br>
Objects: 5.500e+01 1.00000 5.500e+01<br>
Flops: 1.676e+10 1.00012 1.676e+10 6.704e+10<br>
Flops/sec: 2.891e+08 1.00339 2.888e+08 1.155e+09<br>
MPI Messages: 2.532e+03 2.00000 1.899e+03 7.596e+03<br>
MPI Message Lengths: 1.620e+07 2.00000 6.397e+03 4.860e+07<br>
MPI Reductions: 6.255e+02 1.00000<br>
<br>
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)<br>
e.g., VecAXPY() for real vectors of length
N --> 2N flops<br>
and VecAXPY() for complex vectors of length
N --> 8N flops<br>
<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total<br>
0: Main Stage: 5.8023e+01 100.0% 6.7036e+10 100.0% 7.596e+03
100.0% 6.397e+03 100.0% 2.502e+03 100.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
See the 'Profiling' chapter of the users' manual for details on
interpreting output.<br>
Phase summary info: <br>
Count: number of times phase was executed<br>
Time and Flops/sec: Max - maximum over all processors<br>
Ratio - ratio of maximum to minimum over all
processors<br>
Mess: number of messages sent<br>
Avg. len: average message length<br>
Reduct: number of global reductions<br>
Global: entire computation<br>
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().<br>
%T - percent time in this phase %F - percent flops in
this phase<br>
%M - percent messages in this phase %L - percent message
lengths in this phase<br>
%R - percent reductions in this phase<br>
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
<br>
##########################################################<br>
# #<br>
# WARNING!!! #<br>
# #<br>
# This code was run without the PreLoadBegin() #<br>
# macros. To get timing results we always recommend #<br>
# preloading. otherwise timing numbers may be #<br>
# meaningless. #<br>
##########################################################<br>
<br>
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total<br>
Max Ratio Max Ratio Max Ratio Mess Avg
len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
MatMult 1265 1.0 6.1331e+00 1.3 3.83e+08 1.3 7.6e+03
6.4e+03 0.0e+00 9 11100100 0 9 11100100 0 1187<br>
MatSolve 1265 1.0 1.0547e+01 1.3 2.30e+08 1.3 0.0e+00
0.0e+00 0.0e+00 15 11 0 0 0 15 11 0 0 0 689<br>
MatLUFactorNum 1 1.0 4.4247e-0130.2 9.79e+0730.2 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 13<br>
MatILUFactorSym 1 1.0 3.2925e+00242.4 0.00e+00 0.0 0.0e+00
0.0e+00 1.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
MatAssemblyBegin 1 1.0 1.1287e+00 2.8 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 2 0 0 0 0 2 0 0 0 0 0<br>
MatAssemblyEnd 1 1.0 3.5604e-01 1.3 0.00e+00 0.0 6.0e+00
3.2e+03 7.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
58.24user 0.08system 1:16.61elapsed 76%CPU (0avgtext+0avgdata
0maxresident)k<br>
0inputs+0outputs (17major+24855minor)pagefaults 0swaps<br>
57.73user 0.09system 1:16.52elapsed 75%CPU (0avgtext+0avgdata
0maxresident)k<br>
0inputs+0outputs (33major+24778minor)pagefaults 0swaps<br>
MatGetRowIJ 1 1.0 3.0994e-06 3.2 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetOrdering 1 1.0 1.1066e+00574.2 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecMDot 1224 1.0 1.3427e+01 1.4 6.35e+08 1.4 0.0e+00
0.0e+00 1.2e+03 20 36 0 0 49 20 36 0 0 49 1802<br>
VecNorm 1266 1.0 1.6744e+01 1.5 3.67e+07 1.5 0.0e+00
0.0e+00 1.3e+03 25 2 0 0 51 25 2 0 0 51 97<br>
VecScale 1265 1.0 1.5984e-01 1.7 2.15e+09 1.7 0.0e+00
0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 5065<br>
VecCopy 41 1.0 4.5000e-02 1.6 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecSet 1308 1.0 6.7918e-01 1.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecAXPY 82 1.0 1.1008e-01 2.8 6.71e+08 2.8 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 953<br>
VecMAXPY 1265 1.0 1.0437e+01 1.3 7.82e+08 1.3 0.0e+00
0.0e+00 0.0e+00 15 38 0 0 0 15 38 0 0 0 2468<br>
VecScatterBegin 1265 1.0 2.0925e-02 1.5 0.00e+00 0.0 7.6e+03
6.4e+03 0.0e+00 0 0100100 0 0 0100100 0 0<br>
VecScatterEnd 1265 1.0 1.5369e+0014.9 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecNormalize 1265 1.0 1.6829e+01 1.5 5.44e+07 1.5 0.0e+00
0.0e+00 1.3e+03 25 4 0 0 51 25 4 0 0 51 144<br>
KSPGMRESOrthog 1224 1.0 2.1170e+01 1.1 6.26e+08 1.1 0.0e+00
0.0e+00 1.2e+03 35 72 0 0 49 35 72 0 0 49 2286<br>
KSPSetup 2 1.0 1.6389e+00 1.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0<br>
<b>KSPSolve 1 1.0 5.4782e+01 1.0 3.09e+08 1.0 7.6e+03
6.4e+03 2.5e+03 94100100100100 94100100100100 1224 </b><br>
PCSetUp 2 1.0 5.0808e+00167.8 4.74e+07167.8 0.0e+00
0.0e+00 3.0e+00 2 0 0 0 0 2 0 0 0 0 1<br>
PCSetUpOnBlocks 1 1.0 4.9581e+00164.2 4.75e+07164.2 0.0e+00
0.0e+00 3.0e+00 2 0 0 0 0 2 0 0 0 0 1<br>
PCApply 1265 1.0 1.1233e+01 1.3 2.15e+08 1.3 0.0e+00
0.0e+00 0.0e+00 16 11 0 0 0 16 11 0 0 0 647<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
Memory usage is given in bytes:<br>
<br>
Object Type Creations Destructions Memory Descendants'
Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
Matrix 4 4 30699220 0<br>
Index Set 5 5 1924920 0<br>
Vec 41 41 47397592 0<br>
Vec Scatter 1 1 0 0<br>
Krylov Solver 2 2 17216 0<br>
Preconditioner 2 2 256 0<br>
========================================================================================================================<br>
Average time to get PetscTime(): 1.90735e-07<br>
Average time for MPI_Barrier(): 1.52111e-05<br>
Average time for zero size MPI_Send(): 7.42674e-05<br>
<br>
<b>for 8 processors</b><br>
<br>
Norm of error 0.7057E+01 iterations 1974<br>
************************************************************************************************************************<br>
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***<br>
************************************************************************************************************************<br>
<br>
---------------------------------------------- PETSc Performance
Summary: ----------------------------------------------<br>
<br>
./a.out on a atlas3-mp named atlas3-c07 with 8 processors, by g0306332
Sat Apr 19 10:50:39 2008<br>
Using Petsc Release Version 2.3.3, Patch 8, Fri Nov 16 17:03:40 CST
2007 HG revision: 414581156e67e55c761739b0deb119f7590d0f4b<br>
<br>
Max Max/Min Avg Total<br>
Time (sec): 3.884e+01 1.00356 3.872e+01<br>
Objects: 5.500e+01 1.00000 5.500e+01 <br>
Flops: 1.352e+10 1.00024 1.352e+10 1.082e+11<br>
Flops/sec: 3.494e+08 1.00356 3.492e+08 2.794e+09<br>
MPI Messages: 4.082e+03 2.00000 3.572e+03 2.857e+04<br>
MPI Message Lengths: 2.612e+07 2.00000 6.398e+03 1.828e+08<br>
MPI Reductions: 5.034e+02 1.00000<br>
<br>
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)<br>
e.g., VecAXPY() for real vectors of length
N --> 2N flops<br>
and VecAXPY() for complex vectors of length
N --> 8N flops<br>
<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages
--- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts
%Total Avg %Total counts %Total<br>
0: Main Stage: 3.8725e+01 100.0% 1.0819e+11 100.0% 2.857e+04
100.0% 6.398e+03 100.0% 4.027e+03 100.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
See the 'Profiling' chapter of the users' manual for details on
interpreting output.<br>
Phase summary info: <br>
Count: number of times phase was executed<br>
Time and Flops/sec: Max - maximum over all processors<br>
Ratio - ratio of maximum to minimum over all
processors<br>
Mess: number of messages sent<br>
Avg. len: average message length<br>
Reduct: number of global reductions<br>
Global: entire computation<br>
Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().<br>
%T - percent time in this phase %F - percent flops in
this phase<br>
%M - percent messages in this phase %L - percent message
lengths in this phase<br>
%R - percent reductions in this phase<br>
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
<br>
##########################################################<br>
# #<br>
# WARNING!!! #<br>
# #<br>
# This code was run without the PreLoadBegin() #<br>
# macros. To get timing results we always recommend #<br>
# preloading. otherwise timing numbers may be #<br>
# meaningless. #<br>
##########################################################<br>
<br>
<br>
<br>
<br>
Event Count Time (sec)
Flops/sec --- Global --- --- Stage --- Total<br>
Max Ratio Max Ratio Max Ratio Mess Avg
len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
MatMult 2040 1.0 5.3584e+00 1.4 3.94e+08 1.4 2.9e+04
6.4e+03 0.0e+00 11 11100100 0 11 11100100 0 2190<br>
MatSolve 2040 1.0 9.1180e+00 1.6 2.61e+08 1.6 0.0e+00
0.0e+00 0.0e+00 16 11 0 0 0 16 11 0 0 0 1282<br>
MatLUFactorNum 1 1.0 2.0827e-02 3.0 1.01e+08 3.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 274<br>
MatILUFactorSym 1 1.0 3.3652e-0159.9 0.00e+00 0.0 0.0e+00
0.0e+00 1.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyBegin 1 1.0 6.9557e-02649.8 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatAssemblyEnd 1 1.0 2.1714e-01 1.0 0.00e+00 0.0 1.4e+01
3.2e+03 7.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
MatGetRowIJ 1 1.0 2.1458e-06 2.2 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
MatGetOrdering 1 1.0 8.8410e-03 9.6 0.00e+00 0.0 0.0e+00
0.0e+00 2.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecMDot 1974 1.0 1.3835e+01 1.5 5.43e+08 1.5 0.0e+00
0.0e+00 2.0e+03 33 36 0 0 49 33 36 0 0 49 2824<br>
VecNorm 2041 1.0 4.8508e+00 3.1 2.07e+08 3.1 0.0e+00
0.0e+00 2.0e+03 10 2 0 0 51 10 2 0 0 51 539<br>
VecScale 2040 1.0 9.5685e-02 1.3 2.28e+09 1.3 0.0e+00
0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 13645<br>
VecCopy 66 1.0 2.8788e-02 1.3 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
VecSet 2108 1.0 5.7849e-01 1.4 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecAXPY 132 1.0 9.8424e-02 4.1 8.77e+08 4.1 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1717<br>
VecMAXPY 2040 1.0 8.5986e+00 1.6 9.60e+08 1.6 0.0e+00
0.0e+00 0.0e+00 16 38 0 0 0 16 38 0 0 0 4838<br>
VecScatterBegin 2040 1.0 3.5096e-02 1.6 0.00e+00 0.0 2.9e+04
6.4e+03 0.0e+00 0 0100100 0 0 0100100 0 0<br>
VecScatterEnd 2040 1.0 1.9975e+00 8.8 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0<br>
VecNormalize 2040 1.0 4.9265e+00 2.9 2.92e+08 2.9 0.0e+00
0.0e+00 2.0e+03 11 4 0 0 51 11 4 0 0 51 795<br>
KSPGMRESOrthog 1974 1.0 1.8947e+01 1.1 5.71e+08 1.1 0.0e+00
0.0e+00 2.0e+03 47 72 0 0 49 47 72 0 0 49 4124<br>
KSPSetup 2 1.0 1.8385e-01 1.4 0.00e+00 0.0 0.0e+00
0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<b>KSPSolve 1 1.0 3.7776e+01 1.0 3.59e+08 1.0 2.9e+04
6.4e+03 4.0e+03 97100100100100 97100100100100 2864</b><br>
PCSetUp 2 1.0 3.6882e-0126.7 5.18e+0726.7 0.0e+00
0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 16<br>
PCSetUpOnBlocks 1 1.0 3.7064e-0127.0 5.20e+0727.0 0.0e+00
0.0e+00 3.0e+00 0 0 0 0 0 0 0 0 0 0 15<br>
PCApply 2040 1.0 9.7035e+00 1.6 2.42e+08 1.6 0.0e+00
0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 1205<br>
------------------------------------------------------------------------------------------------------------------------<br>
<br>
Memory usage is given in bytes:<br>
<br>
Object Type Creations Destructions Memory Descendants'
Mem.<br>
<br>
--- Event Stage 0: Main Stage<br>
<br>
Matrix 4 4 15341620 0<br>
Index Set 5 5 964920 0<br>
Vec 41 41 23717592 0<br>
Vec Scatter 1 1 0 0<br>
Krylov Solver 2 2 17216 0<br>
Preconditioner 2 2 256 0<br>
========================================================================================================================<br>
Average time to get PetscTime(): 1.90735e-07<br>
Average time for MPI_Barrier(): 1.75953e-05<br>
Average time for zero size MPI_Send(): 3.83854e-05<br>
OptionTable: -log_summary<br>
Compiled without FORTRAN kernels<br>
Compiled with full precision matrices (default)<br>
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8<br>
Configure run at: Tue Jan 8 22:22:08 2008<br>
Configure options: --with-memcmp-ok --sizeof_char=1 --sizeof_void_p=8
--sizeof_short=2 --sizeof_int=4 --sizeof_long=8 --sizeof_long_long=8
--sizeof_float=4 --sizeof_double=8 --bits_per_byte=8
--sizeof_MPI_Comm=4 --sizeof_MPI_Fint=4 --with-vendor-compilers=intel
--with-x=0 --with-hypre-dir=/home/enduser/g0306332/lib/hypre
--with-debugging=0 --with-batch=1 --with-mpi-shared=0
--with-mpi-include=/usr/local/topspin/mpi/mpich/include
--with-mpi-lib=/usr/local/topspin/mpi/mpich/lib/libmpich.a
--with-mpirun=/usr/local/topspin/mpi/mpich/bin/mpirun
--with-blas-lapack-dir=/opt/intel/cmkl/8.1.1/lib/em64t --with-shared=0<br>
<br>
<br>
Sanjay Govindjee wrote:
<blockquote cite="mid:48060CD5.1010308@ethz.ch" type="cite"><br>
<br>
<blockquote type="cite">
<blockquote type="cite"><br>
Also, with a smart enough LSF scheduler, I will be assured of getting
separate processors, i.e. 1 core from each different processor instead of
2-4 cores from just 1 processor. In that case, if I use 1 core from
processor A and 1 core from processor B, I should be able to get a
decent speedup of more than 1, is that so?
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
You still need to be careful with the hardware you choose. If the
processors live on the same motherboard, you still need to make sure
that they each have their own memory bus. Otherwise you will still face
memory bottlenecks as each core, from the different processors,
fights for bandwidth on the bus. It all depends on the memory bus
architecture of your system. In this regard, I recommend staying away
from Intel-style systems. -sg
<br>
<br>
<br>
<br>
</blockquote>
</body>
</html>