[petsc-dev] ex19 on GPU
Shiyuan
gshy2014 at gmail.com
Sat Sep 17 21:56:11 CDT 2011
On Sat, Sep 17, 2011 at 4:14 PM, Matthew Knepley <petsc-maint at mcs.anl.gov> wrote:
> On Sat, Sep 17, 2011 at 3:26 PM, Shiyuan <gshy2014 at gmail.com> wrote:
>
>> I configured petsc-dev with --with-cuda-arch=sm_20 and rebuilt, but it
>> doesn't help. The performance is essentially the same. The machine has two
>> Tesla M2050 cards, with CUDA driver 4.0 and cusp 2.0, and I use
>> -cuda_set_device to choose one. Any clues what's going wrong? configure.log
>> is attached.
>>
>
> Can you show me the output of -cuda_show_devices?
>
CUDA device 0: Tesla M2050
CUDA device 1: Tesla M2050
>
>
> In order to investigate further, please run with
>
> -da_vec_type mpicusp -da_mat_type mpiaijcusp
>
> and then a series of sizes
>
> -da_grid_x {100,200,300,400} -da_grid_y {100,200,300,400}
>
> and send the log summaries.
>
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_show_devices
CUDA device 0: Tesla M2050
CUDA device 1: Tesla M2050
Time (sec): 1.928e+01 1.00000 1.928e+01
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 9.039e+09 1.00000 9.039e+09 9.039e+09
Flops/sec: 4.687e+08 1.00000 4.687e+08 4.687e+08
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.3905e+00 22.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 6.0178e-02 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.4834e+01 76.9% 9.0389e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
VecMDot 2024 1.0 8.6724e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 28 0 0 0 58 28 0 0 0 293
VecNorm 2096 1.0 1.5712e+00 1.0 3.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 4 0 0 0 11 4 0 0 0 213
VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
KSPSolve 2 1.0 1.4543e+01 1.0 8.99e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75 99 0 0 0 98 99 0 0 0 618
MatMult 2092 1.0 2.8551e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 19 37 0 0 0 1163
MatCUSPCopyTo 4 1.0 1.6344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 200 -da_grid_y 200 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
Time (sec): 5.042e+01 1.00000 5.042e+01
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 8.283e+10 1.00000 8.283e+10 8.283e+10
Flops/sec: 1.643e+09 1.00000 1.643e+09 1.643e+09
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.6509e+00 9.2% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 2.5148e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 4.5517e+01 90.3% 8.2826e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
VecMDot 4637 1.0 2.1155e+01 1.0 2.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 28 0 0 0 46 28 0 0 0 1104
VecNorm 4796 1.0 3.7077e+00 1.0 3.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 4 0 0 0 8 4 0 0 0 828
VecCUSPCopyTo 4840 1.0 6.1929e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 4835 1.0 5.0045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
KSPSolve 2 1.0 4.4465e+01 1.0 8.26e+10 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 98100 0 0 0 1859
MatMult 4792 1.0 1.4925e+01 1.0 3.05e+10 1.0 0.0e+00 0.0e+00 0.0e+00 30 37 0 0 0 33 37 0 0 0 2047
MatCUSPCopyTo 4 1.0 4.9795e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 >> ex19p.txt
Time (sec): 1.095e+02 1.00000 1.095e+02
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 3.136e+11 1.00000 3.136e+11 3.136e+11
Flops/sec: 2.865e+09 1.00000 2.865e+09 2.865e+09
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.4090e+00 4.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 5.6010e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.0449e+02 95.5% 3.1360e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
VecMDot 7803 1.0 3.8534e+01 1.0 8.85e+10 1.0 0.0e+00 0.0e+00 0.0e+00 35 28 0 0 0 37 28 0 0 0 2297
VecNorm 8068 1.0 6.5087e+00 1.0 1.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 4 0 0 0 6 4 0 0 0 1785
VecCUSPCopyTo 8112 1.0 1.0913e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 8107 1.0 1.2753e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
KSPSolve 2 1.0 1.0228e+02 1.0 3.13e+11 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 98100 0 0 0 3062
MatMult 8064 1.0 4.5629e+01 1.0 1.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 37 0 0 0 44 37 0 0 0 2538
MatCUSPCopyTo 4 1.0 8.1736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 400 -da_grid_y 400 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0 >> ex19p.txt
Time (sec): 1.909e+02 1.00000 1.909e+02
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 7.167e+11 1.00000 7.167e+11 7.167e+11
Flops/sec: 3.753e+09 1.00000 3.753e+09 3.753e+09
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.4291e+00 2.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 1.0122e+00 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.8551e+02 97.2% 7.1669e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
VecMDot 10031 1.0 5.4102e+01 1.0 2.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 28 0 0 0 29 28 0 0 0 3739
VecNorm 10370 1.0 8.5987e+00 1.0 2.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 3087
VecCUSPCopyTo 10414 1.0 1.5341e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 10409 1.0 2.5585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
KSPSolve 2 1.0 1.8163e+02 1.0 7.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 98100 0 0 0 3942
MatMult 10366 1.0 9.7032e+01 1.0 2.65e+11 1.0 0.0e+00 0.0e+00 0.0e+00 51 37 0 0 0 52 37 0 0 0 2729
MatCUSPCopyTo 4 1.0 1.3754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
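The Total Mflop/s column of the excerpts above can be recomputed from the Flops and Time columns as a cross-check. The minimal Python sketch below uses only numbers copied from the four excerpts. The dictionary layout and the helper names are illustrative, not part of PETSc. It recomputes the VecMDot and MatMult rates and the share of the Solve stage spent in VecCUSPCopyFrom.

The log legend prints the scale factor as "10e-6". However, the tabulated rates correspond to a factor of 1e-6. For example, MatMult at 100x100 gives 3.32e9 flops / 2.8551 s * 1e-6 = 1163.

```python
"""Recompute -log_summary rates from the ex19 GPU excerpts (sketch)."""

# Values copied from the four excerpts: Solve-stage time in seconds,
# (max time in seconds, flops) for VecMDot and MatMult, and the
# VecCUSPCopyFrom time in seconds. Flops are the 3-digit rounded values
# printed in the log, so recomputed rates match only to about 0.2%.
RUNS = {
    100: {"solve": 14.834, "VecMDot": (8.6724, 2.54e9),
          "MatMult": (2.8551, 3.32e9), "VecCUSPCopyFrom": 1.0437},
    200: {"solve": 45.517, "VecMDot": (21.155, 2.34e10),
          "MatMult": (14.925, 3.05e10), "VecCUSPCopyFrom": 5.0045},
    300: {"solve": 104.49, "VecMDot": (38.534, 8.85e10),
          "MatMult": (45.629, 1.16e11), "VecCUSPCopyFrom": 12.753},
    400: {"solve": 185.51, "VecMDot": (54.102, 2.02e11),
          "MatMult": (97.032, 2.65e11), "VecCUSPCopyFrom": 25.585},
}


def mflops(flops: float, seconds: float) -> float:
    """Rate as tabulated in the Total Mflop/s column: 1e-6 * flops / time."""
    return 1e-6 * flops / seconds


def stage_percent(seconds: float, stage_seconds: float) -> float:
    """Share of the Solve stage spent in one event (the Stage %T column)."""
    return 100.0 * seconds / stage_seconds


if __name__ == "__main__":
    for n, run in sorted(RUNS.items()):
        t_mdot, f_mdot = run["VecMDot"]
        t_mult, f_mult = run["MatMult"]
        copy_pct = stage_percent(run["VecCUSPCopyFrom"], run["solve"])
        print(f"{n}x{n}: VecMDot {mflops(f_mdot, t_mdot):6.0f} Mflop/s, "
              f"MatMult {mflops(f_mult, t_mult):6.0f} Mflop/s, "
              f"VecCUSPCopyFrom {copy_pct:4.1f}% of Solve")
```

With these inputs, the recomputed rates agree with the last column of each excerpt to within about 0.2%. The VecCUSPCopyFrom share of the Solve stage rounds to 7, 11, 12 and 14% for the four sizes, which matches the Stage %T column.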
The complete log_summaries are attached.
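The minimal Python sketch below regenerates the command lines for the grid-size sweep Matthew asked for. The helper names are hypothetical. The options are exactly the ones used in the runs above, and ./ex19 is assumed to be in the current directory. By default the script only prints the commands. The run_sweep helper appends each run's output to a log file, mirroring the ">> ex19p.txt" redirection used above.

```python
"""Generate (and optionally run) the ex19 grid-size sweep (sketch)."""
import shlex
import subprocess


def sweep_commands(sizes=(100, 200, 300, 400), exe="./ex19", device=0):
    """Return one argv list per grid size, using the options from the runs above."""
    commands = []
    for n in sizes:
        commands.append([
            exe,
            "-da_vec_type", "mpicusp", "-da_mat_type", "mpiaijcusp",
            "-pc_type", "none", "-dmmg_nlevels", "1",
            "-da_grid_x", str(n), "-da_grid_y", str(n),
            "-log_summary", "-mat_no_inode", "-preload", "off",
            "-cusp_synchronize", "-cuda_set_device", str(device),
        ])
    return commands


def run_sweep(log_path="ex19p.txt", **kwargs):
    """Run every command in turn, appending stdout and stderr to log_path."""
    with open(log_path, "a") as log:
        for argv in sweep_commands(**kwargs):
            log.write(shlex.join(argv) + "\n")
            log.flush()  # keep the command line ahead of the child's output
            subprocess.run(argv, check=True, stdout=log, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    # Dry run: print the command lines instead of executing them.
    for argv in sweep_commands():
        print(shlex.join(argv))
```

Keeping the options in one place ensures that only the grid size changes between runs. This matters when comparing -log_summary outputs across sizes.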
-------------- next part --------------
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_show_devices
CUDA device 0: Tesla M2050
CUDA device 1: Tesla M2050
lid velocity = 0.0001, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:34:38 2011
Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
Max Max/Min Avg Total
Time (sec): 1.928e+01 1.00000 1.928e+01
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 9.039e+09 1.00000 9.039e+09 9.039e+09
Flops/sec: 4.687e+08 1.00000 4.687e+08 4.687e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.3905e+00 22.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 6.0178e-02 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.4834e+01 76.9% 9.0389e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
MatAssemblyBegin 1 1.0 1.1921e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 2.0661e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatFDColorCreate 1 1.0 1.8455e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 31 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 1.6947e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 94
VecMDot 2024 1.0 8.6724e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 28 0 0 0 58 28 0 0 0 293
VecNorm 2096 1.0 1.5712e+00 1.0 3.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 4 0 0 0 11 4 0 0 0 213
VecScale 2092 1.0 3.7956e-01 1.0 8.37e+07 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 3 1 0 0 0 220
VecCopy 2072 1.0 3.8405e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
VecSet 70 1.0 1.3284e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 108 1.0 4.7269e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 183
VecWAXPY 68 1.0 1.2537e-02 1.0 2.72e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 217
VecMAXPY 2092 1.0 6.4375e-01 1.0 2.71e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 30 0 0 0 4 30 0 0 0 4203
VecScatterBegin 2097 1.0 1.0270e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
VecReduceArith 2 1.0 3.7239e-03 1.0 1.60e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 43
VecReduceComm 1 1.0 9.5367e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0
VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0
SNESSolve 1 1.0 1.4807e+01 1.0 9.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77100 0 0 0 100100 0 0 0 610
SNESLineSearch 2 1.0 1.2360e-02 1.0 5.81e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 470
SNESFunctionEval 3 1.0 2.7061e-03 1.0 2.52e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 931
SNESJacobianEval 2 1.0 2.4291e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 158
KSPGMRESOrthog 2024 1.0 9.2966e+00 1.0 5.09e+09 1.0 0.0e+00 0.0e+00 0.0e+00 48 56 0 0 0 63 56 0 0 0 547
KSPSetup 2 1.0 6.2943e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.4543e+01 1.0 8.99e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75 99 0 0 0 98 99 0 0 0 618
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 2024 1.0 3.8127e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 3 0 0 0 0 0
MatMult 2092 1.0 2.8551e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 19 37 0 0 0 1163
MatAssemblyBegin 2 1.0 1.8120e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 3.1030e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 1.8611e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 2.4285e-01 1.0 3.85e+07 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 158
MatFDColorFunc 42 1.0 1.2794e-02 1.0 3.53e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2758
MatCUSPCopyTo 4 1.0 1.6344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
--- Event Stage 1: SetUp
Distributed Mesh 1 0 0 0
Vector 11 3 4424 0
Vector Scatter 4 0 0 0
Index Set 29 9 46600 0
IS L to G Mapping 3 0 0 0
SNES 1 0 0 0
Krylov Solver 2 1 1064 0
Preconditioner 2 1 752 0
Matrix 3 0 0 0
Matrix FD Coloring 1 0 0 0
--- Event Stage 2: Solve
Distributed Mesh 0 1 204840 0
Vector 74 82 13242416 0
Vector Scatter 0 4 2448 0
Index Set 0 20 174720 0
IS L to G Mapping 0 3 161668 0
SNES 0 1 1288 0
Krylov Solver 0 1 18864 0
Preconditioner 0 1 952 0
Matrix 0 3 10810468 0
Matrix FD Coloring 0 1 6510068 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-cuda_show_devices
-cusp_synchronize
-da_grid_x 100
-da_grid_y 100
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Sat Sep 17 11:25:49 2011
Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
-----------------------------------------
Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
Using PETSc directory: /home/sgu/softwares/petsc-dev
Using PETSc arch: gpu00CCT-cxx-nompi-release
-----------------------------------------
Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
-----------------------------------------
Using C linker: g++
Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
-----------------------------------------
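When comparing several of these logs, it can help to extract the event-table fields described in the phase-summary legend programmatically. The minimal Python sketch below does this. The names are hypothetical and this is not a PETSc tool.

- It reads the leading fields by position.
- It takes Total Mflop/s from the last token, because adjacent percentage columns are sometimes printed without a separating space (e.g. "77100" in the SNESSolve row).
- It checks the flop-counting convention. 108 VecAXPY calls at 2N flops each give the 8.64e+06 flops logged for the 100x100 run if N = 40,000 = 4 x 100 x 100. This is consistent with an assumed four unknowns per grid point.

```python
"""Parse event rows from a PETSc -log_summary table (sketch)."""
from typing import NamedTuple


class EventRow(NamedTuple):
    name: str
    count: int
    max_time: float      # seconds, max over processes
    max_flops: float     # max over processes
    total_mflops: float  # last column of the row


def parse_event_row(line: str) -> EventRow:
    tokens = line.split()
    # Fixed leading fields: name, count, ratio, time, ratio, flops, ratio,
    # mess, avg len, reduct. Then come the %T/%F/... columns (possibly fused
    # together) and finally Mflop/s, so the rate is read from the end.
    if len(tokens) < 11:
        raise ValueError(f"not an event row: {line!r}")
    return EventRow(
        name=tokens[0],
        count=int(tokens[1]),
        max_time=float(tokens[3]),
        max_flops=float(tokens[5]),
        total_mflops=float(tokens[-1]),
    )


def axpy_flops(n: int, calls: int, complex_scalars: bool = False) -> int:
    """Flops logged for `calls` VecAXPY calls on length-n vectors (2N real, 8N complex)."""
    return calls * (8 if complex_scalars else 2) * n


if __name__ == "__main__":
    rows = [
        "VecAXPY 108 1.0 4.7269e-02 1.0 8.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 183",
        "SNESSolve 1 1.0 1.4807e+01 1.0 9.04e+09 1.0 0.0e+00 0.0e+00 0.0e+00 77100 0 0 0 100100 0 0 0 610",
    ]
    for row in rows:
        print(parse_event_row(row))
    n = 4 * 100 * 100  # assumed: 4 unknowns per node on the 100x100 grid
    print(axpy_flops(n, 108) == parse_event_row(rows[0]).max_flops)
```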
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 200 -da_grid_y 200 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
lid velocity = 2.5e-05, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:36:14 2011
Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
Max Max/Min Avg Total
Time (sec): 5.042e+01 1.00000 5.042e+01
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 8.283e+10 1.00000 8.283e+10 8.283e+10
Flops/sec: 1.643e+09 1.00000 1.643e+09 1.643e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.6509e+00 9.2% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 2.5148e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 4.5517e+01 90.3% 8.2826e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
MatAssemblyBegin 1 1.0 1.4067e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 8.0690e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatFDColorCreate 1 1.0 7.4871e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 30 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 1.6088e-03 1.0 6.40e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 398
VecMDot 4637 1.0 2.1155e+01 1.0 2.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 28 0 0 0 46 28 0 0 0 1104
VecNorm 4796 1.0 3.7077e+00 1.0 3.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 4 0 0 0 8 4 0 0 0 828
VecScale 4792 1.0 9.7300e-01 1.0 7.67e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 788
VecCopy 4685 1.0 9.9265e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSet 157 1.0 3.0819e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 195 1.0 9.1851e-02 1.0 6.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 679
VecWAXPY 155 1.0 3.3326e-02 1.0 2.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 744
VecMAXPY 4792 1.0 2.6158e+00 1.0 2.48e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 30 0 0 0 6 30 0 0 0 9498
VecScatterBegin 4797 1.0 4.9713e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
VecReduceArith 2 1.0 5.0960e-03 1.0 6.40e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 126
VecReduceComm 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUSPCopyTo 4840 1.0 6.1929e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 4835 1.0 5.0045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0
SNESSolve 1 1.0 4.5474e+01 1.0 8.28e+10 1.0 0.0e+00 0.0e+00 0.0e+00 90100 0 0 0 100100 0 0 0 1821
SNESLineSearch 2 1.0 2.3559e-02 1.0 2.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 989
SNESFunctionEval 3 1.0 8.9130e-03 1.0 1.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1131
SNESJacobianEval 2 1.0 9.7259e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 158
KSPGMRESOrthog 4637 1.0 2.3658e+01 1.0 4.67e+10 1.0 0.0e+00 0.0e+00 0.0e+00 47 56 0 0 0 52 56 0 0 0 1975
KSPSetup 2 1.0 6.1035e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 4.4465e+01 1.0 8.26e+10 1.0 0.0e+00 0.0e+00 0.0e+00 88100 0 0 0 98100 0 0 0 1859
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 4637 1.0 9.8032e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatMult 4792 1.0 1.4925e+01 1.0 3.05e+10 1.0 0.0e+00 0.0e+00 0.0e+00 30 37 0 0 0 33 37 0 0 0 2047
MatAssemblyBegin 2 1.0 2.0027e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 1.2705e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 7.4351e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 9.7253e-01 1.0 1.54e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 158
MatFDColorFunc 42 1.0 5.1462e-02 1.0 1.41e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2742
MatCUSPCopyTo 4 1.0 4.9795e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
--- Event Stage 1: SetUp
Distributed Mesh 1 0 0 0
Vector 11 3 4424 0
Vector Scatter 4 0 0 0
Index Set 29 9 166600 0
IS L to G Mapping 3 0 0 0
SNES 1 0 0 0
Krylov Solver 2 1 1064 0
Preconditioner 2 1 752 0
Matrix 3 0 0 0
Matrix FD Coloring 1 0 0 0
--- Event Stage 2: Solve
Distributed Mesh 0 1 804840 0
Vector 74 82 52602416 0
Vector Scatter 0 4 2448 0
Index Set 0 20 654720 0
IS L to G Mapping 0 3 641668 0
SNES 0 1 1288 0
Krylov Solver 0 1 18864 0
Preconditioner 0 1 952 0
Matrix 0 3 43373668 0
Matrix FD Coloring 0 1 26138868 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 0
#PETSc Option Table entries:
-cuda_set_device 0
-cusp_synchronize
-da_grid_x 200
-da_grid_y 200
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
-----------------------------------------
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
lid velocity = 1.11111e-05, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:38:29 2011
Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
Max Max/Min Avg Total
Time (sec): 1.095e+02 1.00000 1.095e+02
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 3.136e+11 1.00000 3.136e+11 3.136e+11
Flops/sec: 2.865e+09 1.00000 2.865e+09 2.865e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.4090e+00 4.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 5.6010e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.0449e+02 95.5% 3.1360e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
MatAssemblyBegin 1 1.0 1.4067e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 1.7501e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatFDColorCreate 1 1.0 1.6907e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 30 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 1.6890e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 853
VecMDot 7803 1.0 3.8534e+01 1.0 8.85e+10 1.0 0.0e+00 0.0e+00 0.0e+00 35 28 0 0 0 37 28 0 0 0 2297
VecNorm 8068 1.0 6.5087e+00 1.0 1.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 4 0 0 0 6 4 0 0 0 1785
VecScale 8064 1.0 1.8853e+00 1.0 2.90e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 1540
VecCopy 7851 1.0 1.9321e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSet 263 1.0 5.4441e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 301 1.0 1.5158e-01 1.0 2.17e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1430
VecWAXPY 261 1.0 6.9037e-02 1.0 9.40e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1361
VecMAXPY 8064 1.0 7.6110e+00 1.0 9.41e+10 1.0 0.0e+00 0.0e+00 0.0e+00 7 30 0 0 0 7 30 0 0 0 12366
VecScatterBegin 8069 1.0 1.2707e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
VecReduceArith 2 1.0 6.5138e-03 1.0 1.44e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 221
VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUSPCopyTo 8112 1.0 1.0913e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 8107 1.0 1.2753e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0
SNESSolve 1 1.0 1.0444e+02 1.0 3.14e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 100100 0 0 0 3003
SNESLineSearch 2 1.0 3.9190e-02 1.0 5.25e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1339
SNESFunctionEval 3 1.0 1.7656e-02 1.0 2.27e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1285
SNESJacobianEval 2 1.0 2.0955e+00 1.0 3.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
KSPGMRESOrthog 7803 1.0 4.5761e+01 1.0 1.77e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 56 0 0 0 44 56 0 0 0 3868
KSPSetup 2 1.0 4.4107e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.0228e+02 1.0 3.13e+11 1.0 0.0e+00 0.0e+00 0.0e+00 93100 0 0 0 98100 0 0 0 3062
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 7803 1.0 1.9026e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatMult 8064 1.0 4.5629e+01 1.0 1.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 37 0 0 0 44 37 0 0 0 2538
MatAssemblyBegin 2 1.0 2.0981e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 2.8598e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 1.9902e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 2.0955e+00 1.0 3.46e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
MatFDColorFunc 42 1.0 1.1288e-01 1.0 3.18e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2813
MatCUSPCopyTo 4 1.0 8.1736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
--- Event Stage 1: SetUp
Distributed Mesh 1 0 0 0
Vector 11 3 4424 0
Vector Scatter 4 0 0 0
Index Set 29 9 366600 0
IS L to G Mapping 3 0 0 0
SNES 1 0 0 0
Krylov Solver 2 1 1064 0
Preconditioner 2 1 752 0
Matrix 3 0 0 0
Matrix FD Coloring 1 0 0 0
--- Event Stage 2: Solve
Distributed Mesh 0 1 1804840 0
Vector 74 82 118202416 0
Vector Scatter 0 4 2448 0
Index Set 0 20 1454720 0
IS L to G Mapping 0 3 1441668 0
SNES 0 1 1288 0
Krylov Solver 0 1 18864 0
Preconditioner 0 1 952 0
Matrix 0 3 97696868 0
Matrix FD Coloring 0 1 58887668 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-cuda_set_device 0
-cusp_synchronize
-da_grid_x 300
-da_grid_y 300
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Sat Sep 17 11:25:49 2011
Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
-----------------------------------------
Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
Using PETSc directory: /home/sgu/softwares/petsc-dev
Using PETSc arch: gpu00CCT-cxx-nompi-release
-----------------------------------------
Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
-----------------------------------------
Using C linker: g++
Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
-----------------------------------------
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 400 -da_grid_y 400 -log_summary -mat_no_inode -preload off -cusp_synchronize -cuda_set_device 0
lid velocity = 6.25e-06, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r -fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./ex19 on a gpu00CCT- named gpu00.cct.lsu.edu with 1 processor, by sgu Sat Sep 17 20:42:05 2011
Using Petsc Development HG revision: 94fea4d40b1fcca2e886a14e7fdb916b8f6fecf3 HG Date: Sat Sep 17 00:48:29 2011 -0500
Max Max/Min Avg Total
Time (sec): 1.909e+02 1.00000 1.909e+02
Objects: 1.320e+02 1.00000 1.320e+02
Flops: 7.167e+11 1.00000 7.167e+11 7.167e+11
Flops/sec: 3.753e+09 1.00000 3.753e+09 3.753e+09
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 4.4291e+00 2.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: SetUp: 1.0122e+00 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
2: Solve: 1.8551e+02 97.2% 7.1669e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
PetscBarrier 1 1.0 1.1921e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: SetUp
MatAssemblyBegin 1 1.0 1.5974e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 1 1.0 3.1045e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatFDColorCreate 1 1.0 3.1857e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 31 0 0 0 0 0
--- Event Stage 2: Solve
VecDot 2 1.0 1.8530e-03 1.0 2.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1382
VecMDot 10031 1.0 5.4102e+01 1.0 2.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 28 0 0 0 29 28 0 0 0 3739
VecNorm 10370 1.0 8.5987e+00 1.0 2.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 3087
VecScale 10366 1.0 2.9179e+00 1.0 6.63e+09 1.0 0.0e+00 0.0e+00 0.0e+00 2 1 0 0 0 2 1 0 0 0 2274
VecCopy 10079 1.0 2.9971e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecSet 337 1.0 7.6832e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 375 1.0 2.3210e-01 1.0 4.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2068
VecWAXPY 335 1.0 1.1250e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1906
VecMAXPY 10366 1.0 1.5716e+01 1.0 2.15e+11 1.0 0.0e+00 0.0e+00 0.0e+00 8 30 0 0 0 8 30 0 0 0 13687
VecScatterBegin 10371 1.0 2.5508e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
VecReduceArith 2 1.0 8.3668e-03 1.0 2.56e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 306
VecReduceComm 1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecCUSPCopyTo 10414 1.0 1.5341e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecCUSPCopyFrom 10409 1.0 2.5585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0
SNESSolve 1 1.0 1.8546e+02 1.0 7.17e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97100 0 0 0 100100 0 0 0 3864
SNESLineSearch 2 1.0 6.2440e-02 1.0 9.33e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1495
SNESFunctionEval 3 1.0 3.0468e-02 1.0 4.03e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1323
SNESJacobianEval 2 1.0 3.7313e+00 1.0 6.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
KSPGMRESOrthog 10031 1.0 6.8969e+01 1.0 4.05e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 56 0 0 0 37 56 0 0 0 5865
KSPSetup 2 1.0 4.6015e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.8163e+02 1.0 7.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95100 0 0 0 98100 0 0 0 3942
PCSetUp 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCApply 10031 1.0 2.9429e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatMult 10366 1.0 9.7032e+01 1.0 2.65e+11 1.0 0.0e+00 0.0e+00 0.0e+00 51 37 0 0 0 52 37 0 0 0 2729
MatAssemblyBegin 2 1.0 2.1935e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 2 1.0 5.1729e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.1707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorApply 2 1.0 3.7312e+00 1.0 6.16e+08 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 165
MatFDColorFunc 42 1.0 2.3831e-01 1.0 5.64e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2369
MatCUSPCopyTo 4 1.0 1.3754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
--- Event Stage 1: SetUp
Distributed Mesh 1 0 0 0
Vector 11 3 4424 0
Vector Scatter 4 0 0 0
Index Set 29 9 646600 0
IS L to G Mapping 3 0 0 0
SNES 1 0 0 0
Krylov Solver 2 1 1064 0
Preconditioner 2 1 752 0
Matrix 3 0 0 0
Matrix FD Coloring 1 0 0 0
--- Event Stage 2: Solve
Distributed Mesh 0 1 3204840 0
Vector 74 82 210042416 0
Vector Scatter 0 4 2448 0
Index Set 0 20 2574720 0
IS L to G Mapping 0 3 2561668 0
SNES 0 1 1288 0
Krylov Solver 0 1 18864 0
Preconditioner 0 1 952 0
Matrix 0 3 173780068 0
Matrix FD Coloring 0 1 104756468 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
#PETSc Option Table entries:
-cuda_set_device 0
-cusp_synchronize
-da_grid_x 400
-da_grid_y 400
-da_mat_type mpiaijcusp
-da_vec_type mpicusp
-dmmg_nlevels 1
-log_summary
-mat_no_inode
-pc_type none
-preload off
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Sat Sep 17 11:25:49 2011
Configure options: PETSC_DIR=/home/sgu/softwares/petsc-dev PETSC_ARCH=gpu00CCT-cxx-nompi-release -with-clanguage=cxx --with-mpi=0 --download-f2cblaslapack=1 --download-f-blas-lapack=1 --with-debugging=0 --with-c2html=0 --with-valgrind-dir=~/softwares/valgrind --with-cuda=1 --with-cusp=1 --with-thrust=1 --with-cuda-arch=sm_20
-----------------------------------------
Libraries compiled on Sat Sep 17 11:25:49 2011 on gpu00.cct.lsu.edu
Machine characteristics: Linux-2.6.32-131.6.1.el6.x86_64-x86_64-with-redhat-6.1-Santiago
Using PETSc directory: /home/sgu/softwares/petsc-dev
Using PETSc arch: gpu00CCT-cxx-nompi-release
-----------------------------------------
Using C compiler: g++ -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/include -I/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/include -I/home/sgu/softwares/valgrind/include -I/usr/local/cuda/include -I/home/sgu/softwares/petsc-dev/include/mpiuni
-----------------------------------------
Using C linker: g++
Using libraries: -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lcublas -lcudart -Wl,-rpath,/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -L/home/sgu/softwares/petsc-dev/gpu00CCT-cxx-nompi-release/lib -lf2clapack -lf2cblas -lm -lm -lstdc++ -ldl
-----------------------------------------