<div class="gmail_quote">On Sat, Sep 17, 2011 at 4:14 PM, Matthew Knepley <span dir="ltr">&lt;<a href="mailto:petsc-maint@mcs.anl.gov" target="_blank">petsc-maint@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div>On Sat, Sep 17, 2011 at 3:26 PM, Shiyuan <span dir="ltr">&lt;<a href="mailto:gshy2014@gmail.com" target="_blank">gshy2014@gmail.com</a>&gt;</span> wrote:<br>
</div><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
I configured petsc-dev with --with-cuda-arch=sm_20 and rebuilt, but it<br>
didn't help. The performance is essentially the same. The machine has two<br>
Tesla M2050s, with CUDA driver 4.0 and cusp 2.0, and I use -cuda_set_device to<br>
choose one. Any clue what's going wrong? configure.log is attached.<br></blockquote><div><br></div></div><div>Can you show me the output of -cuda_show_devices?</div></div></blockquote><div>CUDA device 0: Tesla M2050<br>
CUDA device 1: Tesla M2050 <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="gmail_quote"><div><div> </div><br></div><div>
In order to investigate further, please run with</div>
<div><div><br></div><div> -da_vec_type mpicusp -da_mat_type mpiaijcusp</div>
<div><br></div></div><div>and then a series of sizes</div><div><br></div><div> -da_grid_x {100,200,300,400} -da_grid_y {100,200,300,400}</div><div><br></div><div>and send the log summaries.</div><div><br></div></div></blockquote>
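<div>A minimal sketch of how the requested sweep could be scripted, assuming the ex19 binary is in the current directory and using the option set from the runs in this thread. The helper names build_command and sweep are illustrative and not part of PETSc; the script only prints the command lines (a dry run) rather than executing them.</div>
<pre>
```python
import shlex

# Options copied from the runs in this thread; only the grid size varies.
BASE_OPTIONS = [
    "-da_vec_type", "mpicusp",
    "-da_mat_type", "mpiaijcusp",
    "-pc_type", "none",
    "-dmmg_nlevels", "1",
    "-log_summary",
    "-mat_no_inode",
    "-preload", "off",
    "-cusp_synchronize",
    "-cuda_set_device", "0",
]


def build_command(n, exe="./ex19"):
    """Return the argv list for one run on an n-by-n grid.

    The default path "./ex19" is an assumption about where the example was built.
    """
    return [exe, "-da_grid_x", str(n), "-da_grid_y", str(n)] + BASE_OPTIONS


def sweep(sizes=(100, 200, 300, 400)):
    """Return one argv list per grid size in the requested series."""
    return [build_command(n) for n in sizes]


if __name__ == "__main__":
    # Dry run: print each command so it can be inspected or pasted into a shell.
    for argv in sweep():
        print(shlex.join(argv))
```
</pre>
<div>To actually run the series, each argv list could be passed to subprocess.run with the output redirected to a per-size log file.</div>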
<div>./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none
-dmmg_nlevels 1 -da_grid_x 100 -da_grid_y 100 -log_summary -mat_no_inode
-preload off -cusp_synchronize -cuda_show_devices<br>
CUDA device 0: Tesla M2050<br>
CUDA device 1: Tesla M2050<br>
<br>
Time (sec): 1.928e+01 1.00000 1.928e+01<br>
Objects: 1.320e+02 1.00000 1.320e+02<br>
Flops: 9.039e+09 1.00000 9.039e+09 9.039e+09<br>
Flops/sec: 4.687e+08 1.00000 4.687e+08 4.687e+08<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts %Total Avg %Total counts %Total<br>
0: Main Stage: 4.3905e+00 22.8% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
1: SetUp: 6.0178e-02 0.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
2: Solve: 1.4834e+01 76.9% 9.0389e+09 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
VecMDot 2024 1.0 8.6724e+00 1.0 2.54e+09 1.0 0.0e+00 0.0e+00 0.0e+00 45 28 0 0 0 58 28 0 0 0 293<br>
VecNorm 2096 1.0 1.5712e+00 1.0 3.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00 8 4 0 0 0 11 4 0 0 0 213<br>
VecCUSPCopyTo 2140 1.0 2.4991e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0<br>
VecCUSPCopyFrom 2135 1.0 1.0437e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 5 0 0 0 0 7 0 0 0 0 0<br>
KSPSolve 2 1.0 1.4543e+01 1.0 8.99e+09 1.0 0.0e+00 0.0e+00 0.0e+00 75 99 0 0 0 98 99 0 0 0 618<br>
MatMult 2092 1.0 2.8551e+00 1.0 3.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 37 0 0 0 19 37 0 0 0 1163<br>
MatCUSPCopyTo 4 1.0 1.6344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<br>
<br>
<br>
<br>
./ex19 -da_vec_type mpicusp -da_mat_type mpiaijcusp -pc_type none
-dmmg_nlevels 1 -da_grid_x 200 -da_grid_y 200 -log_summary -mat_no_inode
-preload off -cusp_synchronize -cuda_set_device 0<br>
<br>
Time (sec): 5.042e+01 1.00000 5.042e+01<br>
Objects: 1.320e+02 1.00000 1.320e+02<br>
Flops: 8.283e+10 1.00000 8.283e+10 8.283e+10<br>
Flops/sec: 1.643e+09 1.00000 1.643e+09 1.643e+09<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts %Total Avg %Total counts %Total<br>
0: Main Stage: 4.6509e+00 9.2% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
1: SetUp: 2.5148e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
2: Solve: 4.5517e+01 90.3% 8.2826e+10 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
VecMDot 4637 1.0 2.1155e+01 1.0 2.34e+10 1.0 0.0e+00 0.0e+00 0.0e+00 42 28 0 0 0 46 28 0 0 0 1104<br>
VecNorm 4796 1.0 3.7077e+00 1.0 3.07e+09 1.0 0.0e+00 0.0e+00 0.0e+00 7 4 0 0 0 8 4 0 0 0 828<br>
VecCUSPCopyTo 4840 1.0 6.1929e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecCUSPCopyFrom 4835 1.0 5.0045e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 11 0 0 0 0 0<br>
KSPSolve 2 1.0 4.4465e+01 1.0 8.26e+10 1.0 0.0e+00 0.0e+00 0.0e+00 88 100 0 0 0 98 100 0 0 0 1859<br>
MatMult 4792 1.0 1.4925e+01 1.0 3.05e+10 1.0 0.0e+00 0.0e+00 0.0e+00 30 37 0 0 0 33 37 0 0 0 2047<br>
MatCUSPCopyTo 4 1.0 4.9795e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<br>
<br>
<br>
<br>
<br>
./ex19 -da_vec_type mpicusp -da_mat_type
mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 300 -da_grid_y 300
-log_summary -mat_no_inode -preload off -cusp_synchronize
-cuda_set_device 0 >> ex19p.txt<br>
Time (sec): 1.095e+02 1.00000 1.095e+02<br>
Objects: 1.320e+02 1.00000 1.320e+02<br>
Flops: 3.136e+11 1.00000 3.136e+11 3.136e+11<br>
Flops/sec: 2.865e+09 1.00000 2.865e+09 2.865e+09<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts %Total Avg %Total counts %Total<br>
0: Main Stage: 4.4090e+00 4.0% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
1: SetUp: 5.6010e-01 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
2: Solve: 1.0449e+02 95.5% 3.1360e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
VecMDot 7803 1.0 3.8534e+01 1.0 8.85e+10 1.0 0.0e+00 0.0e+00 0.0e+00 35 28 0 0 0 37 28 0 0 0 2297<br>
VecNorm 8068 1.0 6.5087e+00 1.0 1.16e+10 1.0 0.0e+00 0.0e+00 0.0e+00 6 4 0 0 0 6 4 0 0 0 1785<br>
VecCUSPCopyTo 8112 1.0 1.0913e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecCUSPCopyFrom 8107 1.0 1.2753e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 12 0 0 0 0 12 0 0 0 0 0<br>
KSPSolve 2 1.0 1.0228e+02 1.0 3.13e+11 1.0 0.0e+00 0.0e+00 0.0e+00 93 100 0 0 0 98 100 0 0 0 3062<br>
MatMult 8064 1.0 4.5629e+01 1.0 1.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 42 37 0 0 0 44 37 0 0 0 2538<br>
MatCUSPCopyTo 4 1.0 8.1736e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br>
<br>
<br>
<br>
<br>
./ex19 -da_vec_type mpicusp -da_mat_type
mpiaijcusp -pc_type none -dmmg_nlevels 1 -da_grid_x 400 -da_grid_y 400
-log_summary -mat_no_inode -preload off -cusp_synchronize
-cuda_set_device 0 >> ex19p.txt<br>
<br>
Time (sec): 1.909e+02 1.00000 1.909e+02<br>
Objects: 1.320e+02 1.00000 1.320e+02<br>
Flops: 7.167e+11 1.00000 7.167e+11 7.167e+11<br>
Flops/sec: 3.753e+09 1.00000 3.753e+09 3.753e+09<br>
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --<br>
Avg %Total Avg %Total counts %Total Avg %Total counts %Total<br>
0: Main Stage: 4.4291e+00 2.3% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
1: SetUp: 1.0122e+00 0.5% 0.0000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
2: Solve: 1.8551e+02 97.2% 7.1669e+11 100.0% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%<br>
<br>
------------------------------------------------------------------------------------------------------------------------<br>
VecMDot 10031 1.0 5.4102e+01 1.0 2.02e+11 1.0 0.0e+00 0.0e+00 0.0e+00 28 28 0 0 0 29 28 0 0 0 3739<br>
VecNorm 10370 1.0 8.5987e+00 1.0 2.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 5 4 0 0 0 5 4 0 0 0 3087<br>
VecCUSPCopyTo 10414 1.0 1.5341e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0<br>
VecCUSPCopyFrom 10409 1.0 2.5585e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13 0 0 0 0 14 0 0 0 0 0<br>
KSPSolve 2 1.0 1.8163e+02 1.0 7.16e+11 1.0 0.0e+00 0.0e+00 0.0e+00 95 100 0 0 0 98 100 0 0 0 3942<br>
MatMult 10366 1.0 9.7032e+01 1.0 2.65e+11 1.0 0.0e+00 0.0e+00 0.0e+00 51 37 0 0 0 52 37 0 0 0 2729<br>
MatCUSPCopyTo 4 1.0 1.3754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0<br><br>
The complete log_summaries are attached. <br></div></div>
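<div>For reading the excerpts above, here is a small sketch that recomputes, from the numbers quoted in the four runs, the share of the Solve stage spent in VecCUSPCopyFrom and MatMult, and the MatMult rate. It assumes the usual -log_summary layout, in which the last column is flops divided by time, reported in Mflop/s. The helper names are illustrative. Because the quoted flop counts are rounded to three digits, the recomputed rates can differ from the logged ones by a few Mflop/s.</div>
<pre>
```python
# One tuple per run, with values copied from the log excerpts in this message:
# (grid size, Solve stage time [s], VecCUSPCopyFrom time [s],
#  MatMult time [s], MatMult flops)
RUNS = [
    (100, 1.4834e+01, 1.0437e+00, 2.8551e+00, 3.32e+09),
    (200, 4.5517e+01, 5.0045e+00, 1.4925e+01, 3.05e+10),
    (300, 1.0449e+02, 1.2753e+01, 4.5629e+01, 1.16e+11),
    (400, 1.8551e+02, 2.5585e+01, 9.7032e+01, 2.65e+11),
]


def percent_of_stage(event_time, stage_time):
    """Share of the stage time spent in one event, in percent."""
    return 100.0 * event_time / stage_time


def mflops(flops, seconds):
    """Rate in Mflop/s (assumed to be how the last log column is defined)."""
    return flops / seconds / 1.0e6


if __name__ == "__main__":
    for n, solve_t, copy_t, mult_t, mult_flops in RUNS:
        print(
            f"grid {n}x{n}: "
            f"VecCUSPCopyFrom {percent_of_stage(copy_t, solve_t):.1f}% of Solve, "
            f"MatMult {percent_of_stage(mult_t, solve_t):.1f}% of Solve, "
            f"MatMult rate {mflops(mult_flops, mult_t):.0f} Mflop/s"
        )
```
</pre>
<div>On these numbers, the share of the Solve stage taken by VecCUSPCopyFrom grows with the grid size, from about 7% at 100x100 to about 14% at 400x400.</div>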