<div class="gmail_quote">On Sun, Aug 5, 2012 at 10:44 PM, Xiangze Zeng <span dir="ltr"><<a href="mailto:zengshixiangze@163.com" target="_blank">zengshixiangze@163.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">Do you mean all the computational work is done on the GPU?<br>

<br>

<br>

When I run ex5 with -dm_vec_type veccusp -dm_mat_type mataijcusp, the following error appears:<br></div></div></blockquote><div><br></div><div>-dm_vec_type cusp -dm_mat_type aijcusp</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">

<br>

<br>

~/ex5\>./ex5 -dm_vec_type veccusp -dm_mat_type -log_summary ex5_log<br>

[0]PETSC ERROR: --------------------- Error Message ------------------------------------<br>

[0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external package needed for type:<br>

see <a href="http://www.mcs.anl.gov/petsc/documentation/installation.html#external" target="_blank">http://www.mcs.anl.gov/petsc/documentation/installation.html#external</a>!<br>

[0]PETSC ERROR: Unknown vector type: veccusp!<br>

[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: Petsc Development HG revision: d01946145980533f72b6500bd243b1dd3666686c  HG Date: Mon Jul 30 17:03:27 2012 -0500<br>

[0]PETSC ERROR: See docs/changes/index.html for recent updates.<br>

[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.<br>

[0]PETSC ERROR: See docs/index.html for manual pages.<br>

[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: ./ex5 on a arch-cuda named hohhot by hongwang Mon Aug  6 12:27:19 2012<br>

[0]PETSC ERROR: Libraries linked from /usr/src/petsc/petsc-dev/arch-cuda-double/lib<br>

[0]PETSC ERROR: Configure run at Sat Aug  4 15:10:44 2012<br>

[0]PETSC ERROR: Configure options --doCleanup=1 --with-gnu-compilers=1 --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64 --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1 --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1 --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1 --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1 --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1 --with-clanguage=c --with-cuda-arch=sm_20<br>


[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: VecSetType() line 44 in src/vec/vec/interface/vecreg.c<br>

[0]PETSC ERROR: DMCreateGlobalVector_DA() line 36 in src/dm/impls/da/dadist.c<br>

[0]PETSC ERROR: DMCreateGlobalVector() line 443 in src/dm/interface/dm.c<br>

[0]PETSC ERROR: DMDASetUniformCoordinates() line 58 in src/dm/impls/da/gr1.c<br>

[0]PETSC ERROR: main() line 113 in src/snes/examples/tutorials/ex5.c<br>

application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0<br>

[unset]: aborting job:<br>

application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0<br>

<br>

<br>

Is there something wrong with CUSP? I am using petsc-dev, CUSP 0.3.1, and CUDA 4.2.<br>

<br>

<br>

Zeng Xiangze<br>

At 2012-08-06 03:18:58, "Matthew Knepley" <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

On Sun, Aug 5, 2012 at 10:24 AM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

Dear Matt,<br>

<br>

<br>

Thank you for your suggestion. I'm learning to use the GPU effectively step by step. I think a manual on using PETSc with CUDA would be useful for novices.<br>

After each iteration, the Vec is copied to the host to evaluate the stopping condition, is that right?<br>

<br>

<br>

No, if that was true, we would have given up long ago. My guess is that some of your Vecs are not the correct type.<br>

Can you look at ex5 using -dm_vec_type veccusp -dm_mat_type mataijcusp and mail <a href="mailto:petsc-maint@mcs.anl.gov">petsc-maint@mcs.anl.gov</a>?<br>

<br>

<br>

   Matt<br>

<br>

Sincerely,<br>

Zeng Xiangze<br>

<br>

<br>

<br>

At 2012-08-05 20:27:55, "Matthew Knepley" <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

On Sat, Aug 4, 2012 at 11:23 PM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

When I change the PC type to JACOBI and the KSP type to BICG, both the GPU and CPU runs are faster than with SOR+BCGS, but the GPU is still not much more efficient: it is only about 20% faster. Do you have any suggestions? The attachments are the output of -log_summary.<br>


<br>

<br>

You also have to look at the log_summary:<br>

<br>

<br>

VecCUSPCopyTo       3967 1.0 1.3152e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0<br>

VecCUSPCopyFrom     3969 1.0 5.5139e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0<br>

MatCUSPCopyTo          1 1.0 4.5194e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0<br>

<br>

<br>

1) I said to use GMRES for a reason. Listen to me. BiCG uses the transpose, which right now confuses the results<br>

<br>

<br>

2) Look at the copies to/from the GPU. You should not be copying the vector 4000 times. Start simple until you understand<br>

    everything about how the code is running. Use -pc_type none -ksp_type gmres and see if you can understand the results.<br>

    Then try different KSP and PC. Trying everything at once does not help anyone, and it is not science.<br>
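<br>
To make point 2 concrete, here is a minimal Python sketch that totals the host/device copy events from the three -log_summary rows quoted above. It assumes the first numeric column is the call count and the third is the maximum time in seconds, which is the usual -log_summary row layout; the helper name parse_row is made up for illustration.<br>
<pre>
```python
# Rows copied verbatim from the -log_summary excerpt quoted above.
LOG_ROWS = """\
VecCUSPCopyTo       3967 1.0 1.3152e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecCUSPCopyFrom     3969 1.0 5.5139e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
MatCUSPCopyTo          1 1.0 4.5194e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
"""


def parse_row(row):
    """Return (event name, call count, time in seconds) for one row.

    Assumed column layout: name, count, count ratio, max time, ...
    (hypothetical helper, not part of PETSc).
    """
    fields = row.split()
    return fields[0], int(fields[1]), float(fields[3])


events = [parse_row(row) for row in LOG_ROWS.splitlines() if row.strip()]

# Only the vector copies repeat thousands of times; the matrix is copied once.
vec_copies = sum(count for name, count, _ in events
                 if name.startswith("VecCUSPCopy"))
vec_copy_seconds = sum(secs for name, _, secs in events
                       if name.startswith("VecCUSPCopy"))

for name, count, secs in events:
    print(f"{name:16s} calls={count:5d} time={secs:8.3f} s")
print(f"vector copies: {vec_copies} calls, {vec_copy_seconds:.3f} s in total")
```
</pre>
If the vector copy count grows with the iteration count, as it does here, some operation in the solve is most likely still running on the host.<br>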

<br>

<br>

    Matt<br>

<br>

Thank you!<br>

<br>

<br>

Zeng Xiangze<br>

<br>

At 2012-08-05 00:01:11,"Xiangze Zeng" <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

JACOBI+GMRES takes 124s to solve one system on the GPU, 172s on the CPU. When I use JACOBI+BICG, it takes 123s on the GPU, 162s on the CPU. In <a href="http://www.mcs.anl.gov/petsc/features/gpus.html" target="_blank">http://www.mcs.anl.gov/petsc/features/gpus.html</a>, I see "All of the Krylov methods except KSPIBCGS run on the GPU. "  I don't find KSPIBCGS in the manual, is it KSPBCGS?<br>
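<br>
For reference, the timings above can be turned into GPU speedups with a minimal Python sketch like the one below. Only the four numbers come from the message; the dictionary layout and variable names are illustrative.<br>
<pre>
```python
# Wall-clock seconds per solve, as reported in the message above: (GPU, CPU).
timings = {
    "JACOBI+GMRES": (124.0, 172.0),
    "JACOBI+BICG": (123.0, 162.0),
}

# Speedup is CPU time divided by GPU time; 1.0 would mean no gain at all.
speedups = {name: cpu / gpu for name, (gpu, cpu) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: GPU is {ratio:.2f}x the CPU speed")
```
</pre>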


<br>

At 2012-08-04 23:04:55, "Matthew Knepley" <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

On Sat, Aug 4, 2012 at 9:42 AM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

Another error happens when I change the PC type. When I change it to PCJACOBI, the following error message appears:<br>

<br>

<br>

[0]PETSC ERROR: --------------------- Error Message ------------------------------------<br>

[0]PETSC ERROR: Petsc has generated inconsistent data!<br>

[0]PETSC ERROR: Divide by zero!<br>

[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: Petsc Development HG revision: d01946145980533f72b6500bd243b1dd3666686c  HG Date: Mon Jul 30 17:03:27 2012 -0500<br>

[0]PETSC ERROR: See docs/changes/index.html for recent updates.<br>

[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.<br>

[0]PETSC ERROR: See docs/index.html for manual pages.<br>

[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: ../../femsolcu/./femsolcu on a arch-cuda named hohhot by hongwang Sat Aug  4 22:23:58 2012<br>

[0]PETSC ERROR: Libraries linked from /usr/src/petsc/petsc-dev/arch-cuda-double/lib<br>

[0]PETSC ERROR: Configure run at Sat Aug  4 15:10:44 2012<br>

[0]PETSC ERROR: Configure options --doCleanup=1 --with-gnu-compilers=1 --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64 --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1 --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1 --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1 --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1 --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1 --with-clanguage=c --with-cuda-arch=sm_20<br>


[0]PETSC ERROR: ------------------------------------------------------------------------<br>

[0]PETSC ERROR: KSPSolve_BCGS() line 105 in src/ksp/ksp/impls/bcgs/bcgs.c<br>

[0]PETSC ERROR: KSPSolve() line 446 in src/ksp/ksp/interface/itfunc.c<br>

[0]PETSC ERROR: sol_comp() line 39 in "unknowndirectory/"solve.c<br>

<br>

<br>

 And when I change it to PCSACUSP or PCSACUSPPOLY, both report out of memory (I guess it's the GPU's memory). When I change it to PCAINVCUSP, the result is no better than when I don't change the type.<br>


<br>

<br>

This is a breakdown in that algorithm. Try GMRES.<br>

<br>

<br>

   Matt<br>

<br>

Does it have something to do with the KSP type? Should I look for a suitable KSP type to match a PC type that can work on the GPU?<br>

<br>

At 2012-08-04 21:44:02, "Matthew Knepley" <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

On Sat, Aug 4, 2012 at 5:58 AM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

After I rerun with "debugging=no", the CPU takes 30 minutes and the GPU 22 minutes, a little better than before. The attachments are the output of -log_summary.<br>

<br>

<br>

1) Notice how the PCApply takes most of the time, so MatMult is not very important<br>

<br>

<br>

2) In g_log_3, notice that every time your PC is called, the vector is pulled from the GPU to the CPU.<br>

    This means we do not support that PC on the GPU<br>

<br>

<br>

There is a restriction on PCs since not many are coded for the GPU. Only PCJACOBI, PCSACUSP, PCSACUSPPOLY, and PCAINVCUSP<br>

work there, see <a href="http://www.mcs.anl.gov/petsc/features/gpus.html" target="_blank">http://www.mcs.anl.gov/petsc/features/gpus.html</a>.<br>

<br>

<br>

   Matt<br>

<br>

At 2012-08-04 14:40:33,"Azamat Mametjanov" <<a href="mailto:azamat.mametjanov@gmail.com">azamat.mametjanov@gmail.com</a>> wrote:<br>

What happens if you try to re-run with "--with-debugging=no"?<br>

<br>

<br>

On Fri, Aug 3, 2012 at 10:00 PM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

Dear Matt,<br>

<br>

<br>

My CPU is Intel Xeon E5-2609, GPU is Nvidia GF100 [Quadro 4000].<br>

The size of the system is 2522469 x 2522469, and the number of nonzero elements is 71773925, about 0.000011 of the total.<br>

The output of -log_summary is in the attachment. The G_log_summary is the output when using GPU, C_log_summary when using CPU.<br>
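<br>
A quick way to check the sparsity figure above is the minimal Python sketch below. Only the two counts come from the message; the variable names are illustrative.<br>
<pre>
```python
# Figures taken from the message above.
n = 2522469       # the matrix is n x n
nnz = 71773925    # number of nonzero entries

density = nnz / (n * n)   # fraction of all entries that are nonzero
avg_per_row = nnz / n     # average number of nonzeros per row

print(f"density          = {density:.3e}")
print(f"nonzeros per row = {avg_per_row:.2f}")
```
</pre>
The average number of nonzeros per row is often the more useful figure when judging sparse matrix-vector performance, since it sets how much work each row contributes.<br>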

<br>

<br>

Zeng Xiangze<br>

<br>

<br>

At 2012-08-03 22:28:07, "Matthew Knepley" <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>

<br>

On Fri, Aug 3, 2012 at 9:18 AM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>> wrote:<br>

<br>

Dear all,<br>

<br>

<br>

When I use the CPU to solve the equations, it takes 78 minutes; when I switch to the GPU, it takes 64 minutes, only 14 minutes faster. I have seen papers saying that using PETSc with a GPU to solve large sparse matrix equations can be several times faster. What's the matter?<br>


<br>

<br>

For all performance questions, we at least need the output of -log_summary. However, we would also need to know<br>

<br>

<br>

  - The size and sparsity of your system<br>

<br>

<br>

  - The CPU and GPU you used (saying anything without knowing this is impossible)<br>

<br>

<br>

   Matt<br>

<br>

Thank you!<br>

<br>

<br>

Sincerely,<br>

Zeng Xiangze<br>

<br>

<br>

<br>

<br>

<br>

<br>

<br>

<br>

--<br>

What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener<br>

<br>

<br>

<br>

<br>

<br>

<br>

<br>

<br>

--<br>

Mailbox 379, School of Physics<br>

Shandong University<br>

27 South Shanda Road, Jinan, Shandong, P.R.China, 250100<br>


</div></div></blockquote></div><br>