[petsc-users] [petsc-maint #127209] Re: Why does GPU solve the large sparse matrix equations only a little faster than CPU?
Jed Brown
jedbrown at mcs.anl.gov
Sun Aug 5 23:46:57 CDT 2012
On Sun, Aug 5, 2012 at 10:44 PM, Xiangze Zeng <zengshixiangze at 163.com> wrote:
> Do you mean all the computational work is done on the GPU?
> When I run ex5 with -dm_vec_type veccusp -dm_mat_type mataijcusp, the
> following error appears:
The type names are cusp and aijcusp:
-dm_vec_type cusp -dm_mat_type aijcusp
> ~/ex5>./ex5 -dm_vec_type veccusp -dm_mat_type -log_summary ex5_log
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing external
> package needed for type:
> see http://www.mcs.anl.gov/petsc/documentation/installation.html#external!
> [0]PETSC ERROR: Unknown vector type: veccusp!
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Development HG revision:
> d01946145980533f72b6500bd243b1dd3666686c HG Date: Mon Jul 30 17:03:27 2012
> -0500
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> ------------------------------------------------------------------------
> [0]PETSC ERROR: ./ex5 on a arch-cuda named hohhot by hongwang Mon Aug 6
> 12:27:19 2012
> [0]PETSC ERROR: Libraries linked from
> /usr/src/petsc/petsc-dev/arch-cuda-double/lib
> [0]PETSC ERROR: Configure run at Sat Aug 4 15:10:44 2012
> [0]PETSC ERROR: Configure options --doCleanup=1 --with-gnu-compilers=1
> --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64
> --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1
> --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1
> --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1
> --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1
> --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1
> --with-clanguage=c --with-cuda-arch=sm_20
> ------------------------------------------------------------------------
> [0]PETSC ERROR: VecSetType() line 44 in src/vec/vec/interface/vecreg.c
> [0]PETSC ERROR: DMCreateGlobalVector_DA() line 36 in
> src/dm/impls/da/dadist.c
> [0]PETSC ERROR: DMCreateGlobalVector() line 443 in src/dm/interface/dm.c
> [0]PETSC ERROR: DMDASetUniformCoordinates() line 58 in
> src/dm/impls/da/gr1.c
> [0]PETSC ERROR: main() line 113 in src/snes/examples/tutorials/ex5.c
> application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0
> [unset]: aborting job:
> application called MPI_Abort(MPI_COMM_WORLD, 86) - process 0
> Is there something wrong with CUSP? My PETSc version is -dev, the CUSP
> version I use is 0.3.1, and the CUDA version is 4.2.
> Zeng Xiangze
> At 2012-08-06 03:18:58, "Matthew Knepley" <knepley at gmail.com> wrote:
> On Sun, Aug 5, 2012 at 10:24 AM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> Dear Matt,
> Thank you for your suggestion. I'm learning to use the GPU effectively
> step by step. I think a manual about using PETSc with CUDA would be useful
> for novices.
> After each iteration, is the Vec copied to the host to evaluate the
> stopping condition?
> No, if that was true, we would have given up long ago. My guess is that
> some of your Vecs are not the correct type.
> Can you look at ex5 using -dm_vec_type veccusp -dm_mat_type mataijcusp and
> mail petsc-maint at mcs.anl.gov?
> Matt
> Sincerely,
> Zeng Xiangze
> At 2012-08-05 20:27:55, "Matthew Knepley" <knepley at gmail.com> wrote:
> On Sat, Aug 4, 2012 at 11:23 PM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> When I change the PC type to JACOBI and the KSP type to BICG, both the GPU
> and the CPU run faster than with SOR+BCGS, but the GPU is still not much
> more efficient: it is only about 20% faster. Do you have any suggestions?
> The attachments are the output of -log_summary.
> You also have to look at the log_summary:
> VecCUSPCopyTo 3967 1.0 1.3152e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
> VecCUSPCopyFrom 3969 1.0 5.5139e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 9 0 0 0 0 9 0 0 0 0 0
> MatCUSPCopyTo 1 1.0 4.5194e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> 1) I said to use GMRES for a reason. Listen to me. BiCG uses the
> transpose, which right now confuses the results
> 2) Look at the copies to/from the GPU. You should not be copying the
> vector 4000 times. Start simple until you understand
> everything about how the code is running. Use -pc_type none -ksp_type
> gmres and see if you can understand the results.
> Then try different KSP and PC. Trying everything at once does not help
> anyone, and it is not science.
> Matt
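To make the copy overhead in the three log rows above concrete, here is a minimal Python sketch that computes the time per call. It assumes the usual -log_summary column order (event name, call count, count ratio, max time in seconds); the helper name per_call_ms and the hard-coded rows are illustrative only and are not part of PETSc.

```python
# Rows copied from the -log_summary excerpt quoted above (first four columns only).
# Assumed column order: event name, call count, count ratio, max time in seconds.
ROWS = """\
VecCUSPCopyTo 3967 1.0 1.3152e+01
VecCUSPCopyFrom 3969 1.0 5.5139e+01
MatCUSPCopyTo 1 1.0 4.5194e-01
"""

def per_call_ms(rows):
    """Return {event: (calls, total_seconds, milliseconds_per_call)}."""
    stats = {}
    for line in rows.splitlines():
        name, count, _ratio, seconds = line.split()[:4]
        calls, total = int(count), float(seconds)
        stats[name] = (calls, total, 1000.0 * total / calls)
    return stats

stats = per_call_ms(ROWS)
for name, (calls, total, ms) in stats.items():
    print(f"{name:16s} {calls:5d} calls  {total:8.3f} s total  {ms:8.3f} ms/call")

# Sum of the two vector-copy events (host to GPU and GPU to host).
vec_copy_seconds = stats["VecCUSPCopyTo"][1] + stats["VecCUSPCopyFrom"][1]
print(f"vector copies in total: {vec_copy_seconds:.1f} s")
```

Under that assumption the two vector-copy events add up to roughly 68 s of the run, while the matrix is copied only once, which is why the number of vector copies is the thing to reduce.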
> Thank you!
> Zeng Xiangze
> At 2012-08-05 00:01:11,"Xiangze Zeng" <zengshixiangze at 163.com> wrote:
> JACOBI+GMRES takes 124s to solve one system on the GPU, 172s on the CPU.
> When I use JACOBI+BICG, it takes 123s on the GPU, 162s on the CPU. In
> http://www.mcs.anl.gov/petsc/features/gpus.html, I see "All of the Krylov
> methods except KSPIBCGS run on the GPU. " I don't find KSPIBCGS in the
> manual, is it KSPBCGS?
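As a quick check of the timings in the message above, the following Python sketch computes the GPU speedup for each solver combination. The dictionary layout and the function name speedup are illustrative choices; only the four timings come from the message.

```python
# Wall-clock times in seconds, as reported in the message above.
timings = {
    "JACOBI+GMRES": {"gpu": 124.0, "cpu": 172.0},
    "JACOBI+BICG": {"gpu": 123.0, "cpu": 162.0},
}

def speedup(cpu_seconds, gpu_seconds):
    """CPU time divided by GPU time; values above 1 mean the GPU run was faster."""
    return cpu_seconds / gpu_seconds

for solver, t in timings.items():
    s = speedup(t["cpu"], t["gpu"])
    saved = 100.0 * (1.0 - t["gpu"] / t["cpu"])
    print(f"{solver}: speedup {s:.2f}x, {saved:.0f}% less wall-clock time")
```

Both ratios come out between 1.3 and 1.4, well short of the several-fold gain asked about at the start of the thread.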
> At 2012-08-04 23:04:55, "Matthew Knepley" <knepley at gmail.com> wrote:
> On Sat, Aug 4, 2012 at 9:42 AM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> Another error happens when I change the PC type. When I change it to
> PCJACOBI, the following error message appears:
> [0]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> [0]PETSC ERROR: Petsc has generated inconsistent data!
> [0]PETSC ERROR: Divide by zero!
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Development HG revision:
> d01946145980533f72b6500bd243b1dd3666686c HG Date: Mon Jul 30 17:03:27 2012
> -0500
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> ------------------------------------------------------------------------
> [0]PETSC ERROR: ../../femsolcu/./femsolcu on a arch-cuda named hohhot by
> hongwang Sat Aug 4 22:23:58 2012
> [0]PETSC ERROR: Libraries linked from
> /usr/src/petsc/petsc-dev/arch-cuda-double/lib
> [0]PETSC ERROR: Configure run at Sat Aug 4 15:10:44 2012
> [0]PETSC ERROR: Configure options --doCleanup=1 --with-gnu-compilers=1
> --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64
> --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1
> --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1
> --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1
> --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1
> --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1
> --with-clanguage=c --with-cuda-arch=sm_20
> ------------------------------------------------------------------------
> [0]PETSC ERROR: KSPSolve_BCGS() line 105 in src/ksp/ksp/impls/bcgs/bcgs.c
> [0]PETSC ERROR: KSPSolve() line 446 in src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: sol_comp() line 39 in "unknowndirectory/"solve.c
> And when I change it to PCSACUSP or PCSACUSPPOLY, both report out of
> memory (I guess it is the GPU's memory). When I change it to PCAINVCUSP, the
> result is no better than when I leave the type unchanged.
> This is breakdown in that algorithm. Try GMRES.
> Matt
> Does it have something to do with the KSP type? Should I look for a suitable
> KSP type to match a PC type that can work on the GPU?
> At 2012-08-04 21:44:02, "Matthew Knepley" <knepley at gmail.com> wrote:
> On Sat, Aug 4, 2012 at 5:58 AM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> After I rerun with "--with-debugging=no", the CPU takes 30 minutes and the
> GPU 22 minutes, a little better than before. The attachments are the output
> of -log_summary.
> 1) Notice how the PCApply takes most of the time, so MatMult is not very
> important
> 2) In g_log_3, notice that every time your PC is called, the vector is
> pulled from the GPU to the CPU.
> This means we do not support that PC on the GPU
> There is a restriction on PCs since not many are coded for the GPU. Only
> work there, see http://www.mcs.anl.gov/petsc/features/gpus.html.
> Matt
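Point 1 above can be illustrated with Amdahl's law: if only part of the solve runs on the GPU, the part left on the CPU bounds the overall speedup. The sketch below is a generic illustration; the fraction 0.7 and the accelerator factor 10 are hypothetical values, not taken from the attached logs.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: speedup of the whole run when only a fraction is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Hypothetical numbers: PCApply stays on the CPU and takes 70% of the run,
# so only the remaining 30% (MatMult, vector operations) benefits from the GPU.
cpu_bound_fraction = 0.7
gpu_factor = 10.0

s = overall_speedup(1.0 - cpu_bound_fraction, gpu_factor)
limit = 1.0 / cpu_bound_fraction   # bound as gpu_factor grows without limit
print(f"overall speedup: {s:.2f}x (upper bound {limit:.2f}x)")
```

With numbers of this kind, even an infinitely fast MatMult cannot lift the whole run much above 1.4x, so moving the preconditioner to the GPU matters more than tuning the matrix-vector product.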
> At 2012-08-04 14:40:33,"Azamat Mametjanov" <azamat.mametjanov at gmail.com>
> wrote:
> What happens if you try to re-run with "--with-debugging=no"?
> On Fri, Aug 3, 2012 at 10:00 PM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> Dear Matt,
> My CPU is Intel Xeon E5-2609, GPU is Nvidia GF100 [Quadro 4000].
> The size of the system is 2522469 x 2522469, and the number of nonzero
> elements is 71773925, about 0.000011 of the total.
> The output of -log_summary is in the attachment. The G_log_summary is the
> output when using GPU, C_log_summary when using CPU.
> Zeng Xiangze
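The system size quoted above can be turned into a density and a rough storage estimate. The Python sketch below assumes CSR storage with 8-byte values and 8-byte indices, which is what the --with-precision=double and --with-64-bit-indices=yes configure options quoted elsewhere in this thread suggest for the host; the layout actually used on the GPU may differ.

```python
n = 2_522_469          # rows and columns, from the message above
nnz = 71_773_925       # nonzero entries, from the message above

density = nnz / (n * n)          # fraction of entries that are nonzero
nnz_per_row = nnz / n            # average nonzeros per row

# Assumption: CSR storage with 8-byte values and 8-byte indices
# (double precision and 64-bit indices, as in the host configure options).
VALUE_BYTES = 8
INDEX_BYTES = 8
csr_bytes = nnz * (VALUE_BYTES + INDEX_BYTES) + (n + 1) * INDEX_BYTES

print(f"density      : {density:.3e}")
print(f"nnz per row  : {nnz_per_row:.1f}")
print(f"CSR estimate : {csr_bytes / 2**30:.2f} GiB")
```

Under these assumptions the matrix alone occupies on the order of 1 GiB, with about 28 nonzeros per row and a density near 1.1e-5, in line with the order of magnitude quoted above. Preconditioners that build extra operators need memory on top of that.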
> At 2012-08-03 22:28:07, "Matthew Knepley" <knepley at gmail.com> wrote:
> On Fri, Aug 3, 2012 at 9:18 AM, Xiangze Zeng <zengshixiangze at 163.com>
> wrote:
> Dear all,
> When I use the CPU to solve the equations, it takes 78 minutes; when I
> switch to the GPU, it takes 64 minutes, only 14 minutes faster. I have seen
> papers saying that using PETSc with a GPU to solve large sparse matrix
> equations can be several times faster. What is the matter?
> For all performance questions, we at least need the output of
> -log_summary. However, we would also need to know
> - The size and sparsity of your system
> - The CPU and GPU you used (saying anything without knowing this is
> impossible)
> Matt
> Thank you!
> Sincerely,
> Zeng Xiangze
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
> --
> Mailbox 379, School of Physics
> Shandong University
> 27 South Shanda Road, Jinan, Shandong, P.R.China, 250100