[petsc-users] Why does GPU solve the large sparse matrix equations only a little faster than CPU?
Matthew Knepley
knepley at gmail.com
Sat Aug 4 10:04:55 CDT 2012
On Sat, Aug 4, 2012 at 9:42 AM, Xiangze Zeng <zengshixiangze at 163.com> wrote:
> Another error occurs when I change the PC type. When I change it to
> PCJACOBI, the following error message appears:
>
> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Petsc has generated inconsistent data!
> [0]PETSC ERROR: Divide by zero!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Development HG revision: d01946145980533f72b6500bd243b1dd3666686c HG Date: Mon Jul 30 17:03:27 2012 -0500
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ../../femsolcu/./femsolcu on a arch-cuda named hohhot by hongwang Sat Aug 4 22:23:58 2012
> [0]PETSC ERROR: Libraries linked from /usr/src/petsc/petsc-dev/arch-cuda-double/lib
> [0]PETSC ERROR: Configure run at Sat Aug 4 15:10:44 2012
> [0]PETSC ERROR: Configure options --doCleanup=1 --with-gnu-compilers=1 --with-vendor-compilers=0 --CFLAGS=-march=x86-64 --CXXFLAGS=-march=x86-64 --with-dynamic-loading --with-python=1 --with-debugging=0 --with-log=1 --download-mpich=1 --with-hypre=0 --with-64-bit-indices=yes --with-x11=1 --with-x11-include=/usr/include/X11 --download-f-blas-lapack=1 --with-cuda=1 --with-cusp=1 --with-thrust=1 --download-txpetscgpu=1 --with-precision=double --with-cudac="nvcc -m64" --download-txpetscgpu=1 --with-clanguage=c --with-cuda-arch=sm_20
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: KSPSolve_BCGS() line 105 in src/ksp/ksp/impls/bcgs/bcgs.c
> [0]PETSC ERROR: KSPSolve() line 446 in src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: sol_comp() line 39 in "unknowndirectory/"solve.c
>
> When I change it to PCSACUSP or PCSACUSPPOLY, both report that they are
> out of memory (I guess it is the GPU's memory). When I change it to
> PCAINVCUSP, the result is no better than when I leave the type unchanged.
>
This is a breakdown in that algorithm (BiCGStab, going by the KSPSolve_BCGS
frame in the trace). Try GMRES.
Matt
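
For readers unfamiliar with the term, here is a minimal sketch of what a BiCGStab "breakdown" looks like. It is plain Python, not PETSc code, and the 2x2 matrix and right-hand side are made up for illustration; nothing here is taken from the matrix in this thread or from bcgs.c itself. The first step of BiCGStab computes alpha = rho / (r_hat, A p), and the iteration cannot continue if that inner product is zero.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product, A given as a list of rows."""
    return [dot(row, x) for row in A]


def bicgstab_first_step_terms(A, b):
    """Return (rho, denom) for the first BiCGStab step with x0 = 0.

    The step length is alpha = rho / denom, so denom == 0 is a breakdown.
    """
    r0 = list(b)        # initial residual equals b because x0 = 0
    r_hat = list(r0)    # shadow residual, using the common choice r_hat = r0
    p = list(r0)        # first search direction
    v = matvec(A, p)
    rho = dot(r_hat, r0)
    denom = dot(r_hat, v)
    return rho, denom


# Hypothetical example: a skew-symmetric matrix, chosen so that (r_hat, A p) = 0.
A = [[0.0, 1.0], [-1.0, 0.0]]
b = [1.0, 0.0]

rho, denom = bicgstab_first_step_terms(A, b)
if denom == 0.0:
    print("breakdown: alpha = rho / 0 is undefined")
```

The system above is perfectly solvable (x = [0, 1]); it is the method, not the matrix, that fails. In PETSc the switch to GMRES is normally made with the runtime option -ksp_type gmres, provided the application calls KSPSetFromOptions(); whether femsolcu does so is not shown in this thread.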
> Does it have something to do with the KSP type? Should I look for a
> suitable KSP type to match a PC type that can work on the GPU?
>
> At 2012-08-04 21:44:02, "Matthew Knepley" <knepley at gmail.com> wrote:
>
> On Sat, Aug 4, 2012 at 5:58 AM, Xiangze Zeng <zengshixiangze at 163.com> wrote:
>
>> After I rerun with "debugging=no", the CPU takes 30 minutes and the GPU
>> 22 minutes, a little better than before. The attachments are the output
>> of -log_summary.
>>
>
> 1) Notice how PCApply takes most of the time, so MatMult is not very
> important.
>
> 2) In g_log_3, notice that every time your PC is called, the vector is
> pulled from the GPU to the CPU.
> This means we do not support that PC on the GPU.
>
> There is a restriction on PCs, since not many are coded for the GPU.
> Only PCJACOBI, PCSACUSP, PCSACUSPPOLY, and PCAINVCUSP
> work there; see http://www.mcs.anl.gov/petsc/features/gpus.html.
>
> Matt
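
Point 1 above is essentially Amdahl's law. As a rough sketch: if only MatMult gets faster on the GPU while the preconditioner stays on the CPU, the overall gain is bounded by the share of time MatMult took in the first place. The fractions below are hypothetical, since the actual -log_summary attachments are not reproduced in this thread.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-run speedup when only part of the run gets faster.

    accelerated_fraction: share of the original run time spent in the
        accelerated kernel (between 0 and 1).
    kernel_speedup: how many times faster that kernel becomes.
    """
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Hypothetical split: MatMult is 20% of the run and becomes 10x faster,
# while PCApply and everything else stay at their old speed.
print(round(overall_speedup(0.20, 10.0), 2))

# Even an infinitely fast MatMult could not beat 1 / 0.80 = 1.25x here.
print(round(overall_speedup(0.20, 1e12), 2))
```

With numbers like these, a dramatic kernel speedup still yields only a modest improvement in wall-clock time, which is why the preconditioner has to run on the GPU too before large gains can be expected.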
>
>
>> At 2012-08-04 14:40:33,"Azamat Mametjanov" <azamat.mametjanov at gmail.com>
>> wrote:
>>
>> What happens if you try to re-run with "--with-debugging=no"?
>>
>> On Fri, Aug 3, 2012 at 10:00 PM, Xiangze Zeng <zengshixiangze at 163.com> wrote:
>>
>>> Dear Matt,
>>>
>>> My CPU is an Intel Xeon E5-2609 and my GPU is an Nvidia GF100 [Quadro 4000].
>>> The size of the system is 2522469 x 2522469, and the number of nonzero
>>> elements is 71773925, about 0.000011 of the total.
>>> The output of -log_summary is in the attachment. G_log_summary is the
>>> output when using the GPU, C_log_summary when using the CPU.
>>>
>>> Zeng Xiangze
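
As a quick check of the sizes quoted above, the density and the average number of nonzeros per row can be worked out directly. This is just arithmetic on the two numbers in the message; the variable names are illustrative.

```python
n = 2_522_469       # rows (and columns), as stated in the message
nnz = 71_773_925    # nonzero entries, as stated in the message

density = nnz / (n * n)    # fraction of all entries that are nonzero
avg_per_row = nnz / n      # mean number of nonzeros per row

print(f"density     = {density:.3e}")
print(f"nnz per row = {avg_per_row:.1f}")
```

The density comes out at roughly 1.1e-5, and each row holds about 28 nonzeros on average.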
>>>
>>> At 2012-08-03 22:28:07, "Matthew Knepley" <knepley at gmail.com> wrote:
>>>
>>> On Fri, Aug 3, 2012 at 9:18 AM, Xiangze Zeng <zengshixiangze at 163.com> wrote:
>>>
>>>> Dear all,
>>>>
>>>> When I use the CPU to solve the equations, it takes 78 minutes. When I
>>>> switch to the GPU, it takes 64 minutes, only 14 minutes faster. I have
>>>> seen papers saying that using PETSc with a GPU to solve large sparse
>>>> matrix equations can be several times faster. What is the matter?
>>>>
>>>
>>> For all performance questions, we at least need the output of
>>> -log_summary. However, we would also need to know
>>>
>>> - The size and sparsity of your system
>>>
>>> - The CPU and GPU you used (saying anything without knowing this is
>>> impossible)
>>>
>>> Matt
>>>
>>>
>>>> Thank you!
>>>>
>>>> Sincerely,
>>>> Zeng Xiangze
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Mailbox 379, School of Physics
>> Shandong University
>> 27 South Shanda Road, Jinan, Shandong, P.R.China, 250100
>>
>>
>>