[petsc-users] GPU speedup in Poisson solvers

Dominic Meiser dmeiser at txcorp.com
Tue Sep 23 09:52:23 CDT 2014


Hi Karli,

PR #178 gets you most of the way. src/ksp/ksp/examples/tests/ex32.c uses 
DMDAs, which require a few additional fixes. I haven't opened a pull 
request for these yet, but I will do that before Thursday.

Regarding the rebase, wouldn't it be preferable to just resolve the 
conflicts in the merge commit? In any event, I've merged these branches 
several times into local integration branches created off of recent 
petsc/master, so I'm pretty familiar with the conflicts and how to 
resolve them. I can help with the merge or do a rebase, whichever you 
prefer.
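
To make the two options concrete, here is a rough sketch (the branch 
name below is only a placeholder for the PR #178 branch):

   # Option 1: merge into next and resolve the conflicts once, in the
   # merge commit itself
   git checkout next
   git merge <pr-178-branch>    # fix conflicts, git add, then git commit

   # Option 2: rebase the PR branch onto current master, resolving
   # conflicts commit by commit
   git checkout <pr-178-branch>
   git rebase master            # fix conflicts, git add, git rebase --continue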

Cheers,
Dominic


On 09/22/2014 10:37 PM, Karl Rupp wrote:
> Hi Dominic,
>
> I've got some time available at the end of this week for a merge to 
> next. Is there anything other than PR #178 needed? It currently shows 
> some conflicts, so is there any chance to rebase it on ~Thursday?
>
> Best regards,
> Karli
>
>
>
> On 09/22/2014 09:38 PM, Dominic Meiser wrote:
>> On 09/22/2014 12:57 PM, Chung Shen wrote:
>>> Dear PETSc Users,
>>>
>>> I am new to PETSc and trying to determine if GPU speedup is possible
>>> with the 3D Poisson solvers. I configured 2 copies of 'petsc-master'
>>> on a standalone machine, one with CUDA toolkit 5.0 and one without
>>> (both without MPI):
>>> Machine: HP Z820 Workstation, Red Hat Enterprise Linux 5.0
>>> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
>>> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, CUDA Compute Capability: 3.5,
>>> Driver: 313.09)
>>>
>>> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting
>>> about a 20% speedup with the GPU. Is this reasonable, or did I miss something?
>>>
>>> Attached is a comparison chart with two sample logs. The y-axis is the
>>> elapsed time in seconds and the x-axis corresponds to the size of the
>>> problem. In particular, I wonder whether the numbers of calls to
>>> 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are 
>>> excessive.
>>>
>>> Thanks in advance for your reply.
>>>
>>> Best Regards,
>>>
>>> Chung Shen
>> A few comments:
>>
>> - To get reliable timings, you should configure PETSc without debugging
>> (i.e., --with-debugging=no).
>> - The ILU preconditioning in your GPU benchmark is done on the CPU, and
>> the resulting host-device data transfers are killing performance. Can
>> you try running with the additional option
>> -pc_factor_mat_solver_package cusparse? This performs the
>> preconditioning on the GPU (see the sketch after this list).
>> - If you're interested in running benchmarks in parallel you will need a
>> few patches that are not yet in petsc/master. I can put together a
>> branch that has the needed fixes.
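>>
>> For concreteness, a rough sketch of the configure and run lines I have
>> in mind (only a sketch: the -dm_vec_type/-dm_mat_type settings are an
>> assumption about how you are already putting the problem on the GPU, so
>> substitute whatever you used to produce your logs):
>>
>>    ./configure <your existing CUDA options> --with-debugging=no
>>    ./ex32 -dm_vec_type cusp -dm_mat_type aijcusparse \
>>           -pc_type ilu -pc_factor_mat_solver_package cusparse \
>>           -log_summary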
>>
>> Cheers,
>> Dominic
>>
>


-- 
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com


