[petsc-users] Offloading linear solves in time stepper to GPU
Jed Brown
jed at jedbrown.org
Sat May 30 22:50:01 CDT 2015
Harshad Sahasrabudhe <hsahasra at purdue.edu> writes:
>>
>> Surely you're familiar with this.
>
>
> Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor. It
> has enough bandwidth and performance.
One core saturates a sizeable fraction of the memory bandwidth for the
socket. You certainly can't expect 10x speedups when moving from 1 to
16 cores for a memory bandwidth limited application.
>> Is the poor scaling due to increased iteration count? What method are you
>> using?
>
> This is exactly why we have poor scaling. We have tried KSPGMRES.
GMRES is secondary for this discussion; which preconditioner are you
using and how many iterations does it require?
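(For reference, PETSc's standard runtime options report exactly this; the executable name `./app` below is a placeholder for the user's application.)

```sh
# -ksp_view prints the KSP/PC configuration actually used;
# -ksp_monitor prints the residual norm at each iteration;
# -ksp_converged_reason reports why and when the solve stopped;
# -log_view summarizes per-event timing at exit.
./app -ksp_type gmres -pc_type bjacobi \
      -ksp_view -ksp_monitor -ksp_converged_reason -log_view
```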
> This sounds like a problem with your code (non-scalable data structure).
>
> We need to work on the algorithm for matrix assembly. In its current
> state, one CPU ends up doing much of the work. This could be the cause of
> the bad memory scaling. It doesn't contribute to the bad scaling of the time
> stepping, though, since the time taken for time stepping is counted separately
> from assembly.
This is a linear autonomous system?
>> How long does it take to solve that system stand-alone using MAGMA, including
>> the data transfers?
>
> I'm still working on these tests.
Do that first.