[petsc-users] Offloading linear solves in time stepper to GPU
Jed Brown
jed at jedbrown.org
Sat May 30 22:50:01 CDT 2015
Harshad Sahasrabudhe <hsahasra at purdue.edu> writes:
>>
>> Surely you're familiar with this.
>
>
> Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor. It
> has enough bandwidth and performance.
One core saturates a sizeable fraction of the memory bandwidth for the
socket. You certainly can't expect 10x speedups when moving from 1 to
16 cores for a memory bandwidth limited application.
>> Is the poor scaling due to increased iteration count? What method are you
>> using?
>
> This is exactly why we have poor scaling. We have tried KSPGMRES.
GMRES is secondary for this discussion; which preconditioner are you
using and how many iterations does it require?
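(For reference, PETSc's standard runtime options report exactly this; the executable name `./app` below is a placeholder for the user's application.)

```sh
# -ksp_view prints the KSP/PC configuration actually used;
# -ksp_monitor prints the residual norm at each iteration;
# -ksp_converged_reason reports why and when the solve stopped;
# -log_view summarizes per-event timing at exit.
./app -ksp_type gmres -pc_type bjacobi \
      -ksp_view -ksp_monitor -ksp_converged_reason -log_view
```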
> This sounds like a problem with your code (non-scalable data structure).
>
> We need to work on the algorithm for matrix assembly. In its current
> state, one CPU ends up doing much of the work. This could be the cause of
> the bad memory scaling. It doesn't contribute to the bad scaling of the time
> stepping, though, since the time taken for time stepping is counted separately
> from assembly.
This is a linear autonomous system?
>> How long does it take to solve that system stand-alone using MAGMA, including
>> the data transfers?
>
> I'm still working on these tests.
Do that first.