<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-size:12.8000001907349px"> Surely you're familiar with this.</span></blockquote><div><br></div><div>Yes, I'm familiar with this. We are running on an Intel Xeon E5 processor, which has enough memory bandwidth and performance. Also, we are currently running on a single node.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-size:12.8000001907349px">Is the poor scaling due to increased iteration count?  What method are </span><span style="font-size:12.8000001907349px">you using?</span></blockquote><div><br></div><div>Yes, the increased iteration count is exactly why we see poor scaling. We have tried KSPGMRES. </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-size:12.8000001907349px">This sounds like a problem with your code (non-scalable data structure).</span></blockquote><div><br></div><div>We need to work on the algorithm for matrix assembly. In its current state, one CPU ends up doing much of the work, which could be the cause of the poor memory scaling. It does not contribute to the poor scaling of time stepping, however, because the time spent in time stepping is measured separately from assembly.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span style="font-size:12.8000001907349px">How long does it take to solve that system stand-alone using MAGMA, </span><span style="font-size:12.8000001907349px">including the data transfers?</span></blockquote><div><br></div><div>I'm still working on these tests. 
</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, May 30, 2015 at 11:22 PM, Jed Brown <span dir="ltr"><<a href="mailto:jed@jedbrown.org" target="_blank">jed@jedbrown.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">Harshad Sahasrabudhe <<a href="mailto:hsahasra@purdue.edu">hsahasra@purdue.edu</a>> writes:<br>

<br>

>><br>

>> Is your intent to solve a problem that matters in a way that makes sense<br>

>> for a scientist or engineer?<br>

><br>

><br>

> I want to see if we can speed up the time stepper for a large system using<br>

> GPUs. For large systems with a sparse matrix of size 420,000 x 420,000, each time<br>

> step takes 341 sec on a single process and 180 seconds on 16 processes. So<br>

> the scaling isn't that good.<br>
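<div>For reference, the timings quoted above correspond to a speedup of about 1.9 on 16 processes, or roughly 12% parallel efficiency. A minimal Python sketch of that strong-scaling arithmetic follows; the function name is illustrative only.</div>

```python
def parallel_efficiency(t_serial, t_parallel, nprocs):
    """Return (speedup, efficiency) for a strong-scaling measurement.

    speedup    = T(1) / T(p)
    efficiency = speedup / p   (1.0 would be perfect scaling)
    """
    speedup = t_serial / t_parallel
    return speedup, speedup / nprocs


# Per-time-step wall-clock times quoted above: 341 s on 1 process, 180 s on 16.
speedup, efficiency = parallel_efficiency(341.0, 180.0, 16)
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.1%}")
```

<div>Perfect strong scaling would give a speedup of 16 and an efficiency of 100%.</div>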

<br>

</span> Surely you're familiar with this.<br>

<br>

<a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#computers" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#computers</a><br>

<br>

Is the poor scaling due to increased iteration count?  What method are<br>

you using?<br>
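<div>One way to answer this question is to split the measured speedup into a part caused by the change in iteration count and a part caused by the change in time per iteration. Because total solve time is the iteration count times the time per iteration, the speedup factors as (N_1 / N_p) * (t_1 / t_p). The sketch below assumes the iteration counts have been recorded for each run (for example with KSPGetIterationNumber() or -ksp_converged_reason); the function name is hypothetical.</div>

```python
def decompose_speedup(t_serial, iters_serial, t_parallel, iters_parallel):
    """Split a strong-scaling speedup into iteration-count and per-iteration parts.

    Total solve time = iterations * time per iteration, so
        speedup = (iters_serial / iters_parallel)
                  * (per_iter_serial / per_iter_parallel)
    An iteration factor below 1 means the solver needs more iterations in
    parallel, which eats into the per-iteration speedup from extra processes.
    """
    iteration_factor = iters_serial / iters_parallel
    per_iter_serial = t_serial / iters_serial
    per_iter_parallel = t_parallel / iters_parallel
    per_iteration_speedup = per_iter_serial / per_iter_parallel
    total_speedup = t_serial / t_parallel
    return total_speedup, iteration_factor, per_iteration_speedup
```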

<span class=""><br>

> We also run out of memory with more processes.<br>

<br>

</span>This sounds like a problem with your code (non-scalable data structure).<br>

<br>

Also, the GPU doesn't have more memory than the CPU.<br>
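<div>A rough check on whether the data structures scale is to compare the measured memory per process with what the distributed matrix alone should need. The sketch below assumes compressed sparse row (AIJ-style) storage with 8-byte values and 4-byte indices, and takes the average number of nonzeros per row as an input, since that number is not given in this thread. It ignores PETSc's own per-row bookkeeping and the Krylov vectors, and the function names are hypothetical. If the measured memory per process does not fall roughly as 1/p, something is likely being replicated on every process.</div>

```python
def csr_matrix_bytes(n_rows, nnz_per_row, value_bytes=8, index_bytes=4):
    """Approximate storage for an n_rows x n_rows sparse matrix in CSR format.

    Counts one value and one column index per nonzero, plus n_rows + 1 row
    offsets. Assumes double-precision values and 32-bit indices by default.
    """
    nnz = n_rows * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def per_process_mib(n_rows, nnz_per_row, nprocs):
    """Ideal per-process share of the matrix in MiB, if rows are split evenly.

    This is approximate: each process actually stores its own local_rows + 1
    row offsets, so the true per-process figure is slightly larger.
    """
    return csr_matrix_bytes(n_rows, nnz_per_row) / nprocs / 2**20
```

<div>The same total, plus vectors, has to fit in device memory for a GPU solve.</div>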

<br>

How long does it take to solve that system stand-alone using MAGMA,<br>

including the data transfers?<br>
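<div>The reason to include the transfers is that a GPU solve only pays off if the end-to-end time, including moving data between host and device, beats the CPU solve. A minimal sketch of that comparison follows; all times are inputs to be measured, and the function name is hypothetical.</div>

```python
def end_to_end_speedup(t_cpu_solve, t_host_to_device, t_gpu_solve, t_device_to_host):
    """Speedup of a GPU solve over a CPU solve once data transfers are counted.

    All arguments are wall-clock times in seconds for one solve. A result
    below 1.0 means the GPU path is slower than staying on the CPU.
    """
    t_gpu_total = t_host_to_device + t_gpu_solve + t_device_to_host
    return t_cpu_solve / t_gpu_total
```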

</blockquote></div><br></div>