On Tue, Mar 13, 2012 at 3:22 PM, Dave May <span dir="ltr"><<a href="mailto:dave.mayhem23@gmail.com">dave.mayhem23@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hey Matt,<br>
Do you have any guidance or ideas regarding how large the subdomains<br>
should be to offset the cost of this copy?<br></blockquote><div><br></div><div>They have to be substantial, but it depends on your arithmetic intensity. What I can say</div><div>for sure is that we maxed out the memory on the machines we have (like my laptop) and</div>
<div>saw a 3-5x speedup with SpMV.</div><div><br></div><div>Also, I saw a TON of overhead on my FEM benchmark, even though it is screaming on</div><div>the GPU, but I now think that is cudaMalloc()/cudaFree() overhead rather than all communication.</div>
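<div><br></div><div>To make the dependence on arithmetic intensity concrete, here is a rough back-of-the-envelope sketch, not a measurement. It assumes a simple linear cost model in which the data is copied to the GPU for every call; the function names and all rates (flop rates, PCIe bandwidth, fixed overhead) are hypothetical placeholders, not numbers from PETSc or from the benchmarks mentioned above.</div>

```python
def cpu_time(n, flops_per_item, cpu_flops):
    """Assumed CPU cost model: pure compute, no transfer."""
    return n * flops_per_item / cpu_flops


def gpu_time(n, flops_per_item, bytes_per_item, gpu_flops, pcie_bw, fixed_overhead):
    """Assumed GPU cost model: fixed overhead (launch, allocation) + copy + compute."""
    copy = n * bytes_per_item / pcie_bw
    compute = n * flops_per_item / gpu_flops
    return fixed_overhead + copy + compute


def break_even_size(flops_per_item, bytes_per_item, cpu_flops, gpu_flops,
                    pcie_bw, fixed_overhead):
    """Smallest n (as a float) at which the GPU model matches the CPU model.

    Returns None when the per-item copy cost eats the whole per-item gain,
    i.e. the GPU never wins under this model regardless of size.
    """
    per_item_cpu = flops_per_item / cpu_flops
    per_item_gpu = bytes_per_item / pcie_bw + flops_per_item / gpu_flops
    gain = per_item_cpu - per_item_gpu
    if gain <= 0:
        return None
    return fixed_overhead / gain


if __name__ == "__main__":
    # Hypothetical machine: 1 GF/s CPU, 4 GF/s GPU, 4 GB/s PCIe, 1 ms overhead.
    machine = dict(cpu_flops=1e9, gpu_flops=4e9, pcie_bw=4e9, fixed_overhead=1e-3)
    for flops_per_item in (2, 100):
        n = break_even_size(flops_per_item, 8, **machine)
        print(flops_per_item, "flops/item ->", n)
```

<div>Under these assumptions, a kernel doing only a couple of flops per copied item never pays for the copy, while a high-intensity kernel breaks even at a moderate size. Keeping the data resident on the GPU across many operations changes the picture, since the copy is then amortized.</div>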
<div><br></div><div> Matt</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Cheers,<br>
Dave<br>
<br>
<br>
On 13 March 2012 15:03, Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br>
> On Tue, Mar 13, 2012 at 8:59 AM, Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>><br>
> wrote:<br>
>><br>
>> Hi, Jed.<br>
>> I added "printf" calls at the beginning and end of the code that sets the<br>
>> matrix values and computed the elapsed time. It is much longer than<br>
>> when I don't use the GPU. I am only guessing that the time is spent copying<br>
>> data. My PCTYPE is sor, and the solve takes 2000 iterations. Do you have any<br>
>> suggestions about this?<br>
><br>
><br>
> 1) You do not have to guess. Use -log_summary; it reports explicit events<br>
> for copying to the GPU.<br>
><br>
> 2) GPUs only really become effective for large systems due to this overhead.<br>
> I suggest looking at the<br>
> performance and overhead as a function of system size.<br>
><br>
> Matt<br>
><br>
>><br>
>> Zeng<br>
>><br>
>> At 2012-03-13 20:12:09, "Jed Brown" <<a href="mailto:jedbrown@mcs.anl.gov">jedbrown@mcs.anl.gov</a>> wrote:<br>
>><br>
>> 2012/3/13 Xiangze Zeng <<a href="mailto:zengshixiangze@163.com">zengshixiangze@163.com</a>><br>
>>><br>
>>> After I configure PETSc using --with-precision=single, I can run both<br>
>>> ex19 and my own code. Good news! But it seems a lot of time is spent<br>
>>> copying the data from the CPU to the GPU.<br>
>><br>
>><br>
>> How are you measuring? What preconditioner are you using and how many<br>
>> iterations are typically required?<br>
>><br>
>><br>
>><br>
><br>
><br>
<span class="HOEnZb"><font color="#888888">><br>
> --<br>
> What most experimenters take for granted before they begin their experiments<br>
> is infinitely more interesting than any results to which their experiments<br>
> lead.<br>
> -- Norbert Wiener<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>
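<div><br></div><div>The suggestion in the quoted reply to look at performance and overhead as a function of system size can be sketched as a small sweep plus a straight-line fit. This is only an illustration: the `solve` callable is a hypothetical stand-in for running the real solver at size n, and the model t(n) = overhead + cost_per_unknown * n is an assumption. In practice the timings would come from the -log_summary output (renamed -log_view in later PETSc releases) rather than from a wall-clock wrapper.</div>

```python
import time


def sweep(solve, sizes, repeats=3):
    """Time solve(n) for each n, keeping the best of `repeats` runs."""
    timings = []
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            solve(n)
            best = min(best, time.perf_counter() - start)
        timings.append(best)
    return timings


def fit_overhead(sizes, timings):
    """Least-squares fit of t = overhead + cost_per_unknown * n.

    Returns (overhead, cost_per_unknown). Needs at least two distinct sizes.
    """
    m = len(sizes)
    mean_n = sum(sizes) / m
    mean_t = sum(timings) / m
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, timings))
    slope = sxy / sxx
    return mean_t - slope * mean_n, slope


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a call into the real solver.
    def solve(n):
        return sum(i * i for i in range(n))

    sizes = [10_000, 20_000, 40_000, 80_000]
    overhead, per_unknown = fit_overhead(sizes, sweep(solve, sizes))
    print("fixed overhead (s):", overhead)
    print("cost per unknown (s):", per_unknown)
```

<div>Running the same sweep with and without the GPU and comparing the two fitted overheads gives a rough estimate of the size at which the GPU starts to pay off.</div>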