<br><br>On Monday, February 29, 2016, Karl Rupp &lt;<a href="mailto:rupp@iue.tuwien.ac.at">rupp@iue.tuwien.ac.at</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jeff,<br>

<br>

>     Ok, this is on fairly short notice considering the changes required.<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

    I recommend starting by copying the CUSP sources and migrating them over to VECCUDA, replacing every use of cusp::array1d with a raw CUDA handle. Operations from CUSP should be replaced by CUBLAS calls.<br>

<br>

<br>

It's hard to imagine any performance benefit from this unless CUSP sucks. What am I missing?<br>

</blockquote>

<br>

This is not about performance, but about giving users the ability to 'implant' their own memory buffers. CUSP doesn't allow this (which was the original point of this thread).<br>

<br></blockquote><div><br></div>Thanks. Sorry I missed that. Given that CPU memcpy has an order of magnitude more bandwidth than PCIe offload, I still don't get the point, but I don't need to. <div><br></div><div>Jeff <br><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Best regards,<br>

Karli<br>

</blockquote></div><br><br>-- <br>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a><br>