<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<meta name="Generator" content="Microsoft Exchange Server">

<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>

</head>

<body>

<div>

<div id="x_compose-container" itemscope="" itemtype="https://schema.org/EmailMessage" style="direction:ltr">

<span itemprop="creator" itemscope="" itemtype="https://schema.org/Organization"><span itemprop="name"></span></span>

<div>

<div style="direction:ltr">Good point, thank you so much for the advice! I'll take that into consideration.</div>

<div><br>

</div>

<div style="direction:ltr">Best regards,</div>

<div style="direction:ltr">Yuyun</div>

<div><br>

</div>


</div>

</div>

<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Jed Brown &lt;jed@jedbrown.org&gt;<br>

<b>Sent:</b> Friday, March 15, 2019 7:06:29 PM<br>

<b>To:</b> Yuyun Yang; Smith, Barry F.<br>

<b>Cc:</b> petsc-users@mcs.anl.gov<br>

<b>Subject:</b> Re: [petsc-users] Using PETSc with GPU</font>

<div> </div>

</div>

</div>

<font size="2"><span style="font-size:11pt;">

<div class="PlainText">Yuyun Yang via petsc-users &lt;petsc-users@mcs.anl.gov&gt; writes:<br>

<br>

&gt; Currently we are forming the sparse matrices explicitly, but I think the goal is to move towards matrix-free methods and use a stencil, which I suppose is good to use GPUs for and more efficient. On the other hand, I've also read about matrix-free operations in the manual just on the CPUs. Would there be any benefit then to switching to GPU (looks like matrix-free in PETSc is rather straightforward to use, whereas writing the kernel function for GPU stencil would require quite a lot of work)?<br>

<br>

It all depends on what kind of computation happens in there and how well<br>

you can implement it for the GPU.  It's important to have a clear idea<br>

of what you expect to achieve.  For example, if you write an excellent<br>

GPU implementation of your SNES residual/matrix-free Jacobian, it might<br>

be 2-3x faster than a good CPU implementation on hardware of similar<br>

cost ($ or Watt).  But you still need preconditioning, which is usually<br>

at least half the work, and perhaps a preconditioner runs at the same speed<br>

on GPU and CPU (the CPU version often converges a bit faster;<br>

preconditioning operations are often less amenable to GPUs).  So after<br>

all that effort, and now with code that is likely harder to maintain,<br>

you go from 4 seconds per solve to 3 seconds per solve on hardware of<br>

the same cost.  Is that worth it?<br>

<br>

Maybe, but you probably want that to be in the critical path for your<br>

research and/or customers.<br>
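<br>

The 4-to-3-second estimate above is just Amdahl's-law arithmetic, and it can be<br>

sketched in a few lines.  This is only a rough illustration: it assumes the<br>

preconditioner is exactly half of a 4-second solve and gains nothing from the<br>

GPU, and the function name solve_time is hypothetical, not part of PETSc.<br>

<pre>
```python
def solve_time(total, accelerated_fraction, speedup):
    """Estimate time per solve when only part of the work gets faster.

    total: baseline seconds per solve (assumed 4.0 in the example)
    accelerated_fraction: share of the baseline spent in the part that
        is ported to the GPU (residual / matrix-free Jacobian)
    speedup: how much faster that part runs after porting
    """
    accelerated = total * accelerated_fraction / speedup
    unchanged = total * (1.0 - accelerated_fraction)  # e.g. preconditioning
    return accelerated + unchanged


if __name__ == "__main__":
    baseline = 4.0  # seconds per solve on the CPU (figure from the message)
    for speedup in (2.0, 3.0):
        estimate = solve_time(baseline, 0.5, speedup)
        print(f"{speedup:.0f}x residual speedup: {estimate:.2f} s per solve")
```
</pre>

With a 2x speedup on half the work, the estimate is 3 seconds per solve; even<br>

an unlimited speedup on that half could not get below 2 seconds.<br>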

</div>

</span></font>

</body>

</html>