<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div>Hi Matt:<br>    Thank you for your kind reply. I ran into this problem in my test case. I simulate the lid-driven cavity problem with my code on a 100x100 2D grid, and I use the routine DMPlexReconstructGradientsFVM to compute the gradients and limiters, with PETSCLIMITERMINMOD as the limiter. I marched 1000 time steps, and the run took much longer than I expected. I then called DMPlexReconstructGradientsFVM on its own in a loop 1000 times, and that alone took nearly 170 seconds. I have read through the source of DMPlexReconstructGradientsFVM. The arithmetic is very simple, so I suspect the cost comes from the many function calls, such as <a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetArray.html#VecGetArray">VecGetArray</a>, DMPlexGetSupport and DMPlexPointLocalRef.<br><a name="DMPlexReconstructGradientsFVM"></a>As a further test, I rewrote DMPlexReconstructGradientsFVM myself as DMPlexReconstructGradientsFVM_1. Calling DMPlexReconstructGradientsFVM_1 1000 times in a loop reduced the time to about 30 seconds. The difference is that in my version I call those functions once, outside the loop, and then pass the extracted data into DMPlexReconstructGradientsFVM_1. The program flow is as follows:<pre>VecGetArray
DMPlexGetSupport
DMPlexPointLocalRef
...
for (i = 0; i &lt; 1000; ++i)
{
    DMPlexReconstructGradientsFVM_1(data, ...);
    /* data holds what was extracted from the DMPlex above with VecGetArray() etc. */
}</pre>whereas the code using DMPlexReconstructGradientsFVM looks like this:<pre>for (i = 0; i &lt; 1000; ++i)
{
    DMPlexReconstructGradientsFVM  /* which internally calls: */
    {
        VecGetArray
        DMPlexGetSupport
        DMPlexPointLocalRef
        ...
    }
}</pre></div><br>Compared with DMPlexReconstructGradientsFVM_1, DMPlexReconstructGradientsFVM makes many more function calls, and I think that is what makes it so expensive.<br><div>So I am writing to ask whether there are compiler options I could use to reduce this cost.<br>    Thanks.<br><br>leejearl<br><br></div><br><div style="position:relative;zoom:1"><br></div><div id="divNeteaseMailCard"></div><br>At 2016-12-04 21:34:49, "Matthew Knepley" &lt;knepley@gmail.com&gt; wrote:<br> <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sun, Dec 4, 2016 at 1:58 AM, leejearl <span dir="ltr">&lt;<a href="mailto:leejearl@126.com" target="_blank">leejearl@126.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  
  <div bgcolor="#FFFFFF">

    <p>Hi, all PETSc developers:</p>

        Thank you for your great work. I have built my FVM code on PETSc.<br>

    It works well, and the results are beautiful. However, I found that some

    functions, such as DMPlexGetSupport and <a href="http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DM/DMPlexPointLocalRef.html" target="_blank">DMPlexPointLocalRef</a>,

      are very expensive.</div></blockquote><div><br></div><div>I can believe that some parts are expensive, but I think the cost probably comes from something other than</div><div>GetSupport() and PointLocalRef(). Let's look at the code. First, GetSupport() is just two pointer lookups</div><div><br></div><div>  <a href="https://bitbucket.org/petsc/petsc/src/8191f1e31285033beeebf70760bc9786361aefca/src/dm/impls/plex/plex.c?at=master&fileviewer=file-view-default#plex.c-1502">https://bitbucket.org/petsc/petsc/src/8191f1e31285033beeebf70760bc9786361aefca/src/dm/impls/plex/plex.c?at=master&fileviewer=file-view-default#plex.c-1502</a></div><div><br></div><div>and PointLocalRef() is one lookup plus some arithmetic</div><div><br></div><div>  <a href="https://bitbucket.org/petsc/petsc/src/8191f1e31285033beeebf70760bc9786361aefca/src/dm/impls/plex/plexpoint.c?at=master&fileviewer=file-view-default#plexpoint.c-105">https://bitbucket.org/petsc/petsc/src/8191f1e31285033beeebf70760bc9786361aefca/src/dm/impls/plex/plexpoint.c?at=master&fileviewer=file-view-default#plexpoint.c-105</a></div><div><br></div><div>I have benchmark code that runs these, and they should definitely take &lt; 1e-7 s per call, and maybe</div><div>10-100 times less. You can look at Plex test ex9 to see some of it.</div><div><br></div><div>What is taking a lot of time?</div><div><br></div><div>  Thanks,</div><div><br></div><div>     Matt</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF">

    Code that calls these routines takes a lot of time. Is there any

    method one can use to reduce this cost and improve the efficiency

    of the resulting executables?<br>

         Thanks<span class="gmail-HOEnZb"><font color="#888888"><br>

    leejearl</font></span></div></blockquote></div>-- <br><div class="gmail_signature">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>-- Norbert Wiener</div>

</div></div>

</blockquote></div>
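<div>A rough check of the timings in this thread, assuming quadrilateral cells: a 100x100 grid has 2 x 100 x 101 = 20,200 faces, so one lookup per face per sweep over 1000 sweeps is about 2 x 10^7 calls, or roughly 2 s at Matt's bound of 1e-7 s per call. Reaching 170 s at that rate would take on the order of 85 such calls per face, which is consistent with his suggestion that most of the time is spent somewhere other than GetSupport() and PointLocalRef().</div>
<div>Below is a minimal, self-contained C sketch of the loop-hoisting idea behind DMPlexReconstructGradientsFVM_1. It does not use PETSc: the Mesh struct and the functions face_support(), cell_ref(), accumulate_calls(), accumulate_hoisted(), ring_mesh(), mesh_free() and time_kernel() are hypothetical stand-ins, not the DMPlex data layout or API. Both kernels sum the jump across each face into its two neighbouring cells (a crude stand-in for gradient accumulation); accumulate_calls() goes through an accessor for every lookup inside the face loop, while accumulate_hoisted() extracts the raw arrays once before the loop.</div>

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical, simplified face-based mesh, NOT the DMPlex data layout:
   face f separates cells support[2*f] and support[2*f + 1]. */
typedef struct {
  int     nfaces;
  int     ncells;
  int    *support; /* 2 * nfaces cell indices */
  double *u;       /* one value per cell */
} Mesh;

/* Hypothetical accessor in the spirit of a support query:
   returns the two cells adjacent to face f. */
static const int *face_support(const Mesh *m, int f) { return m->support + 2 * f; }

/* Hypothetical accessor in the spirit of a point-local reference:
   returns the address of the value stored for cell c. */
static const double *cell_ref(const Mesh *m, int c) { return m->u + c; }

/* Kernel 1: every lookup goes through an accessor inside the face loop. */
void accumulate_calls(const Mesh *m, double *grad)
{
  for (int c = 0; c < m->ncells; ++c) grad[c] = 0.0;
  for (int f = 0; f < m->nfaces; ++f) {
    const int   *supp = face_support(m, f);
    const double d    = *cell_ref(m, supp[1]) - *cell_ref(m, supp[0]);
    grad[supp[0]] += d;
    grad[supp[1]] += d;
  }
}

/* Kernel 2: the raw arrays are pulled out once, before the loop,
   and the loop body only does direct indexing. */
void accumulate_hoisted(const Mesh *m, double *grad)
{
  const int    *support = m->support;
  const double *u       = m->u;
  const int     nfaces  = m->nfaces;

  for (int c = 0; c < m->ncells; ++c) grad[c] = 0.0;
  for (int f = 0; f < nfaces; ++f) {
    const int    cl = support[2 * f];
    const int    cr = support[2 * f + 1];
    const double d  = u[cr] - u[cl];
    grad[cl] += d;
    grad[cr] += d;
  }
}

/* Periodic 1D ring of n cells: face i joins cell i and cell (i + 1) % n.
   Cell values are u[i] = i * i. */
Mesh ring_mesh(int n)
{
  Mesh m;
  assert(n >= 2);
  m.nfaces  = n;
  m.ncells  = n;
  m.support = malloc(2 * (size_t)n * sizeof *m.support);
  m.u       = malloc((size_t)n * sizeof *m.u);
  assert(m.support && m.u);
  for (int i = 0; i < n; ++i) {
    m.support[2 * i]     = i;
    m.support[2 * i + 1] = (i + 1) % n;
    m.u[i]               = (double)i * i;
  }
  return m;
}

void mesh_free(Mesh *m)
{
  free(m->support);
  free(m->u);
}

/* Runs `reps` sweeps of a kernel and returns the CPU time in seconds. */
double time_kernel(void (*kernel)(const Mesh *, double *), const Mesh *m,
                   double *grad, int reps)
{
  clock_t t0 = clock();
  for (int r = 0; r < reps; ++r) kernel(m, grad);
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

<div>For example, ring_mesh(20200) has as many faces as the grid estimate above, and comparing time_kernel(accumulate_calls, ...) with time_kernel(accumulate_hoisted, ...) in one build at -O0 and another at -O2 is one way to see how much of the gap the compiler removes on its own. Because the accessors here are static functions in the same file, an optimizing build will usually inline them, so the two kernels may time about the same; calls into a separately compiled shared library such as libpetsc generally cannot be inlined that way. As an assumption about the setup in this thread, it may also be worth checking whether PETSc was configured with --with-debugging=0, since PETSc builds with debugging enabled by default and debug builds add argument checking and stack tracking to library calls.</div>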