<div dir="ltr">Dear both,<div><br></div><div>I have recompiled SLEPc and PETSc without debugging, as well as with the recommended --with-fortran-kernels=1. In the attachment I show the scaling for a typical "large" simulation with about 120,000 unknowns, using Krylov-Schur.</div><div><br></div><div>There are two sets of data points there, as I do two EPS solves in one simulation. The second solve is faster because it results from a grid refinement of the first solve and takes the solution of the first solve as a good initial guess. Note that there are two pages in the PDF; on the second page I show the time · n_procs.</div><div><br></div><div>As you can see, the scaling is better than before, especially up to 8 processes (which means about 15,000 unknowns per process, which is, as I recall, cited as a good minimum on the website).<br><br>I am currently trying to run <span class="inbox-inbox-Apple-converted-space" style="line-height:1.5"> </span><span style="line-height:1.5">make streams NPMAX=8, but the cluster is extraordinarily crowded today and it does not like my interactive jobs. I will try to run them asap.<br><br>The main issue now, however, is again the first issue: the Generalized Davidson method does not converge to the physically correct negative eigenvalue (it should be about -0.05, as Krylov-Schur gives me). Instead, it stays stuck at some small positive eigenvalue of about +0.0002. 
It looks as if the solver really does not like passing the eigenvalue = 0 barrier, a behavior I also see in smaller simulations, where convergence slows down greatly when crossing it.</span></div><div><span style="line-height:1.5"><br></span></div><div><span style="line-height:1.5">However, this time, for this big simulation, just increasing NCV does <b>not</b> do the trick, at least not up to NCV=2048.</span></div><div><span style="line-height:1.5"><br></span></div><div><span style="line-height:1.5">I also tried to use target magnitude, again without success.</span></div><div><span style="line-height:1.5"><br></span></div><div><span style="line-height:1.5">I started implementing the capability to start with Krylov-Schur and then switch to GD with EPSSetInitialSpace once a certain precision has been reached, but then realized it might be a bit of overkill, as the SLEPc solution phase in my code generally takes no more than 15% of the time. There are probably other places where I can gain more than a few percent.<br><br>However, if there is another trick that can make GD work, it would certainly be appreciated, as in my experience it is really about 5 times faster than Krylov-Schur!<br><br>Thanks!<br><br>Toon</span></div><div><span style="line-height:1.5"><br></span></div><div class="gmail_quote"><div dir="ltr">On Thu, Mar 30, 2017 at 2:47 PM Matthew Knepley <<a href="mailto:knepley@gmail.com">knepley@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg">On Thu, Mar 30, 2017 at 3:05 AM, Jose E. 
Roman <span dir="ltr" class="gmail_msg"><<a href="mailto:jroman@dsic.upv.es" class="gmail_msg" target="_blank">jroman@dsic.upv.es</a>></span> wrote:<br class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="gmail_msg"><br class="gmail_msg">

> On 30 Mar 2017, at 9:27, Toon Weyens <<a href="mailto:toon.weyens@gmail.com" class="gmail_msg" target="_blank">toon.weyens@gmail.com</a>> wrote:<br class="gmail_msg">

><br class="gmail_msg">

> Hi, thanks for the answer.<br class="gmail_msg">

><br class="gmail_msg">

> I use MUMPS as a PC. The options -ksp_converged_reason, -ksp_monitor_true_residual and -ksp_view are not used.<br class="gmail_msg">

><br class="gmail_msg">

> The difference between the log_view outputs of running a simple solution with 1, 2, 3 or 4 MPI procs is attached (debug version).<br class="gmail_msg">

><br class="gmail_msg">

> I can see that with 2 procs it takes about 22 seconds, versus 7 seconds for 1 proc. For 3 and 4 the situation is worse: 29 and 37 seconds.<br class="gmail_msg">

><br class="gmail_msg">

> It looks like the difference is mainly in the BVMult and especially in the BVOrthogonalize routines:<br class="gmail_msg">

><br class="gmail_msg">

> BVMult takes 1, 6.5, 10 or even a whopping 17 seconds for 1, 2, 3 and 4 processes, respectively.<br class="gmail_msg">

> BVOrthogonalize takes 1, 4, 6 and 10 seconds.<br class="gmail_msg">

><br class="gmail_msg">

> Computing the preconditioner does not take more time as the number of processes grows, and the time to apply it increases only slightly. So it cannot be MUMPS' fault...<br class="gmail_msg">

><br class="gmail_msg">

> Does this make sense? Is there any way to improve this?<br class="gmail_msg">

><br class="gmail_msg">

> Thanks!<br class="gmail_msg">

<br class="gmail_msg">

</span>You cannot trust performance data in a debug build:<br class="gmail_msg"></blockquote><div class="gmail_msg"><br class="gmail_msg"></div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg">Yes, you should definitely make another build configured using --with-debugging=no.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">What do you get for STREAMS on this machine?</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">  make streams NP=4</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">From this data, it looks like you have already saturated the memory bandwidth at 2 procs.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">  Thanks,</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">    Matt</div></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><div class="gmail_msg"> </div><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br class="gmail_msg">

      ##########################################################<br class="gmail_msg">

      #                                                        #<br class="gmail_msg">

      #                          WARNING!!!                    #<br class="gmail_msg">

      #                                                        #<br class="gmail_msg">

      #   This code was compiled with a debugging option,      #<br class="gmail_msg">

      #   To get timing results run ./configure                #<br class="gmail_msg">

      #   using --with-debugging=no, the performance will      #<br class="gmail_msg">

      #   be generally two or three times faster.              #<br class="gmail_msg">

      #                                                        #<br class="gmail_msg">

      ##########################################################<br class="gmail_msg">

<br class="gmail_msg">

<br class="gmail_msg">

<br class="gmail_msg">

</blockquote></div></div></div><div dir="ltr" class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div>-- <br class="gmail_msg"><div class="m_-3760734499255775781gmail_signature gmail_msg" data-smartmail="gmail_signature">What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br class="gmail_msg">-- Norbert Wiener</div>

</div></div></blockquote></div></div>