<div dir="ltr">Hi Justin,<div><br></div><div>I don't see anything obviously wrong that would cause this variation in iteration counts with the number of processors. Would it be at all feasible to send me example code that reproduces the problem (perhaps a smaller version)? I'm still guessing the problem lies in numerical precision; it would be nice to find a way to avoid such issues. I don't think the job scheduling or compute nodes would affect this.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> Basically I call TaoSolve at each time level. What I find strange is that the number of TAO solve iterations vary at each time level for a given number of processors</blockquote><div>Just for clarification, do you mean that you run the same problem several times, the only difference being the number of processors, and that on each time step you get (close to) the same solution for each run, just with a different number of TAO iterations?</div><div><br></div><div>If you run the same problem twice using the same number of processors, is the output identical? No OpenMP threads or GPUs?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">2) Sometimes, I get Tao Termination reason of -5, and from what I see from the online documentation, it means the number of function evaluations exceeds the maximum number of function evaluations. 
I only get this at certain time levels, and it also varies when I change the number of processors.</blockquote><div><br></div><div>This looks like the same issue, unless the number of iterations is hugely different (say it converges on some number of processors after 200 iterations, but still hasn't converged after 2000 on a different number).</div><div><br></div><div><br></div><div>Jason</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jun 23, 2015 at 6:36 PM, Justin Chang <span dir="ltr"><<a href="mailto:jychang48@gmail.com" target="_blank">jychang48@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div><span class="">

<div dir="ltr">

<div>

<div>

<div>

<div>

<div>

<div>I was unable to try quad precision, or even 64-bit integers, because my data files rely on intricate binary files that were written in 32-bit.

<br>

<br>

</div>

However, I noticed a couple things which are puzzling to me:<br>

<br>

</div>

1) I am solving a transient problem using my own backward Euler function. Basically I call TaoSolve at each time level. What I find strange is that the number of TAO solve iterations varies at each time level for a given number of processors. The solution is roughly the same when I change the number of processors. Any idea why this is happening, or might this have more to do with the job scheduling/compute nodes on my HPC machine?<br>

<br>

</div>

2) Sometimes I get a TAO termination reason of -5, which, from what I see in the online documentation, means the number of function evaluations exceeded the maximum allowed. I only get this at certain time levels, and it also varies when I change the number of processors.<br>

<br>

</div>

I can understand the number of iterations going down the further in time I go (this is due to the nature of my problem), but I am not sure why the above two observations are happening. Any thoughts?<br>
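<div>One way to see how a -5 (maximum function evaluations) stop can show up only at some time levels: every line-search trial point costs a function evaluation, so the budget is used up faster at steps that need more backtracking. The following is a hypothetical 1-D steepest-descent loop with an Armijo backtracking line search, not TAO's BLMVM; the names minimize_1d and max_fevals and the string reasons are made up for illustration.</div>
<pre>
```python
# Hypothetical sketch (not TAO code): steepest descent on a 1-D function with an
# Armijo backtracking line search and a global budget on function evaluations.
# The "max_fevals" outcome plays the role of the -5 termination reason.

def minimize_1d(f, grad, x, max_fevals, gtol=1e-8, max_its=100):
    """Return (x, iterations, function evaluations, reason)."""
    nfev = 0

    def evaluate(z):
        nonlocal nfev
        nfev += 1              # every trial point costs one function evaluation
        return f(z)

    fx = evaluate(x)
    for it in range(max_its):
        g = grad(x)
        if abs(g) <= gtol:
            return x, it, nfev, "converged"
        step = 1.0
        while True:            # backtrack until sufficient decrease
            if nfev >= max_fevals:
                return x, it, nfev, "max_fevals"
            trial = x - step * g
            ft = evaluate(trial)
            if ft <= fx - 1e-4 * step * g * g:   # Armijo condition
                break
            step *= 0.5
        x, fx = trial, ft
    return x, max_its, nfev, "max_its"


def f(x):
    return (x - 3.0) ** 2


def grad(x):
    return 2.0 * (x - 3.0)


print(minimize_1d(f, grad, 0.0, max_fevals=10))  # (3.0, 1, 3, 'converged')
print(minimize_1d(f, grad, 0.0, max_fevals=2))   # (0.0, 0, 2, 'max_fevals')
```
</pre>
<div>With a budget of 10 the loop converges after 3 evaluations; with a budget of 2 it stops in the middle of the first line search. In TAO the corresponding limit is set with TaoSetMaximumFunctionEvaluations() or the -tao_max_funcs option.</div>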

<br>

</div>

Thanks,<br>

</div>

Justin<br>

</div>

</span><div><div class="h5"><div class="gmail_extra"><br>

<div class="gmail_quote">On Fri, Jun 19, 2015 at 11:52 AM, Justin Chang <span dir="ltr">

<<a href="mailto:jychang48@gmail.com" target="_blank">jychang48@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

My code more or less requires HDF5, so installing with quad precision might be a little difficult. I could try to work around this, but that might take some effort.

<div><br>

</div>

<div>In the meantime, is there any other potential explanation, or an alternative way of figuring this out?</div>

<div><br>

</div>

<div>Thanks,</div>

<div>Justin

<div>

<div><br>

<div><br>

On Thursday, June 18, 2015, Matthew Knepley <<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr">

<div class="gmail_extra">

<div class="gmail_quote">On Thu, Jun 18, 2015 at 1:52 PM, Jason Sarich <span dir="ltr">

<<a>jason.sarich@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr">BLMVM doesn't use a KSP or preconditioner, it updates using the L-BFGS-B formula</div>
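<div>For reference, a limited-memory BFGS step is assembled from dot products with stored (s, y) pairs. This is a sketch of the textbook two-loop recursion (as in Nocedal and Wright, Numerical Optimization), written in Python for illustration; it is not TAO's BLMVM source and omits the bound projection. In a parallel run each dot() would be a global reduction, which is one place where partition-dependent round-off can enter.</div>
<pre>
```python
# Generic L-BFGS two-loop recursion. Illustrative sketch only: it is not
# TAO's BLMVM source and it ignores bound constraints.

def dot(a, b):
    # In a parallel run this is a global reduction, so its rounding depends
    # on how the entries are distributed.
    return sum(ai * bi for ai, bi in zip(a, b))

def axpy(alpha, x, y):
    """Return y + alpha * x as a new list."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]

def lbfgs_apply(g, pairs):
    """Return H @ g for the L-BFGS inverse-Hessian approximation H built
    from the (s, y) pairs in `pairs`, ordered oldest to newest."""
    q = list(g)
    history = []
    for s, y in reversed(pairs):            # first loop: newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        history.append((rho, alpha))
        q = axpy(-alpha, y, q)
    if pairs:                               # H0 = gamma * I, scaled by newest pair
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):  # oldest to newest
        beta = rho * dot(y, r)
        r = axpy(alpha - beta, s, r)
    return r

# Quadratic f(x) = 0.5 * x^T A x with A = diag(2, 8): each pair satisfies y = A s.
pairs = [([1.0, 0.0], [2.0, 0.0]), ([0.0, 1.0], [0.0, 8.0])]
print(lbfgs_apply([1.0, 1.0], pairs))   # [0.5, 0.125]
```
</pre>
<div>For this diagonal quadratic the two stored pairs reproduce A^{-1} exactly, so the step direction is -[0.5, 0.125].</div>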

</blockquote>

<div><br>

</div>

<div>Then this sounds like a bug, unless one of the constants is partition dependent.</div>

<div><br>

</div>

<div>  Matt</div>

<div> </div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="gmail_extra">

<div class="gmail_quote">On Thu, Jun 18, 2015 at 1:45 PM, Matthew Knepley <span dir="ltr">

<<a>knepley@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>

<div dir="ltr">

<div class="gmail_extra">

<div class="gmail_quote"><span>On Thu, Jun 18, 2015 at 12:15 PM, Jason Sarich <span dir="ltr">

<<a>jason.sarich@gmail.com</a>></span> wrote:<br>

</span><span>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr">Hi Justin,

<div><br>

</div>

<div>I can't tell for sure why this is happening. Have you tried using quad precision to make sure that numerical cutoff isn't the problem?</div>
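<div>A small illustration of why the processor count alone can change the numbers: floating-point addition is not associative, so a global sum (as in a dot product or norm) computed from per-process partial sums can round differently for each partition. This Python sketch, with hypothetical helpers ownership_ranges and partitioned_sum, exaggerates the effect with extreme cancellation; in a real run the differences usually sit in the last few bits, but they can still move a norm across a convergence tolerance at a different iteration.</div>
<pre>
```python
# Illustrative sketch: a global sum computed the way a distributed reduction
# might do it. Each "rank" sums its own contiguous chunk, then the partial
# sums are combined. The chunking rule (remainder spread over the first
# ranks) is an assumption; real MPI reductions may combine in other orders.

def ownership_ranges(n, nranks):
    """Split range(n) into nranks contiguous (start, end) chunks."""
    base, extra = divmod(n, nranks)
    ranges, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def partitioned_sum(x, nranks):
    """Local sums per chunk, then a sequential combine of the partial sums."""
    partials = [sum(x[lo:hi]) for lo, hi in ownership_ranges(len(x), nranks)]
    total = 0.0
    for p in partials:
        total += p
    return total

# Extreme cancellation makes the order dependence visible.
x = [1e16, 1.0, -1e16, 1.0]
for nranks in (1, 2, 4):
    print(nranks, partitioned_sum(x, nranks))   # 1 1.0, then 2 0.0, then 4 1.0
```
</pre>
<div>In this deterministic sketch, repeating a run with the same partition gives identical results, which is why comparing two runs on the same number of processors is a useful check.</div>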

<div><br>

</div>

<div>1) The Hessian being approximate, and the resulting implicit computation, could be a source of numerical cutoff, but would not cause different convergence rates in infinite precision.</div>

<div><br>

</div>

<div>2) The local size may affect load balancing, but not the resulting norms or convergence rate.</div>
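<div>To illustrate point 2: operations such as projecting onto the bounds act entry by entry, so each entry of the result is the same however the vector is split; only reductions such as the final norm depend on the partition, and then only through rounding. This is a hypothetical Python sketch; the projected-gradient rule is one common choice and not necessarily the measure TAO reports.</div>
<pre>
```python
# Sketch: bound projection and a projected gradient, computed entry by entry.
# The projected-gradient rule below (zero a component when the descent step
# -g would leave an active bound) is one common choice, assumed here.
import math

def project(x, lo, hi):
    """Clip each entry of x into [lo_i, hi_i]."""
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, lo, hi)]

def projected_gradient(x, g, lo, hi):
    pg = []
    for xi, gi, li, ui in zip(x, g, lo, hi):
        if (xi == li and gi > 0.0) or (xi == ui and gi < 0.0):
            pg.append(0.0)             # active bound blocks the descent step
        else:
            pg.append(gi)
    return pg

def split(v, nranks):
    """Contiguous chunks, remainder spread over the first chunks (assumed rule)."""
    base, extra = divmod(len(v), nranks)
    out, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        out.append(v[start:start + size])
        start += size
    return out

def distributed_projected_gradient(x, g, lo, hi, nranks):
    """Each chunk computes its entries locally; only the norm needs a reduction."""
    pieces = [projected_gradient(*chunk)
              for chunk in zip(*(split(v, nranks) for v in (x, g, lo, hi)))]
    local_sq = [sum(v * v for v in piece) for piece in pieces]
    return [v for piece in pieces for v in piece], math.sqrt(sum(local_sq))

lo, hi = [0.0] * 4, [1.0] * 4
x = project([-0.5, 0.3, 1.2, 0.7], lo, hi)    # [0.0, 0.3, 1.0, 0.7]
g = [2.0, -1.0, -3.0, 0.5]
for nranks in (1, 2, 4):
    print(nranks, distributed_projected_gradient(x, g, lo, hi, nranks))
```
</pre>
<div>All three partitions give the same projected gradient, [0.0, -1.0, 0.0, 0.5], and, since these values are exact in binary, the same norm.</div>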

</div>

</blockquote>

<div><br>

</div>

</span><span>

<div>This sounds to me like the preconditioner depends on the partition. Can you send the output of -tao_view -snes_view?</div>
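<div>As an example of a partition-dependent preconditioner: with block Jacobi (PETSc's default in parallel, with ILU(0) on each block), each process ignores couplings to rows it does not own, so the preconditioned operator, and with it the Krylov iteration count, changes with the number of processes. This sketch applies block Jacobi to the illustrative matrix tridiag(-1, 2, -1), using an exact tridiagonal solve per block (for a tridiagonal block ILU(0) has no dropped fill, so it coincides with this). The matrix, chunking rule and helper names are assumptions.</div>
<pre>
```python
# Sketch: block Jacobi preconditioning of the 1-D matrix A = tridiag(-1, 2, -1).
# Each "rank" owns a contiguous block of rows and applies the exact inverse of
# its diagonal block; couplings between blocks are dropped, so the result
# depends on the partition.

def solve_block(rhs):
    """Thomas algorithm for tridiag(-1, 2, -1) of size len(rhs)."""
    n = len(rhs)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    for i in range(n):
        denom = 2.0 + (cp[i - 1] if i else 0.0)     # 2 - (-1) * cp[i-1]
        cp[i] = -1.0 / denom
        dp[i] = (rhs[i] + (dp[i - 1] if i else 0.0)) / denom
    x = [0.0] * n
    for i in reversed(range(n)):                    # back substitution
        x[i] = dp[i] - cp[i] * (x[i + 1] if i + 1 != n else 0.0)
    return x

def block_jacobi_apply(r, nranks):
    """Return z with z_block = A_block^{-1} r_block for each rank's block."""
    base, extra = divmod(len(r), nranks)
    z, start = [], 0
    for k in range(nranks):
        size = base + (1 if k < extra else 0)
        z.extend(solve_block(r[start:start + size]))
        start += size
    return z

r = [1.0, 0.0, 0.0, 1.0]   # r = A @ [1, 1, 1, 1] for n = 4
for nranks in (1, 2, 4):
    print(nranks, [round(v, 6) for v in block_jacobi_apply(r, nranks)])
```
</pre>
<div>With one block the result is the exact solution [1, 1, 1, 1]; with two blocks it is [2/3, 1/3, 1/3, 2/3]; with four one-row blocks it reduces to point Jacobi, r / 2 = [0.5, 0, 0, 0.5].</div>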

<div><br>

</div>

<div>  Matt</div>

<div> </div>

</span>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr"><span><font color="#888888">

<div>Jason</div>

<div><br>

</div>

</font></span></div>

<span>

<div>

<div>

<div class="gmail_extra"><br>

<div class="gmail_quote">On Thu, Jun 18, 2015 at 10:44 AM, Justin Chang <span dir="ltr">

<<a>jychang48@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div>

<div dir="ltr">I solved a transient diffusion problem across multiple cores using TAO BLMVM. When I simulate the same problem on different numbers of processing cores, the number of solver iterations changes quite drastically. The numerical solution is the same, but the differences in iteration counts are large. I attached a PDF showing a comparison between KSP and TAO: the KSP iteration counts remain largely invariant with the number of processors, but those of TAO (with bound constraints) fluctuate.<br>

<br>

My question is, why is this happening? I understand that accumulated numerical round-off may contribute to this, but the differences seem quite large to me. My initial thoughts were that

<div><br>

</div>

<div>1) the Hessian is only projected and not explicitly computed, which may have something to do with the rate of convergence<br>

<br>

2) local problem size. Certain regions of my domain have different numbers of "violations" that need to be corrected by the bound constraints, so perhaps the rate of convergence depends on how these regions are partitioned?<br>

<br>

Any thoughts?<br>

<br>

Thanks,<br>

Justin</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div>

</div>

</span></blockquote>

</div>

<br>

<br clear="all">

<span><font color="#888888"><span>

<div><br>

</div>

-- <br>

<div>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>

-- Norbert Wiener</div>

</span></font></span></div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</blockquote>

</div>

<br>


</div>

</div>

</blockquote>

</div>

</div>

</div>

</div>

</blockquote>

</div>

<br>

</div>

</div></div></div>

</blockquote></div><br></div>