Generally you should not expect digit-for-digit identical results between parallel runs, and yes, round-off differences can arise from different machine load in between, since load can change the order in which communication completes. If a problem is ill-posed or poorly preconditioned, this can even lead to divergence in some cases.
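The root cause is simply that floating-point addition is not associative, so any change in the order or grouping of a parallel sum changes the last few bits. A minimal standalone sketch (plain C++, nothing PETSc-specific) of the effect:

    // Summing the same three numbers with two different groupings
    // gives two different double-precision results.
    #include <cstdio>

    int main() {
        double a = 1.0e16, b = -1.0e16, c = 1.0;

        double left  = (a + b) + c;   // a + b cancels exactly, so this is 1.0
        double right = a + (b + c);   // b + c rounds back to b, so this is 0.0

        std::printf("(a+b)+c = %.17g\n", left);
        std::printf("a+(b+c) = %.17g\n", right);
        return 0;
    }

In a parallel reduction the grouping is determined by the reduction tree, which you do not control, so the same data can legitimately produce sums that differ in the last digits from run to run.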

Dominik

On Thu, Aug 18, 2011 at 9:49 AM, Harald Pfeiffer <pfeiffer@cita.utoronto.ca> wrote:

Hello,

we use PETSc to solve the nonlinear system arising from
pseudo-spectral discretization of certain elliptic PDEs in
Einstein's equations. When running the same job multiple times on
the same number of processors on the same workstation, we find
roundoff differences. Is this expected, e.g. because MPI reduction
calls may behave differently depending on the load of the machine?
Or should we be concerned and investigate further?

Thanks,
Harald

-------- Original Message --------
Subject: Re: Quick question about derivatives in SpEC
Date:    Tue, 16 Aug 2011 09:45:27 -0400
From:    Gregory B. Cook <cookgb@wfu.edu>
To:      Harald Pfeiffer <pfeiffer@cita.utoronto.ca>
CC:      Larry Kidder <kidder@astro.cornell.edu>, Mark Scheel <scheel@tapir.caltech.edu>

Hi Harald,
All of the tests I was doing were on the same 8 cores on my office
workstation. It is running Ubuntu 11, and uses the default OpenMPI
communication approach. To make sure it wasn't something I was doing, I
ran two elliptic solves of the ExtendedConformalThinSandwich() volume
terms. Here are the outputs of snes.dat for the different levels:
                   Run 1                            Run 2
Six0/snes.dat
  0  7.3385297958166698               0  7.3385297958166698
  1  5.1229060531500723               1  5.1229060531500723
  2  0.32616852761238285              2  0.32616852761238285
  3  0.012351417186533147             3  0.012351417186800266   <*****
  4  9.7478354935351385e-06           4  9.7478351511500114e-06
Six1/snes.dat
  0  0.13405558402489681              0  0.13405558402540407
  1  0.00068002100028642610           1  0.00068002089609322440
  2  6.8764357250058596e-08           2  6.3738394418031232e-08
Six2/snes.dat
  0  0.0063028244769771681            0  0.0063028058475922306
  1  1.4538921141731714e-06           1  1.4545032695605256e-06
Six3/snes.dat
  0  0.00061476105672438877           0  0.00061476093499534406
  1  6.0267672358059814e-08           1  5.4897793428123648e-08
Six4/snes.dat
  0  0.00053059501859595651           0  0.00053059591479892143
  1  4.8003269489205705e-08           1  4.8079799390886591e-08
Six5/snes.dat
  0  3.6402372419546429e-05           0  3.6402169997838670e-05
  1  5.3117360561476420e-09           1  5.2732089856727503e-09
The differences are clearly at the level of roundoff, but it is
"strange" that you cannot reproduce identical results.
I've attached all of the .input files for this run in case you want to
try to reproduce my findings.
Greg
On 08/16/2011 06:21 AM, Harald Pfeiffer wrote:
> Hi Greg,
>
> some thoughts:
>
> PETSc uses standard MPI reduction calls, which may give results that
> differ by roundoff. We have definitely seen this happen for different
> numbers of processes, but perhaps it also happens depending on where in
> a cluster your jobs run (different network topology depending on
> whether all processors are on the same rack vs. split among racks;
> dynamic load-balancing of network communication).
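>
> (For concreteness, the operation in question is typically just a global
> sum via MPI_Allreduce; the fragment below is only an illustration of
> the pattern, not SpEC or PETSc source. The grouping of that sum is
> chosen by the MPI implementation, so the last bits are not guaranteed
> to be identical from run to run.)
>
>     // Illustration: a global dot product, as an iterative solver computes it.
>     #include <mpi.h>
>     #include <cstdio>
>     #include <vector>
>
>     int main(int argc, char** argv) {
>         MPI_Init(&argc, &argv);
>         int rank;
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>         // Each process owns a chunk of the vectors (made-up data here).
>         std::vector<double> x(1000, 1.0 / (rank + 1)), y(1000, 3.0);
>
>         double local = 0.0;
>         for (std::size_t i = 0; i < x.size(); ++i) local += x[i] * y[i];
>
>         // How the per-process partial sums are combined (reduction tree,
>         // algorithm choice) is up to the MPI library, not the caller.
>         double global = 0.0;
>         MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
>
>         if (rank == 0) std::printf("global dot product = %.17g\n", global);
>         MPI_Finalize();
>         return 0;
>     }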
>
> You might want to try reserving a few nodes interactively, and then
> running the elliptic solver multiple times on this same set of nodes.
>
> The Mover does indeed do load-balanced interpolation, but even so, the
> MPI communication there should not prevent identical results.
>
> Once there are roundoff differences, they are typically amplified during
> a PETSc linear solve: the iterative algorithm takes a different path
> toward the solution, and a difference of 1e-10 doesn't seem excessive.
>
> Harald
>
> ps. Preconditioning is done differently on-processor and off-processor,
> and therefore depends strongly on the processor count. So if you were to
> change the number of processors, the iterative solve would proceed very
> differently.
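>
> (Concretely, the PETSc default in parallel amounts to roughly the
> following, written as the usual command-line options; treat this as a
> sketch of the default setup rather than what our input files request:
>
>     -ksp_type gmres
>     -pc_type bjacobi        (block Jacobi, one block per MPI process)
>     -sub_ksp_type preonly
>     -sub_pc_type ilu        (incomplete LU inside each local block)
>
> so the preconditioner itself changes whenever the partitioning of the
> matrix across processes changes.)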
>
> On 8/15/11 10:27 PM, Gregory B. Cook wrote:
>> Hi Larry,
>>
>> I ran a check using the default ExtendedConformalThinSandwich() volume
>> terms and this also produced roundoff-level differences between
>> identical runs, so I feel better about that. I am using the same
>> number of processors, but if there is any kind of dynamic load
>> balancing for interpolation/communication/etc, then I can see that
>> different runs might end up using different boundary communications.
>> Maybe that's all there is to it?
>>
>> Greg
>>
>> On 08/15/2011 04:16 PM, Larry Kidder wrote:
>>> Hi Greg,
>>>
>>> Harald is traveling, so I am not sure when he will answer.
>>> My vague recollection is that there is something about how PETSc does
>>> preconditioning in parallel that leads to it not producing the same
>>> result, but I don't recall whether this happens in general or only if
>>> you change the distribution of processes.
>>>
>>> Larry
>>>
>>> Gregory B. Cook wrote:
>>>> Hi Guys,
>>>>
>>>> I have a follow-up question that may be tangentially related to my
>>>> original question about derivatives. This one is targeted at Harald.
>>>>
>>>> When I run a version of my code where the very small errors in the
>>>> derivative of the metric are not present (I code them in differently),
>>>> I find that running the exact same input files successively does not
>>>> produce exactly the same results. This is a multi-level elliptic solve
>>>> on a complex domain for binary black holes. On Level-0, the results
>>>> returned in snes.dat are identical. On Level-1, the initial and
>>>> second snes norms are identical, but the third differs. After this,
>>>> all snes norms differ.
>>>>
>>>> Is this to be expected? Does PETSc not produce identical results on
>>>> consecutive solves with the same starting point? Is there something in
>>>> the MPI communication that means that the results should differ? The
>>>> differences start at the order of 10^-13, but grow by the 6th level to
>>>> be of order 10^-10.
>>>>
>>>> Greg
>>>>
>>>> On 08/15/2011 01:02 PM, Larry Kidder wrote:
>>>>> Hi Greg,
>>>>>
>>>>> Did you compute the norm of the metric itself?
>>>>> What domain did you use?
>>>>>
>>>>> Larry
>>>>>
>>>>> Gregory B. Cook wrote:
>>>>>> Hi Guys,
>>>>>>
>>>>>> I was doing a simple test as part of debugging some code I'm writing.
>>>>>> I ended up placing the following relevant lines of code into the
>>>>>> EllipticItems.input and EllipticObservers.input files:
>>>>>>
>>>>>> ---EllipticItems.input---
>>>>>> EvaluateMatrixFormula(Output=InvConformalMetric; Dim=3; Symm=11;
>>>>>> M[0,0]=1; M[1,1]=1; M[2,2]=1),
>>>>>> FirstDeriv(Input=InvConformalMetric; Output=dInvConformalMetric),
>>>>>> SecondDeriv(Input=InvConformalMetric; Output=ddInvConformalMetric),
>>>>>>
>>>>>> FlattenDeriv(Input=dInvConformalMetric;
>>>>>> Output=fdInvConformalMetric;DerivPosition=Last),
>>>>>> FlattenDeriv(Input=ddInvConformalMetric;
>>>>>> Output=fddInvConformalMetric;DerivPosition=Last),
>>>>>>
>>>>>> ---EllipticObservers.input---
>>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>>>>>> Filename=dInvCM_L2.dat;Op=L2; MetricForTensors=None),
>>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>>>>>> Filename=dInvCM_Linf.dat;Op=Linf; MetricForTensors=None),
>>>>>>
>>>>>>
>>>>>> The odd thing is that the norms that I get out are not exactly zero.
>>>>>> They are very small, but I'm taking the first and second derivatives
>>>>>> of the identity matrix, so I would expect them to evaluate to exactly
>>>>>> zero. The fact that they don't leads me to think that there is
>>>>>> something wrong either in my code or in how I have written the input
>>>>>> files.
>>>>>>
>>>>>> Should these derivatives evaluate to exactly zero?
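>>>>>>
>>>>>> (For what it's worth, a quick standalone check, independent of how
>>>>>> SpEC actually takes derivatives, suggests that roundoff-level values
>>>>>> rather than exact zeros are what one should expect from a spectral
>>>>>> derivative of a constant: the rows of a Chebyshev differentiation
>>>>>> matrix sum to zero only analytically, through cancellation of large
>>>>>> entries. A sketch using the standard collocation-point formulas:
>>>>>>
>>>>>>     // Apply the Chebyshev first-derivative matrix to the constant
>>>>>>     // function f(x) = 1; analytically the result is identically zero.
>>>>>>     #include <algorithm>
>>>>>>     #include <cmath>
>>>>>>     #include <cstdio>
>>>>>>     #include <vector>
>>>>>>
>>>>>>     int main() {
>>>>>>         const int N = 16;                    // polynomial order
>>>>>>         const double pi = std::acos(-1.0);
>>>>>>         std::vector<double> x(N + 1), c(N + 1);
>>>>>>         for (int j = 0; j <= N; ++j) {
>>>>>>             x[j] = std::cos(pi * j / N);     // collocation points
>>>>>>             c[j] = (j == 0 || j == N) ? 2.0 : 1.0;
>>>>>>         }
>>>>>>
>>>>>>         // First-derivative matrix D: off-diagonal entries, then diagonal.
>>>>>>         std::vector<std::vector<double>> D(N + 1, std::vector<double>(N + 1, 0.0));
>>>>>>         for (int i = 0; i <= N; ++i)
>>>>>>             for (int j = 0; j <= N; ++j)
>>>>>>                 if (i != j)
>>>>>>                     D[i][j] = (c[i] / c[j]) * (((i + j) % 2) ? -1.0 : 1.0)
>>>>>>                               / (x[i] - x[j]);
>>>>>>         D[0][0] = (2.0 * N * N + 1.0) / 6.0;
>>>>>>         D[N][N] = -(2.0 * N * N + 1.0) / 6.0;
>>>>>>         for (int i = 1; i < N; ++i)
>>>>>>             D[i][i] = -x[i] / (2.0 * (1.0 - x[i] * x[i]));
>>>>>>
>>>>>>         // D times the constant vector of ones: zero only up to roundoff.
>>>>>>         double maxval = 0.0;
>>>>>>         for (int i = 0; i <= N; ++i) {
>>>>>>             double dfi = 0.0;
>>>>>>             for (int j = 0; j <= N; ++j) dfi += D[i][j];
>>>>>>             maxval = std::max(maxval, std::fabs(dfi));
>>>>>>         }
>>>>>>         std::printf("max |D * 1| = %g\n", maxval);
>>>>>>         return 0;
>>>>>>     }
>>>>>>
>>>>>> But I'd like to confirm that this is indeed all that's going on.)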
>>>>>>
>>>>>> Greg
>>>>>
>>>
>