VecSetValues and irreproducible results in parallel

Wed Jan 28 11:55:35 CST 2009

Hi, Folks --

I recently noticed an irreproducibility issue with some parallel code 
which uses PETSc.  Basically, I can run the same parallel simulation on 
the same number of processors and get different answers.  The 
discrepancies were only apparent to the eye after very long integration 
times (e.g., approx. 20k timesteps), but in fact the computations start 
diverging after only a few handfuls (30--40) of timesteps.  (Running the 
same code in serial yields reproducible results, but changing the MPI 
library or running the code on a different system did not make any 
difference.)

I believe that I have tracked this issue down to 
VecSetValues/VecSetValuesBlocked.  A part of the computation uses 
VecSetValues to accumulate the values of forces at the nodes of a mesh. 
  At some nodes, force contributions come from multiple processors.  It 
appears that there is some "randomness" in the way that this 
accumulation is performed in parallel, presumably related to the order 
in which values are communicated.

I have found that I can make modified versions of VecSetValues_MPI(), 
VecSetValuesBlocked_MPI(), and VecAssemblyEnd_MPI() to ensure 
reproducible results.  I make the following modifications:

(1) VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all values 
into the stash instead of immediately adding the local components.

(2) VecAssemblyEnd_MPI() "extracts" all of the values provided by the 
stash and bstash, placing these values into lists corresponding to each 
of the local entries in the vector.  Next, once all values have been 
extracted, I sort each of these lists (e.g., by descending or ascending 
magnitude).

Making these changes appears to yield exactly reproducible results, 
although I am still performing tests to try to shake out any other 
problems.  Another approach which seems to work is to perform the final 
summation in higher precision (e.g., using "double-double precision" 
arithmetic).  Using double-double precision allows me to skip the 
sorting step, although since the number of values to sort is relatively 
small, it may be cheaper to sort.

Using more accurate summation methods (e.g., compensated summation) does 
not appear to fix the lack or reproducibility problem.

I was wondering if anyone has run into similar issues with PETSc and has 
a better solution.

Thanks!

-- Boyce

PS: These tests were done using PETSc 2.3.3.  I have just finished 
upgrading the code to PETSc 3.0.0 and will re-run them.  However, 
looking at VecSetValues_MPI()/VecSetValuesBlocked_MPI()/etc, it looks 
like this code is essentially the same in PETSc 2.3.3 and 3.0.0.