VecSetValues and irreproducible results in parallel
Boyce Griffith
griffith at cims.nyu.edu
Wed Jan 28 11:55:35 CST 2009
Hi, Folks --
I recently noticed an irreproducibility issue with some parallel code
which uses PETSc. Basically, I can run the same parallel simulation on
the same number of processors and get different answers. The
discrepancies were only apparent to the eye after very long integration
times (e.g., approx. 20k timesteps), but in fact the computations start
diverging after only a few handfuls (30--40) of timesteps. (Running the
same code in serial yields reproducible results, but changing the MPI
library or running the code on a different system did not make any
difference.)
I believe that I have tracked this issue down to
VecSetValues/VecSetValuesBlocked. A part of the computation uses
VecSetValues to accumulate the values of forces at the nodes of a mesh.
At some nodes, force contributions come from multiple processors. It
appears that there is some "randomness" in the way that this
accumulation is performed in parallel, presumably related to the order
in which values are communicated.
I have found that I can make modified versions of VecSetValues_MPI(),
VecSetValuesBlocked_MPI(), and VecAssemblyEnd_MPI() to ensure
reproducible results. I make the following modifications:
(1) VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all values
into the stash instead of immediately adding the local components.
(2) VecAssemblyEnd_MPI() "extracts" all of the values provided by the
stash and bstash, placing these values into lists corresponding to each
of the local entries in the vector. Next, once all values have been
extracted, I sort each of these lists (e.g., by descending or ascending
magnitude).
Making these changes appears to yield exactly reproducible results,
although I am still performing tests to try to shake out any other
problems. Another approach which seems to work is to perform the final
summation in higher precision (e.g., using "double-double precision"
arithmetic). Using double-double precision allows me to skip the
sorting step, although since the number of values to sort is relatively
small, it may be cheaper to sort.
Using more accurate summation methods (e.g., compensated summation) does
not appear to fix the lack or reproducibility problem.
I was wondering if anyone has run into similar issues with PETSc and has
a better solution.
Thanks!
-- Boyce
PS: These tests were done using PETSc 2.3.3. I have just finished
upgrading the code to PETSc 3.0.0 and will re-run them. However,
looking at VecSetValues_MPI()/VecSetValuesBlocked_MPI()/etc, it looks
like this code is essentially the same in PETSc 2.3.3 and 3.0.0.
More information about the petsc-users
mailing list