VecSetValues and irreproducible results in parallel
Boyce Griffith
griffith at cims.nyu.edu
Wed Jan 28 12:27:50 CST 2009
Hi, Barry --
I suppose that each of the runs could be thought of as a particular
"realization" or something like that. In my case, I figure they are all
"equally wrong." ;-) I have a hunch that the discrepancies between
runs are exacerbated by under-resolution, but I haven't confirmed this to
be the case.
What got me started on this was looking into the effect of
changing a convergence threshold: how tight do I need to make my
convergence tolerance before I start getting the "same" answer? I was
distressed to discover that no matter how tight the tolerance, the
results were always different. Then I noticed that they were different
even for the same tolerance. This made it a little difficult to choose
the tolerance!
I'd be happy to supply the code for inclusion in petsc-dev. I use C++
STL vectors and sorting algorithms in VecAssemblyEnd, however. Can I
toss you all the code with C++ stuff in it, or would it really only be
useful if I turn it into C-only code first?
-- Boyce
Barry Smith wrote:
>
> Boyce,
>
> As you know, any of the runs give equally valid answers (all right
> or all wrong depending on how you look at it). I assume you
> want reproducibility to help track down other bugs in the code and to
> see if changes in places that are not supposed to change
> the solution have not inadvertently changed it (which is
> hard to spot if each run gives a different answer).
>
> We would be interested in including your additions in petsc-dev,
> with a config/configure.py option (say --with-reproducibility) that
> turns on the more precise (but slower) version, if you would like to
> contribute it.
>
> Barry
>
> We see this issue every once in a while but have never had the energy
> to go through the code comprehensively and add support to make the
> computations much more reproducible.
>
>
> On Jan 28, 2009, at 11:55 AM, Boyce Griffith wrote:
>
>> Hi, Folks --
>>
>> I recently noticed an irreproducibility issue with some parallel code
>> which uses PETSc. Basically, I can run the same parallel simulation
>> on the same number of processors and get different answers. The
>> discrepancies were only apparent to the eye after very long
>> integration times (e.g., approx. 20k timesteps), but in fact the
>> computations started diverging after only a few handfuls (30--40) of
>> timesteps. (Running the same code in serial yielded reproducible
>> results, while changing the MPI library or running the code on a
>> different system made no difference.)
>>
>> I believe that I have tracked this issue down to
>> VecSetValues/VecSetValuesBlocked. A part of the computation uses
>> VecSetValues to accumulate the values of forces at the nodes of a
>> mesh. At some nodes, force contributions come from multiple
>> processors. It appears that there is some "randomness" in the way
>> that this accumulation is performed in parallel, presumably related to
>> the order in which values are communicated.
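>>
>> To illustrate, the usage pattern in question is roughly the following
>> (a sketch with illustrative names and a hypothetical helper, not the
>> actual application code):
>>
>>   #include <petscvec.h>
>>
>>   /* Each rank adds its force contributions at (possibly off-process)
>>      mesh nodes.  Off-process values are communicated during assembly
>>      and may arrive in a different order on each run; floating-point
>>      addition is not associative, so the assembled entries can differ
>>      in their last bits from run to run. */
>>   PetscErrorCode AddNodalForces(Vec F, PetscInt n,
>>                                 const PetscInt nodes[],
>>                                 const PetscScalar forces[])
>>   {
>>     PetscErrorCode ierr;
>>     ierr = VecSetValues(F, n, nodes, forces, ADD_VALUES); CHKERRQ(ierr);
>>     ierr = VecAssemblyBegin(F); CHKERRQ(ierr);
>>     ierr = VecAssemblyEnd(F); CHKERRQ(ierr);
>>     return 0;
>>   }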
>>
>> I have found that I can make modified versions of VecSetValues_MPI(),
>> VecSetValuesBlocked_MPI(), and VecAssemblyEnd_MPI() to ensure
>> reproducible results. I make the following modifications:
>>
>> (1) VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all values
>> into the stash instead of immediately adding the local components.
>>
>> (2) VecAssemblyEnd_MPI() "extracts" all of the values provided by the
>> stash and bstash, placing these values into lists corresponding to
>> each of the local entries in the vector. Next, once all values have
>> been extracted, I sort each of these lists (e.g., by descending or
>> ascending magnitude) and sum each list in that fixed order, as
>> sketched below.
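>>
>> In outline, the per-entry reduction in (2) looks something like the
>> following (plain C with illustrative names; the real code lives in
>> the stash handling inside VecAssemblyEnd_MPI()):
>>
>>   #include <math.h>
>>   #include <stdlib.h>
>>
>>   /* Order contributions by descending magnitude, breaking magnitude
>>      ties by signed value so the ordering is total and the summation
>>      order is identical on every run. */
>>   static int by_magnitude(const void *a, const void *b)
>>   {
>>     double va = *(const double *)a, vb = *(const double *)b;
>>     double ma = fabs(va), mb = fabs(vb);
>>     if (ma != mb) return (ma > mb) ? -1 : 1;
>>     if (va != vb) return (va < vb) ? -1 : 1;
>>     return 0;
>>   }
>>
>>   /* Sum every contribution to one local vector entry in the fixed,
>>      sorted order. */
>>   static double reduce_entry(double *vals, int nvals)
>>   {
>>     double sum = 0.0;
>>     int    i;
>>     qsort(vals, (size_t)nvals, sizeof(double), by_magnitude);
>>     for (i = 0; i < nvals; i++) sum += vals[i];
>>     return sum;
>>   }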
>>
>> Making these changes appears to yield exactly reproducible results,
>> although I am still performing tests to try to shake out any other
>> problems. Another approach which seems to work is to perform the
>> final summation in higher precision (e.g., using "double-double
>> precision" arithmetic). Using double-double precision allows me to
>> skip the sorting step, although since the number of values to sort is
>> relatively small, it may be cheaper to sort.
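>>
>> For reference, the kind of double-double accumulation I mean can be
>> sketched as follows (built on Knuth's TwoSum; names are illustrative):
>>
>>   /* TwoSum: s + err equals a + b exactly. */
>>   static void two_sum(double a, double b, double *s, double *err)
>>   {
>>     double bv;
>>     *s   = a + b;
>>     bv   = *s - a;
>>     *err = (a - (*s - bv)) + (b - bv);
>>   }
>>
>>   /* Accumulate x into the double-double running sum (hi, lo).  With
>>      ~106 bits in the accumulator and only a handful of contributions
>>      per entry, the final double-precision result comes out the same
>>      regardless of summation order. */
>>   static void dd_add(double *hi, double *lo, double x)
>>   {
>>     double s, e;
>>     two_sum(*hi, x, &s, &e);
>>     *lo += e;
>>     two_sum(s, *lo, hi, lo); /* renormalize */
>>   }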
>>
>> Using more accurate summation methods (e.g., compensated summation)
>> does not appear to fix the reproducibility problem.
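>>
>> (For comparison, Kahan-style compensated summation shrinks the
>> rounding error but its result still depends on the order of the
>> terms, which is presumably why it does not help here:
>>
>>   static double kahan_sum(const double *x, int n)
>>   {
>>     double sum = 0.0, c = 0.0; /* c carries the low-order error */
>>     int    i;
>>     for (i = 0; i < n; i++) {
>>       double y = x[i] - c;
>>       double t = sum + y;
>>       c   = (t - sum) - y;   /* recover what was lost in sum + y */
>>       sum = t;
>>     }
>>     return sum;
>>   }
>>
>> The compensation makes the answer more accurate, not
>> order-independent.)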
>>
>> I was wondering if anyone has run into similar issues with PETSc and
>> has a better solution.
>>
>> Thanks!
>>
>> -- Boyce
>>
>> PS: These tests were done using PETSc 2.3.3. I have just finished
>> upgrading the code to PETSc 3.0.0 and will re-run them. However,
>> looking at VecSetValues_MPI()/VecSetValuesBlocked_MPI()/etc, it looks
>> like this code is essentially the same in PETSc 2.3.3 and 3.0.0.
>