VecSetValues and irreproducible results in parallel
Boyce Griffith
griffith at cims.nyu.edu
Wed Jan 28 12:27:50 CST 2009
Hi, Barry --
I suppose that each of the runs could be thought of as a particular
"realization" or something like that. In my case, I figure they are all
"equally wrong." ;-) I have a hunch that the discrepancies between
runs are exacerbated by under-resolution, but I haven't confirmed this to
be the case.
What got me started on this was looking into the effect of
changing a convergence threshold: how tight do I need to make my
convergence tolerance before I start getting the "same" answer? I was
distressed to discover that no matter how tight the tolerance, the
results were always different. Then I noticed that they were different
even for the same tolerance. This made it a little difficult to choose
the tolerance!
I'd be happy to supply the code for inclusion in petsc-dev. I use C++
STL vectors and sorting algorithms in VecAssemblyEnd, however. Can I
toss you all the code with C++ stuff in it, or would it really only be
useful if I turn it into C-only code first?
-- Boyce
Barry Smith wrote:
>
> Boyce,
>
> As you know, any of the runs give equally valid answers (all right
> or all wrong depending on how you look at it). I assume you
> want reproducibility to help track down other bugs in the code and to
> see if changes in places that are not supposed to change
> the solution have not inadvertently changed it (which is
> hard to spot if each run gives a different answer).
>
> We would be interested in including your additions in petsc-dev,
> with a config/configure.py option (say --with-reproducibility) that
> turns on the more precise (but slower) version, if you would like to
> contribute it.
>
> Barry
>
> We see this issue every once in a while but have never had the energy
> to go through the code comprehensively and add support to make the
> computations much more reproducible.
>
>
> On Jan 28, 2009, at 11:55 AM, Boyce Griffith wrote:
>
>> Hi, Folks --
>>
>> I recently noticed an irreproducibility issue with some parallel code
>> which uses PETSc. Basically, I can run the same parallel simulation
>> on the same number of processors and get different answers. The
>> discrepancies were only apparent to the eye after very long
>> integration times (e.g., approx. 20k timesteps), but in fact the
>> computations started diverging after only a few handfuls (30--40) of
>> timesteps. (Running the same code in serial yielded reproducible
>> results, while changing the MPI library or running the code on a
>> different system made no difference.)
>>
>> I believe that I have tracked this issue down to
>> VecSetValues/VecSetValuesBlocked. A part of the computation uses
>> VecSetValues to accumulate the values of forces at the nodes of a
>> mesh. At some nodes, force contributions come from multiple
>> processors. It appears that there is some "randomness" in the way
>> that this accumulation is performed in parallel, presumably related to
>> the order in which values are communicated.
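>>
>> To illustrate, the usage pattern in question is roughly the following
>> (a sketch with illustrative names and a hypothetical helper, not the
>> actual application code):
>>
>>   #include <petscvec.h>
>>
>>   /* Each rank adds its force contributions at (possibly off-process)
>>      mesh nodes.  Off-process values are communicated during assembly
>>      and may arrive in a different order on each run; floating-point
>>      addition is not associative, so the assembled entries can differ
>>      in their last bits from run to run. */
>>   PetscErrorCode AddNodalForces(Vec F, PetscInt n,
>>                                 const PetscInt nodes[],
>>                                 const PetscScalar forces[])
>>   {
>>     PetscErrorCode ierr;
>>     ierr = VecSetValues(F, n, nodes, forces, ADD_VALUES); CHKERRQ(ierr);
>>     ierr = VecAssemblyBegin(F); CHKERRQ(ierr);
>>     ierr = VecAssemblyEnd(F); CHKERRQ(ierr);
>>     return 0;
>>   }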
>>
>> I have found that I can make modified versions of VecSetValues_MPI(),
>> VecSetValuesBlocked_MPI(), and VecAssemblyEnd_MPI() to ensure
>> reproducible results. I make the following modifications:
>>
>> (1) VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all values
>> into the stash instead of immediately adding the local components.
>>
>> (2) VecAssemblyEnd_MPI() "extracts" all of the values provided by the
>> stash and bstash, placing these values into lists corresponding to
>> each of the local entries in the vector. Next, once all values have
>> been extracted, I sort each of these lists (e.g., by descending or
>> ascending magnitude) and sum each list in that fixed order, as
>> sketched below.
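>>
>> In outline, the per-entry reduction in (2) looks something like the
>> following (plain C with illustrative names; the real code lives in
>> the stash handling inside VecAssemblyEnd_MPI()):
>>
>>   #include <math.h>
>>   #include <stdlib.h>
>>
>>   /* Order contributions by descending magnitude, breaking magnitude
>>      ties by signed value so the ordering is total and the summation
>>      order is identical on every run. */
>>   static int by_magnitude(const void *a, const void *b)
>>   {
>>     double va = *(const double *)a, vb = *(const double *)b;
>>     double ma = fabs(va), mb = fabs(vb);
>>     if (ma != mb) return (ma > mb) ? -1 : 1;
>>     if (va != vb) return (va < vb) ? -1 : 1;
>>     return 0;
>>   }
>>
>>   /* Sum every contribution to one local vector entry in the fixed,
>>      sorted order. */
>>   static double reduce_entry(double *vals, int nvals)
>>   {
>>     double sum = 0.0;
>>     int    i;
>>     qsort(vals, (size_t)nvals, sizeof(double), by_magnitude);
>>     for (i = 0; i < nvals; i++) sum += vals[i];
>>     return sum;
>>   }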
>>
>> Making these changes appears to yield exactly reproducible results,
>> although I am still performing tests to try to shake out any other
>> problems. Another approach which seems to work is to perform the
>> final summation in higher precision (e.g., using "double-double
>> precision" arithmetic). Using double-double precision allows me to
>> skip the sorting step, although since the number of values to sort is
>> relatively small, it may be cheaper to sort.
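>>
>> For reference, the kind of double-double accumulation I mean can be
>> sketched as follows (built on Knuth's TwoSum; names are illustrative):
>>
>>   /* TwoSum: s + err equals a + b exactly. */
>>   static void two_sum(double a, double b, double *s, double *err)
>>   {
>>     double bv;
>>     *s   = a + b;
>>     bv   = *s - a;
>>     *err = (a - (*s - bv)) + (b - bv);
>>   }
>>
>>   /* Accumulate x into the double-double running sum (hi, lo).  With
>>      ~106 bits in the accumulator and only a handful of contributions
>>      per entry, the final double-precision result comes out the same
>>      regardless of summation order. */
>>   static void dd_add(double *hi, double *lo, double x)
>>   {
>>     double s, e;
>>     two_sum(*hi, x, &s, &e);
>>     *lo += e;
>>     two_sum(s, *lo, hi, lo); /* renormalize */
>>   }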
>>
>> Using more accurate summation methods (e.g., compensated summation)
>> does not appear to fix the reproducibility problem.
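>>
>> (For comparison, Kahan-style compensated summation shrinks the
>> rounding error but its result still depends on the order of the
>> terms, which is presumably why it does not help here:
>>
>>   static double kahan_sum(const double *x, int n)
>>   {
>>     double sum = 0.0, c = 0.0; /* c carries the low-order error */
>>     int    i;
>>     for (i = 0; i < n; i++) {
>>       double y = x[i] - c;
>>       double t = sum + y;
>>       c   = (t - sum) - y;   /* recover what was lost in sum + y */
>>       sum = t;
>>     }
>>     return sum;
>>   }
>>
>> The compensation makes the answer more accurate, not
>> order-independent.)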
>>
>> I was wondering if anyone has run into similar issues with PETSc and
>> has a better solution.
>>
>> Thanks!
>>
>> -- Boyce
>>
>> PS: These tests were done using PETSc 2.3.3. I have just finished
>> upgrading the code to PETSc 3.0.0 and will re-run them. However,
>> looking at VecSetValues_MPI()/VecSetValuesBlocked_MPI()/etc, it looks
>> like this code is essentially the same in PETSc 2.3.3 and 3.0.0.
>