VecSetValues and irreproducible results in parallel
Barry Smith
bsmith at mcs.anl.gov
Thu Jan 29 19:13:06 CST 2009
Since the code would only be compiled in when the special
reproducibility flag is on, we can live with the C++.
Barry
On Jan 28, 2009, at 12:27 PM, Boyce Griffith wrote:
> Hi, Barry --
>
> I suppose that each of the runs could be thought of as a particular
> "realization" or something like that. In my case, I figure they are
> all "equally wrong." ;-) I have a hunch that the discrepancies
> between runs are exacerbated by under-resolution, but I haven't
> confirmed this to be the case.
>
> What got me started on this was I was looking into the effect of
> changing a convergence threshold: how tight do I need to make my
> convergence tolerance before I start getting the "same" answer? I
> was distressed to discover that no matter how tight the tolerance,
> the results were always different. Then I noticed that they were
> different even for the same tolerance. This made it a little
> difficult to choose the tolerance!
>
> I'd be happy to supply the code for inclusion in petsc-dev. I use
> C++ STL vectors and sorting algorithms in VecAssemblyEnd, however.
> Can I toss you all the code with C++ stuff in it, or would it really
> only be useful if I turn it into C-only code first?
>
> -- Boyce
>
> Barry Smith wrote:
>> Boyce,
>> As you know, any of the runs give equally valid answers (all
>> right or all wrong depending on how you look at it). I assume you
>> want reproducibility to help track down other bugs in the code and
>> to check that changes which are not supposed to affect the solution
>> really leave it unchanged (which is hard to tell when every run
>> gives a different answer).
>> We would be interested in including your additions in petsc-dev,
>> with a config/configure.py option (say --with-reproducibility) that
>> turns on the more precise (but slower) version, if you would like
>> to contribute it.
>> Barry
>> We see this issue every once in a while and never had the energy
>> to comprehensively go through the code and add support to make the
>> computations much more reproducible.
>> On Jan 28, 2009, at 11:55 AM, Boyce Griffith wrote:
>>> Hi, Folks --
>>>
>>> I recently noticed an irreproducibility issue with some parallel
>>> code which uses PETSc. Basically, I can run the same parallel
>>> simulation on the same number of processors and get different
>>> answers. The discrepancies were only apparent to the eye after
>>> very long integration times (e.g., approx. 20k timesteps), but in
>>> fact the computations start diverging after only a few handfuls
>>> (30--40) of timesteps. (Running the same code in serial yields
>>> reproducible results, but changing the MPI library or running the
>>> code on a different system did not make any difference.)
>>>
>>> I believe that I have tracked this issue down to VecSetValues/
>>> VecSetValuesBlocked. A part of the computation uses VecSetValues
>>> to accumulate the values of forces at the nodes of a mesh. At
>>> some nodes, force contributions come from multiple processors. It
>>> appears that there is some "randomness" in the way that this
>>> accumulation is performed in parallel, presumably related to the
>>> order in which values are communicated.
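(The underlying cause is that floating-point addition is not associative, so the order in which remote contributions arrive changes the rounded result. A minimal, generic illustration, not code from this thread:)

```c
/* Floating-point addition is not associative: regrouping the same
 * three values can change the rounded result.  This is why the
 * arrival order of remote contributions changes the assembled sum. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```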
>>>
>>> I have found that I can make modified versions of
>>> VecSetValues_MPI(), VecSetValuesBlocked_MPI(), and
>>> VecAssemblyEnd_MPI() to ensure reproducible results. I make the
>>> following modifications:
>>>
>>> (1) VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all
>>> values into the stash instead of immediately adding the local
>>> components.
>>>
>>> (2) VecAssemblyEnd_MPI() "extracts" all of the values provided by
>>> the stash and bstash, placing these values into lists
>>> corresponding to each of the local entries in the vector. Next,
>>> once all values have been extracted, I sort each of these lists
>>> (e.g., by descending or ascending magnitude).
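A minimal sketch of the sort-then-sum idea in step (2), with hypothetical helper names (this is not the actual VecAssemblyEnd_MPI() code): collect all contributions destined for one local entry, impose a fixed order on them, and only then sum.

```c
#include <stdlib.h>
#include <math.h>

/* Sort contributions to one vector entry into a fixed order
 * (ascending magnitude, ties broken by signed value so the order is
 * total), making the sum independent of message-arrival order.
 * Hypothetical helpers, not PETSc source. */
static int cmp_contrib(const void *pa, const void *pb)
{
    double a = *(const double *)pa, b = *(const double *)pb;
    double ma = fabs(a), mb = fabs(b);
    if (ma < mb) return -1;
    if (ma > mb) return  1;
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

double reproducible_entry_sum(double *contribs, size_t n)
{
    double s = 0.0;
    qsort(contribs, n, sizeof(double), cmp_contrib);
    for (size_t i = 0; i < n; i++) s += contribs[i];
    return s;
}
```

The same multiset of contributions then always produces bit-identical results, whatever order the stash delivered them in.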
>>>
>>> Making these changes appears to yield exactly reproducible
>>> results, although I am still performing tests to try to shake out
>>> any other problems. Another approach which seems to work is to
>>> perform the final summation in higher precision (e.g., using
>>> "double-double precision" arithmetic). Using double-double
>>> precision allows me to skip the sorting step, although since the
>>> number of values to sort is relatively small, it may be cheaper to
>>> sort.
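For reference, the core of a double-double accumulation in the style of the QD library (a sketch under that assumption, not the poster's implementation): each addition captures its exact rounding error, so any sum that fits in two doubles comes out the same in every order.

```c
/* Double-double accumulator: hi + lo carries roughly 32 significant
 * digits.  two_sum() computes the exact rounding error of a + b, so
 * partial sums representable in two doubles are tracked exactly and
 * the final result no longer depends on summation order. */
typedef struct { double hi, lo; } dd_t;

static dd_t two_sum(double a, double b)
{
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);  /* exact error of a + b */
    return (dd_t){s, e};
}

dd_t dd_add(dd_t acc, double x)
{
    dd_t s = two_sum(acc.hi, x);
    s.lo += acc.lo;
    /* renormalize so hi holds the leading part */
    double hi = s.hi + s.lo;
    double lo = s.lo - (hi - s.hi);
    return (dd_t){hi, lo};
}

double dd_result(dd_t acc) { return acc.hi + acc.lo; }
```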
>>>
>>> Using more accurate summation methods (e.g., compensated
>>> summation) does not appear to fix the lack of reproducibility
>>> problem.
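(For comparison, classic Kahan compensated summation, a generic sketch: it shrinks the rounding error of each step but still adds values in arrival order, consistent with the observation that it does not restore reproducibility.)

```c
#include <stddef.h>

/* Kahan compensated summation: carries a running correction c that
 * recovers most of each step's rounding error.  More accurate than a
 * naive loop, but the result still depends on the input order. */
double kahan_sum(const double *x, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;   /* what was lost when adding y to s */
        s = t;
    }
    return s;
}
```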
>>>
>>> I was wondering if anyone has run into similar issues with PETSc
>>> and has a better solution.
>>>
>>> Thanks!
>>>
>>> -- Boyce
>>>
>>> PS: These tests were done using PETSc 2.3.3. I have just finished
>>> upgrading the code to PETSc 3.0.0 and will re-run them. However,
>>> looking at VecSetValues_MPI()/VecSetValuesBlocked_MPI()/etc, it
>>> looks like this code is essentially the same in PETSc 2.3.3 and
>>> 3.0.0.