VecSetValues / etc code
Boyce Griffith
griffith at cims.nyu.edu
Fri Jan 30 18:15:37 CST 2009
Hi, Folks --
I'm attaching the C++ code that I use when I need to add values in
parallel using VecSetValues/VecSetValuesBlocked. The only reason to use
this code instead of the standard implementation is to (try to)
guarantee exact reproducibility in parallel when values accumulated to a
particular location, say X[i], are supplied simultaneously by multiple
processors. I think that this implementation should be as scalable as
the original implementation, but so far I have only tested it on a
handful of processors.
I have placed my versions of VecSetValues(), VecSetValuesBlocked(),
VecAssemblyBegin(), and VecAssemblyEnd() in a static C++ class which has
a small amount of state stored as public static variables. The static
variables determine some algorithmic options, as detailed below.
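To give the flavor, the interface looks roughly like the following (the
names here are illustrative; the attached tarball has the actual
declarations):

    #include <petscvec.h>

    // Illustrative sketch only -- see the attached PETScVecOps.tar for
    // the real declarations.
    class PETScVecOps
    {
    public:
        // Algorithmic options, selected via public static state before
        // assembly.
        enum SortMode  { SORT_NONE, SORT_ASCENDING_MAGNITUDE,
                         SORT_DESCENDING_MAGNITUDE };
        enum Precision { PRECISION_DOUBLE, PRECISION_DOUBLE_DOUBLE };
        enum SumAlgo   { SUM_RECURSIVE, SUM_KAHAN };

        static SortMode  sort_mode;
        static Precision precision;
        static SumAlgo   sum_algorithm;

        // Replacements for the corresponding PETSc functions.
        static PetscErrorCode VecSetValues(
            Vec x, PetscInt ni, const PetscInt ix[],
            const PetscScalar y[], InsertMode iora);
        static PetscErrorCode VecSetValuesBlocked(
            Vec x, PetscInt ni, const PetscInt ix[],
            const PetscScalar y[], InsertMode iora);
        static PetscErrorCode VecAssemblyBegin(Vec x);
        static PetscErrorCode VecAssemblyEnd(Vec x);
    };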
The basic algorithm: insertion proceeds as usual. When adding values,
however, ALL values are placed into the stash (VecSetValues) or bstash
(VecSetValuesBlocked). In particular, local contributions ARE NOT
summed in VecSetValues or VecSetValuesBlocked; every contribution is
kept separate until assembly.
VecAssemblyBegin works as usual.
VecAssemblyEnd also works as usual for insertion. For summation, it
first extracts all values from the stash/bstash. I keep a vector of
std::vector<double>'s, one corresponding to each of the local vector
indices; as values are extracted, they are pushed onto the back of the
appropriate vector.
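Schematically, the gathering step is just (a sketch; the real code
walks PETSc's stash structures, but the effect is the same):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Illustrative: each stashed entry is a (local index, value) pair.
    typedef std::pair<int, double> StashEntry;

    // Gather all stashed contributions into one std::vector<double>
    // per local vector index.  Note that nothing is summed here.
    std::vector<std::vector<double> >
    gather_contributions(const std::vector<StashEntry>& stash,
                         int n_local)
    {
        std::vector<std::vector<double> > values(n_local);
        for (std::size_t k = 0; k < stash.size(); ++k)
        {
            values[stash[k].first].push_back(stash[k].second);
        }
        return values;
    }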
Once all stash and bstash values have been extracted, I process the
values for each index in one of several ways:
(1) sort the values?
- no
- yes, by ascending magnitude
- yes, by descending magnitude
(2) sum the values in higher precision?
- no, use double precision
- yes, use double-double precision (using the datatype dd_real
provided by QD 2.3.7 by David Bailey et al.)
(3) what summation algorithm?
- standard ("recursive") summation algorithm
- Kahan's compensated summation algorithm
Sorting is performed using std::sort(). Summation is performed using
either std::accumulate() or my implementation of compensated summation,
compensated_accumulate(), which attempts to mimic the interface of
std::accumulate().
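For reference, a minimal compensated_accumulate() looks something like
this (a sketch of Kahan's algorithm; the version in the tarball may
differ in details):

    // Kahan compensated summation with a std::accumulate-like
    // interface.  A running compensation term c captures the low-order
    // bits that are lost each time a value is added to the sum.
    template <typename InputIterator, typename T>
    T compensated_accumulate(InputIterator first, InputIterator last,
                             T init)
    {
        T sum = init;
        T c = T(0);  // compensation for lost low-order bits
        for (; first != last; ++first)
        {
            const T y = *first - c;
            const T t = sum + y;
            c = (t - sum) - y;  // algebraically zero; numerically the
                                // roundoff of the addition above
            sum = t;
        }
        return sum;
    }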
In my experience so far, summing in double-double precision (without
sorting and without any fancy-pants summation algorithm) is sufficient
to guarantee reproducibility. I am still running tests, however. (If
you have a scheme that requires quad-double precision to
guarantee reproducibility when the rest of the code is double precision,
you probably have bigger problems than reproducibility. But adding
support for quad-double precision using QD is trivially easy.)
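For completeness, the double-double reduction is only a few lines with
QD (a sketch, assuming QD's dd_real header; the helper name is made
up):

    #include <qd/dd_real.h>
    #include <cstddef>
    #include <vector>

    // Sum double-precision values in double-double precision, rounding
    // back to double at the end.  In my tests so far, the extra
    // precision of the accumulator is enough that the rounded result
    // no longer depends on the order of the values.
    double sum_dd(const std::vector<double>& values)
    {
        dd_real sum = 0.0;
        for (std::size_t k = 0; k < values.size(); ++k)
        {
            sum += values[k];
        }
        return to_double(sum);
    }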
If I do things in double precision, I do not get reproducibility unless
I sort.
In double precision, sorting by decreasing magnitude (when there are
both positive and negative values) and using Kahan's compensated
summation algorithm yields results that are essentially identical to
those obtained using double-double precision. YMMV.
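Concretely, that double-precision recipe amounts to the following
(a sketch, reusing compensated_accumulate() from above):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Comparator for sorting by decreasing magnitude.
    inline bool greater_magnitude(double a, double b)
    {
        return std::fabs(a) > std::fabs(b);
    }

    // Double-precision path: sort by decreasing magnitude, then apply
    // Kahan summation.  Takes its argument by value so the sort is
    // purely local.
    double sum_sorted_kahan(std::vector<double> values)
    {
        std::sort(values.begin(), values.end(), greater_magnitude);
        return compensated_accumulate(values.begin(), values.end(),
                                      0.0);
    }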
I didn't stick to the PETSc programming style when reworking this
code. Please let me know if there are any modifications that would
make this more useful.
It is probably worth having some mechanism for switching the sorting
order, so that increasing magnitude can be used when all the values
are known to have the same sign, and decreasing magnitude can be used
otherwise.
Also, please let me know if there are any boneheaded things that I've
done here! :-)
-- Boyce
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PETScVecOps.tar
Type: application/x-tar
Size: 20480 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090130/b2211b54/attachment.tar>