VecSetValues / etc code
Boyce Griffith
griffith at cims.nyu.edu
Fri Jan 30 18:15:37 CST 2009
Hi, Folks --
I'm attaching the C++ code that I use when I need to add values in
parallel using VecSetValues/VecSetValuesBlocked. The only reason to use
this code instead of the standard implementation is to (try to)
guarantee exact reproducibility in parallel when values accumulated to a
particular location, say X[i], are supplied simultaneously by multiple
processors. I think that this implementation should be as scalable as
the original implementation, but so far I have only tested it on a
handful of processors.
I have placed my versions of VecSetValues(), VecSetValuesBlocked(),
VecAssemblyBegin(), and VecAssemblyEnd() in a static C++ class which has
a small amount of state stored as public static variables. The static
variables determine some algorithmic options, as detailed below.
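To give the flavor, the interface looks roughly like the following (the
names here are illustrative; the attached tarball has the actual
declarations):

    #include <petscvec.h>

    // Illustrative sketch only -- see the attached PETScVecOps.tar for
    // the real declarations.
    class PETScVecOps
    {
    public:
        // Algorithmic options, selected via public static state before
        // assembly.
        enum SortMode  { SORT_NONE, SORT_ASCENDING_MAGNITUDE,
                         SORT_DESCENDING_MAGNITUDE };
        enum Precision { PRECISION_DOUBLE, PRECISION_DOUBLE_DOUBLE };
        enum SumAlgo   { SUM_RECURSIVE, SUM_KAHAN };

        static SortMode  sort_mode;
        static Precision precision;
        static SumAlgo   sum_algorithm;

        // Replacements for the corresponding PETSc functions.
        static PetscErrorCode VecSetValues(
            Vec x, PetscInt ni, const PetscInt ix[],
            const PetscScalar y[], InsertMode iora);
        static PetscErrorCode VecSetValuesBlocked(
            Vec x, PetscInt ni, const PetscInt ix[],
            const PetscScalar y[], InsertMode iora);
        static PetscErrorCode VecAssemblyBegin(Vec x);
        static PetscErrorCode VecAssemblyEnd(Vec x);
    };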
The basic algorithm: insertion proceeds as usual. When adding values,
however, ALL values are placed into the stash (VecSetValues) or bstash
(VecSetValuesBlocked). In particular, local contributions ARE NOT
summed in VecSetValues or VecSetValuesBlocked; every contribution is
kept separate until assembly.
VecAssemblyBegin works as usual.
VecAssemblyEnd also works as usual for insertion. For summation, it
first extracts all values from the stash/bstash. I keep a vector of
std::vector<double>'s, one corresponding to each of the local vector
indices; as values are extracted, they are pushed onto the back of the
appropriate vector.
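Schematically, the gathering step is just (a sketch; the real code
walks PETSc's stash structures, but the effect is the same):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Illustrative: each stashed entry is a (local index, value) pair.
    typedef std::pair<int, double> StashEntry;

    // Gather all stashed contributions into one std::vector<double>
    // per local vector index.  Note that nothing is summed here.
    std::vector<std::vector<double> >
    gather_contributions(const std::vector<StashEntry>& stash,
                         int n_local)
    {
        std::vector<std::vector<double> > values(n_local);
        for (std::size_t k = 0; k < stash.size(); ++k)
        {
            values[stash[k].first].push_back(stash[k].second);
        }
        return values;
    }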
Once all stash and bstash values have been extracted, I process the
values for each index in one of several ways:
(1) sort the values?
- no
- yes, by ascending magnitude
- yes, by descending magnitude
(2) sum the values in higher precision?
- no, use double precision
- yes, use double-double precision (using the datatype dd_real
provided by QD 2.3.7 by David Bailey et al.)
(3) what summation algorithm?
- standard ("recursive") summation algorithm
- Kahan's compensated summation algorithm
Sorting is performed using std::sort(). Summation is performed using
either std::accumulate() or my implementation of compensated summation,
compensated_accumulate(), which attempts to mimic the interface of
std::accumulate().
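For reference, a minimal compensated_accumulate() looks something like
this (a sketch of Kahan's algorithm; the version in the tarball may
differ in details):

    // Kahan compensated summation with a std::accumulate-like
    // interface.  A running compensation term c captures the low-order
    // bits that are lost each time a value is added to the sum.
    template <typename InputIterator, typename T>
    T compensated_accumulate(InputIterator first, InputIterator last,
                             T init)
    {
        T sum = init;
        T c = T(0);  // compensation for lost low-order bits
        for (; first != last; ++first)
        {
            const T y = *first - c;
            const T t = sum + y;
            c = (t - sum) - y;  // algebraically zero; numerically the
                                // roundoff of the addition above
            sum = t;
        }
        return sum;
    }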
In my experience so far, summing in double-double precision (without
sorting and without any fancy-pants summation algorithm) is sufficient
to guarantee reproducibility. I am still running tests, however. (If
you have a scheme that requires quad-double precision to
guarantee reproducibility when the rest of the code is double precision,
you probably have bigger problems than reproducibility. But adding
support for quad-double precision using QD is trivially easy.)
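For completeness, the double-double reduction is only a few lines with
QD (a sketch, assuming QD's dd_real header; the helper name is made
up):

    #include <qd/dd_real.h>
    #include <cstddef>
    #include <vector>

    // Sum double-precision values in double-double precision, rounding
    // back to double at the end.  In my tests so far, the extra
    // precision of the accumulator is enough that the rounded result
    // no longer depends on the order of the values.
    double sum_dd(const std::vector<double>& values)
    {
        dd_real sum = 0.0;
        for (std::size_t k = 0; k < values.size(); ++k)
        {
            sum += values[k];
        }
        return to_double(sum);
    }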
If I do things in double precision, I do not get reproducibility unless
I sort.
In double precision, sorting by decreasing magnitude (when there are
both positive and negative values) and using Kahan's compensated
summation algorithm yields results that are essentially identical to
those obtained using double-double precision. YMMV.
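Concretely, that double-precision recipe amounts to the following
(a sketch, reusing compensated_accumulate() from above):

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Comparator for sorting by decreasing magnitude.
    inline bool greater_magnitude(double a, double b)
    {
        return std::fabs(a) > std::fabs(b);
    }

    // Double-precision path: sort by decreasing magnitude, then apply
    // Kahan summation.  Takes its argument by value so the sort is
    // purely local.
    double sum_sorted_kahan(std::vector<double> values)
    {
        std::sort(values.begin(), values.end(), greater_magnitude);
        return compensated_accumulate(values.begin(), values.end(),
                                      0.0);
    }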
I didn't stick to the PETSc programming style when reworking this
code. Please let me know if there are any modifications that would
make this more useful.
It is probably worth having some mechanism for switching the sorting
order, so that increasing magnitude can be used when all the values
are known to have the same sign, and decreasing magnitude can be used
otherwise.
Also, please let me know if there are any boneheaded things that I've
done here! :-)
-- Boyce
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PETScVecOps.tar
Type: application/x-tar
Size: 20480 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090130/b2211b54/attachment.tar>