[petsc-dev] How far should internal error checking go?
bsmith at petsc.dev
Tue Aug 4 14:05:29 CDT 2020
So long as the error checking is not much more expensive than the computation, I see no harm in including it with the if (debug) model. For example, checking that a sorted array is sorted is less expensive than the sort itself, so it can be included at the end of the sort algorithm.
For the numerical solvers things are much trickier because it is difficult to check if the result is "correct", especially given the different criteria possible for "convergence" and the norms that might be used. Thus checking the "little" things throughout the code is a good idea, because we can't catch the "big things".
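A minimal sketch of the sort example in plain C, without PETSc. The names SKETCH_USE_DEBUG, is_sorted_int() and sort_int_checked() are hypothetical and only stand in for PETSC_USE_DEBUG and a real PETSc sort routine. The check is a single O(n) pass after an O(n log n) sort.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PETSC_USE_DEBUG; not the PETSc macro itself. */
#ifndef SKETCH_USE_DEBUG
#define SKETCH_USE_DEBUG 1
#endif

/* O(n) check: returns 1 if a[0..n-1] is in non-decreasing order, else 0. */
int is_sorted_int(const int *a, size_t n)
{
  for (size_t i = 1; i < n; i++) {
    if (a[i - 1] > a[i]) return 0;
  }
  return 1;
}

/* Comparison callback for qsort(); avoids overflow from plain subtraction. */
int compare_int(const void *x, const void *y)
{
  int a = *(const int *)x, b = *(const int *)y;
  return (a > b) - (a < b);
}

/* Sorts in place. Returns 0 on success, 1 if the debug-only check fails. */
int sort_int_checked(int *a, size_t n)
{
  qsort(a, n, sizeof(*a), compare_int);
#if SKETCH_USE_DEBUG
  if (!is_sorted_int(a, n)) return 1; /* the sort itself is buggy */
#endif
  return 0;
}
```

This only verifies ordering. It would not notice a sort that dropped or duplicated an entry; catching that needs a comparison against a copy of the input, which costs more.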
One check we don't do consistently, and which requires elbow grease, is to use VecGetArrayWrite() instead of VecGetArray() when the routine has to fill ALL the values in the array. Then VecGetArrayWrite() can fill the array with NaN, and VecRestoreArrayWrite() can verify that there are no NaN in the result, as a basic test that the routine did not miss setting some values.
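The get/restore pairing can be sketched in plain C as below. This is not the PETSc implementation; sketch_get_array_write() and sketch_restore_array_write() are hypothetical names that only mimic what a debug-mode VecGetArrayWrite()/VecRestoreArrayWrite() pair could do on a raw buffer.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical "get for write": poison the buffer so that any entry the
   caller forgets to set is still NaN afterwards. */
void sketch_get_array_write(double *buf, size_t n)
{
  for (size_t i = 0; i < n; i++) buf[i] = NAN;
}

/* Hypothetical "restore": returns how many entries are still NaN.
   Zero means every entry was written at least once. */
size_t sketch_restore_array_write(const double *buf, size_t n)
{
  size_t missed = 0;
  for (size_t i = 0; i < n; i++) {
    if (isnan(buf[i])) missed++;
  }
  return missed;
}
```

One caveat under this scheme: a routine that legitimately stores a NaN is indistinguishable from one that skipped the entry, so a nonzero count is a prompt to look, not proof of a bug.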
We could extend this concept to other places where a routine is supposed to "fill up" memory obtained with PetscMalloc(): have something like PetscMallocVerifyFilled() that checks malloced space after it has been filled by the code, to make sure no places were missed. Note that valgrind does some of this checking, but not all. Valgrind only generates messages when a resulting unset value is used in an if (something), or in something else that controls program flow. It will not detect when a numerical location is not filled in but is used later in a numerical computation that never controls program flow. Hence in debug mode I would like PetscMalloc() to fill all numerical arrays with NaN.
This is the simplest error checking: not even checking that correct values are used, just checking that something is used. And we don't even do this everywhere yet.
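A rough sketch of that proposal in plain C. Nothing here is existing PETSc API: sketch_malloc_real_debug() stands in for a debug-mode PetscMalloc() of a real-valued array, and sketch_verify_filled() for the PetscMallocVerifyFilled() idea. sketch_sum() is a purely numerical use with no branch on the data, the case described above as one valgrind stays silent on.

```c
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debug-mode allocator: every entry starts out as NaN.
   Returns NULL if the allocation fails. */
double *sketch_malloc_real_debug(size_t n)
{
  double *p = malloc(n * sizeof(*p));
  if (!p) return NULL;
  for (size_t i = 0; i < n; i++) p[i] = NAN;
  return p;
}

/* Hypothetical verify step: returns the index of the first entry that is
   still NaN, or n if every entry has been set. */
size_t sketch_verify_filled(const double *p, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    if (isnan(p[i])) return i;
  }
  return n;
}

/* Numerical use that never branches on the values it reads. */
double sketch_sum(const double *p, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++) s += p[i];
  return s;
}
```

With the NaN fill, a missed entry also propagates: the sum of an array with one unset entry is itself NaN, so the omission shows up in the result even if the verify step is never called.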
> On Aug 4, 2020, at 12:24 PM, Jacob Faibussowitsch <jacob.fai at gmail.com> wrote:
> Hello All,
> How far should one go in error checking when using #if defined(PETSC_USE_DEBUG)? So far I have gone with the mantra that internal petsc routines (including ones authored by myself) should be considered infallible and that I should only be checking for garbage *input* from the user. But there is not a person in existence that doesn’t write buggy code, as evidenced by the somewhat routine “fixing bug in XYZ” merge requests one sees.
> On the other hand it is also not reasonable to check every single output with a fine-tooth comb because for the majority of cases code written by petsc developers is working as intended. Take for example writing an array sorting algorithm. Since every operation is “performance critical” these are often written in less logical or less readable formats, leading to some subtle bugs that the writer doesn’t immediately catch. If these remain uncaught through CI/CD and then bleed into user code, I see absolutely no chance of the user (or even other devs) ever being able to identify, without significant effort, that the sorting algorithm deep in some function stack is the one producing the bug. One could include a “dumb” version of the same algorithm that checks a copy of the initial array for missing/misplaced elements, but as mentioned above this is a pointless slowdown 99% of the time.
> CI/CD -- while excellent at catching a lot of machine-specific bugs -- isn’t bulletproof either. It relies on the assumption that the writer knows all possible sources of bugs in their code and provides a test case for each, but to quote Isaac Newton “what we know is a drop, what we don’t know is an ocean”. I have been mulling over this problem for a while now, and have looked through the user manual/developers manual but have not found a definitive answer.
> Best regards,
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391