Thursday, December 14, 2017

What to test in data processing code

In my experience, data are dirty. If the current data set is not dirty, the next one will be. Hence, it is reasonable to test the code against such scenarios. My checklist follows:
  1. Does the code do what it is supposed to do on a trivial example? The purpose of this unit test is twofold. First, it documents how the code can be used. Second, if the test fails on the client's computer but passes on your own, it is a sign of an integration problem.
  2. If the code accepts numeric data, check the following scenarios:
    1. Zero (a smoke test for division by zero error)
    2. One
    3. Negative number (not all algorithms accept negative values)
    4. Decimal number (not all algorithms accept decimals)
    5. NaN (not a number; permissible, for example, in doubles)
    6. null (missing value)
    7. Empty set (no data at the input at all)
    8. Plus/minus infinity
    9. Constant vector (for example, when the variance is calculated and used in a denominator, a constant vector yields division by zero)
    10. Vector of extremely large values and extremely small values (test of numerical stability)
  3. If the code accepts character data, check the following scenarios:
    1. Plain ASCII
    2. Accented characters
    3. null (missing value)
    4. Empty set (no data at the input at all)
    5. Ridiculously long text
    6. Empty value ("")
    7. White space character (e.g. " ")
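
To make the checklist concrete, here is a minimal sketch in Python of how several of these items could look as tests. The functions `standardize` and `clean_text` are hypothetical stand-ins for the code under test, not anything from a real library. The choices they make (drop None and NaN as missing, reject infinities, return zeros for a constant vector) are assumptions for illustration; your own code may reasonably decide otherwise, and the tests should then assert that behavior instead.

```python
import math


def standardize(values):
    """Hypothetical function under test: z-scores of the non-missing values."""
    # Assumption: None and NaN both mean "missing" and are dropped.
    clean = [v for v in values if v is not None and not math.isnan(v)]
    # Assumption: infinities are rejected rather than propagated.
    if any(math.isinf(v) for v in clean):
        raise ValueError("infinite values are not supported")
    if not clean:
        return []
    mean = math.fsum(clean) / len(clean)
    variance = math.fsum((v - mean) * (v - mean) for v in clean) / len(clean)
    if variance == 0.0:
        # Constant vector: return zeros instead of dividing by zero.
        return [0.0] * len(clean)
    std = math.sqrt(variance)
    return [(v - mean) / std for v in clean]


def clean_text(value):
    """Hypothetical function under test: trim whitespace, keep None as None."""
    return None if value is None else value.strip()


def test_trivial_example():
    # Doubles as documentation of the intended use.
    assert standardize([1, 2, 3])[1] == 0.0


def test_numeric_edge_cases():
    assert standardize([]) == []                      # empty input
    assert standardize([None, float("nan")]) == []    # only missing values
    assert standardize([0]) == [0.0]                  # zero
    assert standardize([1]) == [0.0]                  # one
    assert standardize([5, 5, 5]) == [0.0, 0.0, 0.0]  # constant vector
    assert standardize([-1.5, 0.5]) == [-1.0, 1.0]    # negative and decimal
    for infinity in (float("inf"), float("-inf")):    # plus/minus infinity
        try:
            standardize([1, infinity])
        except ValueError:
            pass
        else:
            raise AssertionError("expected ValueError")
    # Large values with small differences: a basic numerical stability check.
    shifted = standardize([1e15 + 1, 1e15 + 2, 1e15 + 3])
    assert shifted[1] == 0.0 and shifted[0] == -shifted[2]


def test_text_edge_cases():
    assert clean_text("plain ASCII") == "plain ASCII"
    assert clean_text("naïve café") == "naïve café"   # accented characters
    assert clean_text(None) is None                   # missing value
    assert clean_text("") == ""                       # empty value
    assert clean_text(" ") == ""                      # whitespace only
    assert len(clean_text("x" * 1_000_000)) == 1_000_000  # very long text


if __name__ == "__main__":
    test_trivial_example()
    test_numeric_edge_cases()
    test_text_edge_cases()
```

The test functions follow the `test_` naming convention, so a runner such as pytest would pick them up, but the file also runs on its own with plain `python`.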
