Thursday, December 14, 2017

When do I write unit tests


  1. Whenever I am updating a piece of code that already works. By writing the unit tests, I make sure that the new code is at least as good as the old code, and that the new code does not break things.
  2. Whenever I am fixing a bug. The unit test demonstrates that the bug was fixed. And it makes sure that the bug does not return in the future due to refactoring.
  3. Whenever I am not able to write working code on the first attempt. That is a sign of complexity. And since debugging is said to be tougher than writing code in the first place, I want to make sure that no bug passed unnoticed.
  4. Whenever I am assigning a programming task to someone. A set of unit tests helps to communicate what I want to get. And it nudges the assignee to use my interface, simplifying the integration of the delivered code on my side.
  5. Whenever I get the code back from the assignee. Reasoning: whenever I am assigning a task, I generally provide just a very limited set of examples that the code has to pass, because:
    1. Writing a comprehensive set of tests takes a lot of effort. Frequently more than writing the function itself.
    2. The assignee may find a much better solution to the problem, one that is incompatible with the original unit tests. When this happens, I genuinely want to use the better solution. But I do not want to waste a lot of my work up front.
    Unfortunately, when people write code that passes all my tests, they tend to believe the code cannot contain any more bugs. I enjoy proving them wrong.
  6. Before deploying the code. It happened to me in the past that my code was passing all the tests that I was using during development. To enjoy the victory, I threw some newly generated data at the code, expecting a beautiful result. But the code failed. Just like assignees tend to overfit the "training" unit test set, so do I.
  7. Before publishing the code. A good unit test works as a nice illustration of how the code can be used. 
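The bug-fix case (point 2) can be sketched as a small regression test. Everything here is hypothetical: `safe_mean` and its empty-input bug are made up for illustration, not taken from any real project.

```python
def safe_mean(values):
    """Hypothetical function under test: mean that returns None for empty input.
    The fixed bug: the original version raised ZeroDivisionError on []."""
    if not values:
        return None
    return sum(values) / len(values)

# The regression test pins the fix so a future refactoring cannot
# silently reintroduce the bug:
assert safe_mean([]) is None        # the bug scenario itself
assert safe_mean([1, 2, 3]) == 2    # sanity check that the fix broke nothing
```

The test doubles as documentation: it records not only the fix but the exact input that used to fail.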

What to test in data processing code

Based on past experience, data are dirty. If the current data are not dirty, the next batch will be. Hence, it is reasonable to test the code for such scenarios. My checklist follows:
  1. Does the code do what it is supposed to do on a trivial example? The purpose of this unit test is twofold: it documents how the code can be used. And if this test fails on the client's computer but works correctly on your own computer, it is a sign of an integration problem.
  2. If the code accepts numeric data, check the following scenarios:
    1. Zero (a smoke test for division by zero error)
    2. One
    3. Negative number (not all algorithms accept negative values)
    4. Decimal number (not all algorithms accept decimals)
    5. NaN (not a number - permissible for example in doubles) 
    6. null (missing value)
    7. Empty set (no data at the input at all)
    8. Plus/minus infinity
    9. Constant vector (for example when variance is calculated and used in a denominator, we get division by zero)
    10. Vector of extremely large values and extremely small values (test of numerical stability)
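The numeric checklist can be turned directly into a handful of asserts. The `zscore` function below is a hypothetical example of code under test; the point is the shape of the edge-case suite, not the function itself. The remaining items (null, decimals, infinities) would follow the same pattern.

```python
import math

def zscore(values):
    """Hypothetical function under test: standardize a list of numbers."""
    n = len(values)
    if n == 0:
        return []          # empty set: return empty rather than crash
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / n
    if var == 0:
        return [0.0] * n   # constant vector: avoid division by zero
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in values]

# Checklist-driven edge cases:
assert zscore([]) == []                        # empty set
assert zscore([5, 5, 5]) == [0.0, 0.0, 0.0]    # constant vector
assert zscore([0, 1]) == [-1.0, 1.0]           # zero and one
assert zscore([-2, 2]) == [-1.0, 1.0]          # negative numbers
assert any(math.isnan(x)
           for x in zscore([float("nan"), 1.0]))  # NaN propagates, no crash
assert zscore([1e150, -1e150]) == [1.0, -1.0]  # extreme magnitudes stay finite
```

Note that each assert corresponds to one checklist item, so a failure immediately tells you which dirty-data scenario the code mishandles.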
  3. If the code accepts character data, check the following scenarios:
    1. Plain ASCII
    2. Accented characters
    3. null (missing value)
    4. Empty set (no data at the input at all)
    5. Ridiculously long text
    6. Empty value ("")
    7. White space character (e.g. " ")
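The character-data checklist maps to asserts in the same way. Again, `normalize_name` is a hypothetical function under test, invented for this sketch.

```python
def normalize_name(text):
    """Hypothetical function under test: trim and collapse whitespace.
    None is treated as a missing value."""
    if text is None:
        return ""
    return " ".join(text.split())

# Checklist-driven edge cases:
assert normalize_name("John  Smith") == "John Smith"       # plain ASCII
assert normalize_name("Žluťoučký kůň") == "Žluťoučký kůň"  # accented characters
assert normalize_name(None) == ""                          # null / missing value
assert normalize_name("") == ""                            # empty value ("")
assert normalize_name("   ") == ""                         # whitespace only
assert normalize_name("x" * 1_000_000) == "x" * 1_000_000  # ridiculously long text
```

The long-text case is cheap to write and occasionally catches accidentally quadratic string handling before the production data do.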