Thursday, December 14, 2017

When do I write unit tests


  1. Whenever I am updating some piece of code that already works. By writing unit tests, I make sure that the new code is at least as good as the old code, and that the new code does not break things.
  2. Whenever I am fixing a bug. The unit test demonstrates that the bug was fixed, and it makes sure that the bug does not return in the future due to refactoring (a minimal sketch of such a regression test follows this list).
  3. Whenever I am not able to write working code on the first attempt. It is a sign of complexity, and since debugging is said to be tougher than writing the code in the first place, I want to make sure that no bug passed unnoticed.
  4. Whenever I am assigning a programming task to someone. A set of unit tests helps to communicate what I want to get. And it nudges the assignee to use my interface, simplifying the integration of the delivered code on my side.
  5. Whenever I get the code back from the assignee. Reasoning: when I am assigning a task, I generally provide just a very limited set of examples that the code has to pass, because:
    1. Writing a comprehensive set of tests takes a lot of effort. Frequently more than writing the function itself.
    2. The assignee may find a much better solution to the problem that is incompatible with the original unit tests. When this happens, I genuinely want to use the better solution. But I do not want to waste a lot of my work.
    Unfortunately, when people write code that passes all my tests, they tend to think that the code cannot contain any more bugs. I enjoy proving them wrong.
  6. Before deploying the code. It has happened to me in the past that my code was passing all the tests I was using during development. To enjoy the victory, I threw some newly generated data at the code, expecting a beautiful result. But the code failed. Just like assignees tend to overfit the "training" unit test set, so do I.
  7. Before publishing the code. A good unit test works as a nice illustration of how the code can be used. 
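
A minimal sketch of such a regression test (point 2 above), assuming a hypothetical median() function whose equally hypothetical bug was mishandling even-length inputs; a tiny stand-in implementation is included so the example compiles on its own:
    import static org.junit.Assert.assertEquals;

    import java.util.Arrays;

    import org.junit.Test;

    public class MedianRegressionTest {

        // Stand-in for the real production method. The hypothetical bug was
        // returning sorted[n / 2] even for even-length inputs.
        static double median(double[] values) {
            double[] sorted = values.clone();
            Arrays.sort(sorted);
            int n = sorted.length;
            return (n % 2 == 1) ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        }

        // Pins the fix down so that a later refactoring cannot silently reintroduce the bug.
        @Test
        public void medianOfEvenLengthInputAveragesTheTwoMiddleElements() {
            assertEquals(2.5, median(new double[]{4.0, 1.0, 3.0, 2.0}), 1e-9);
        }
    }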

What to test in data processing code

Based on past experience, data are dirty. If the current data are not dirty, the next batch will be. Hence, it is reasonable to test the code for such scenarios. My checklist follows (a small JUnit sketch of a few of these checks appears after the list):
  1. Does the code do what it is supposed to do on a trivial example? The purpose of this unit test is twofold: it documents how the code can be used, and if it fails on the client's computer but works correctly on your own, it is a sign of an integration problem.
  2. If the code accepts numeric data, check the following scenarios:
    1. Zero (a smoke test for division by zero error)
    2. One
    3. Negative number (not all algorithms accept negative values)
    4. Decimal number (not all algorithms accept decimals)
    5. NaN (not a number - permissible for example in doubles) 
    6. null (missing value)
    7. Empty set (no data at the input at all)
    8. Plus/minus infinity
    9. Constant vector (for example when variance is calculated and used in a denominator, we get division by zero)
    10. Vector of extremely large values and extremely small values (test of numerical stability)
  3. If the code accepts character data, check the following scenarios:
    1. Plain ASCII
    2. Accented characters
    3. null (missing value)
    4. Empty set (no data at the input at all)
    5. Ridiculously long text
    6. Empty value ("")
    7. White space character (e.g. " ")
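
A minimal sketch of how a few of the numeric scenarios might look in JUnit. The Stats class, its mean() method and the contract that an empty input yields NaN are my own assumptions for illustration, not code from any particular library:
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class MeanEdgeCaseTest {

        @Test
        public void zeroAndOneAreHandled() {
            assertEquals(0.0, Stats.mean(new double[]{0.0}), 1e-9);
            assertEquals(1.0, Stats.mean(new double[]{1.0}), 1e-9);
        }

        @Test
        public void emptyInputYieldsNaNInsteadOfThrowing() {
            // Assumed contract: "return NaN on empty input".
            assertTrue(Double.isNaN(Stats.mean(new double[]{})));
        }

        @Test
        public void nanPropagatesToTheResult() {
            assertTrue(Double.isNaN(Stats.mean(new double[]{1.0, Double.NaN})));
        }

        @Test
        public void extremelyLargeValuesDoNotOverflow() {
            // Numerical stability: a naive "sum everything, then divide" overflows to infinity here.
            double[] huge = {Double.MAX_VALUE, Double.MAX_VALUE};
            assertEquals(Double.MAX_VALUE, Stats.mean(huge), Math.ulp(Double.MAX_VALUE));
        }
    }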

Monday, September 4, 2017

Keyboard shortcuts on macOS

macOS does some things right. And some not so much.

Positives:
  1. A dedicated (and simple!) shortcut for displaying the configuration of an application. On Windows, you generally have to fish for "Settings" or "Preferences" in the application menu. Not so on macOS - it is always command+",".
  2. A variety of shortcuts for different screenshots.
Negatives:
  1. No dedicated keyboard shortcut for find and replace. On Windows, it is always ctrl+"h". On macOS, it is application specific. Sometimes it is command+alt+"f", sometimes command+shift+"f", and sometimes there is simply no keyboard shortcut at all and you have to use the mouse to tell the app to perform find and replace, and not just find! If you wonder, both F5 and command+"r" perform refresh.
  2. No keyboard shortcut for displaying the context menu - you have to right-click with the mouse.

Wednesday, August 30, 2017

Not invented here syndrome

Whenever I find that my code can be replaced with a system call, I go a long way to replace my code with the system call, because I generally trust the system libraries more than my own code. When it comes to third-party libraries, I am conservative. Sometimes they are outright buggy, sometimes they are correct but get deprecated and removed. And sometimes the whole system gets so complex due to the sheer amount of third-party code that no one wants to deal with the mess anymore and the project dies off. Hence, I like to try new third-party libraries and study them. But I commit to using them extensively only after thorough consideration of alternatives and testing.

Friday, August 11, 2017

A proper handling of nulls

I am familiar with two approaches to dealing with nulls in programming languages: either completely avoid them or embrace them. Languages for logic programming generally avoid nulls (a great apologetic is given in Design and Implementation of the LogicBlox System). But languages that permit them should provide the following facilities:

1) meta information about why the value is missing
2) nullable & non-nullable variables

For example, SAS got it right 40 years ago. In SAS, a missing value is represented with a dot. That by itself is not great whenever you need to print out the code or the data, because you never know whether the dot really represents a missing value or is just an imperfection of the paper or the printer. But it makes it easy to state the reason why the value is missing, via the so-called special missing values (a dot followed by a letter or an underscore):
    .     /* generic missing value */
    .R    /* special missing value, e.g. "refused to answer" */
    .D    /* special missing value with a different meaning, e.g. "did not know" */
Hence, generic algorithms can treat all missing values the same way. But if you want to treat them differently, for example because "refused to answer" can have a vastly different meaning in a questionnaire than "did not know", you can do it.

Furthermore, SAS provides optional non-null constraints on attributes, just like SQL. The only wart in SAS's implementation is that it raises errors only at runtime, not at compile time as, for example, Kotlin does. Note also that nullability must be configurable for all variables. In C#, for example, nullability is configurable only for value types, not for reference types. And this omission is a source of many null-reference exceptions.

There is just one thing where I am not sure which approach is better. If we have a function sum, it can:
  1. Accept nullable variables and use some default strategy to deal with nulls (as SQL does).
  2. Accept nullable variables and blow up at runtime when a null is encountered, unless some strategy is defined (R takes this approach).
  3. Have a dedicated sumnan function, which accepts nullable variables and takes a parameter that determines how nulls should be treated, while non-nullable variables are accepted by the sum function (something like this is used in MATLAB, minus the type control).
The first approach is convenient to use. But potentially dangerous, because the user may never realize that a null leaked into the data and the conclusions are wrong.

The second approach is safer, because the user at least learns that something is wrong immediately when a null is passed to the function without a strategy for dealing with nulls. The disadvantage is that it is a runtime check, not a static check. Nevertheless, programmers who are concerned about safety may use a linter to identify calls that pass nullable variables to functions without defining a strategy for the nulls.

The third approach is easier to validate by the compiler than the second approach. But the naming convention can be difficult to enforce.
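
A minimal sketch of the third approach in Java, where primitive double[] stands for non-nullable input and boxed Double[] for nullable input; the names sum, sumNan and NullStrategy are my own assumptions for illustration, not MATLAB's or any library's API:
    public final class Sums {

        // Strategies for the nullable variant; the names are illustrative only.
        public enum NullStrategy { IGNORE, FAIL, RETURN_NULL }

        // Non-nullable input: a primitive double cannot hold null,
        // so the "no nulls" guarantee is enforced at compile time.
        public static double sum(double[] values) {
            double total = 0.0;
            for (double v : values) {
                total += v;
            }
            return total;
        }

        // Nullable input: the caller must state explicitly what to do with nulls.
        public static Double sumNan(Double[] values, NullStrategy strategy) {
            double total = 0.0;
            for (Double v : values) {
                if (v == null) {
                    if (strategy == NullStrategy.IGNORE) {
                        continue;
                    } else if (strategy == NullStrategy.RETURN_NULL) {
                        return null;
                    } else {
                        throw new IllegalArgumentException("null value encountered");
                    }
                }
                total += v;
            }
            return total;
        }
    }
With this split, sum(double[]) cannot even receive a null element, while the nullable overload never runs without an explicitly chosen strategy.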

Do you have any thoughts on this issue?

Monday, July 31, 2017

International Software Testing Contest - experience

Why I write about the contest: Since I am one of the winners of the first ISTC competition held in 2017, it is in my best interest to promote this competition in order to make it famous.

The assignment description: The contest consisted of writing tests for two Java projects: Naive Bayes classifier and Prim's algorithm for calculation of minimum spanning trees. A copy of the assignment: download.

Strategy: In 2017, we were evaluated only on branch coverage (line coverage was ignored, contrary to what was written in the invitation) and the mutation score. Only in the case of a tie would the count of test cases have been taken into consideration. Since no tie happened in 2017, the recommended strategy for the following years (under the assumption that the rules do not change) is simple: maximize branch coverage, even at the expense of the count of test cases.

Furthermore, to maximize the mutation score you have to use asserts in your tests. My strategy was to print the result of the method to the console:
    System.out.println(someMethod());
And use the printed output in the assert:
    assertEquals("textOutput".trim(), someMethod().trim());
I used trim methods because I did not want to deal with the errors caused by wrongly copy-pasting too many/too few whitespace characters. Is it a test strategy I would use outside of the contest? No way, because the toString() format can change anytime. It may not catch all deviations. And not all objects have to implement toString(). But at ISTC 2017 it worked reasonably well.
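
Put together, the workflow looked roughly like the sketch below. Classifier and its train() method are stand-ins for the real contest classes, and the recorded string is invented for illustration:
    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class ClassifierToStringTest {

        @Test
        public void trainedModelMatchesRecordedToStringOutput() {
            Classifier model = new Classifier();   // stand-in for the real contest class
            model.train(new double[][]{{1, 0}, {0, 1}}, new int[]{0, 1});

            // Step 1 (done once, by hand): print the output and copy it into the assert below.
            System.out.println(model);

            // Step 2: pin the copied output down; trim() avoids copy-paste whitespace errors.
            String recorded = "Classifier{classes=2, features=2}";   // hypothetical recorded text
            assertEquals(recorded.trim(), model.toString().trim());
        }
    }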

Idea: Use a generator of unit tests like EvoSuite. However, it is critical to make sure that the generated unit tests work even in the evaluation framework because a single non-compiling unit test will result in zero mutation score. If EvoSuite does not work with the evaluation framework, consider using Randoop, which is possibly less sophisticated than EvoSuite, but generates clean unit tests without any non-standard dependency.

Mutation testing: To measure the mutation score you may use PITest. If you use IDEA, Zester is a nice plugin that integrates PITest into IDEA. The following mutators were used to calculate the mutation score (a small example of killing a mutant follows the list):
  1. Return Values Mutator
  2. Negate Conditionals Mutator
  3. Void Method Call Mutator
  4. Conditionals Boundary Mutator
  5. Increments Mutator
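
As an illustration of what these mutators do, here is a self-contained sketch with a tiny made-up method and a test that kills, among others, its Negate Conditionals mutant (x > 0 flipped to x <= 0):
    import static org.junit.Assert.assertEquals;

    import org.junit.Test;

    public class SignTest {

        // Made-up production code; in a real project it would live in its own class.
        static String sign(int x) {
            if (x > 0) {   // the Negate Conditionals Mutator flips this condition
                return "positive";
            }
            return "non-positive";
        }

        @Test
        public void bothBranchesAreCoveredAndAsserted() {
            // Asserting the return value on both branches is what kills the mutants;
            // merely executing the code without asserts leaves them alive.
            assertEquals("positive", sign(1));
            assertEquals("non-positive", sign(0));
        }
    }
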
Warning: The branch coverage and the mutation score that you obtain from your favourite tools may not 100% agree with the scores reported by MoocTest - the tool used to evaluate these two metrics in the competition. Hence, if you can, train yourself against the framework used at the contest.

Another tricky thing I ran into is that JDK 7 was required. But because I also had JDK 6 and JDK 8 installed on my computer, I ran into unpleasant clashes during the competition that cost me 7 minutes of debugging. If you can, have just a single JDK installed on your computer.

Finally, the source code was in the default package, and that was interfering with PITest (I realized only after the competition that Zester does NOT work if either the source code or the test code is in the default package). Hence, if you design an intricate plan for how to win the competition, prepare a simple fallback plan as well.

Friday, June 23, 2017

Each database has its own charm; it only takes a while to discover it

Microsoft SQL Server
This is a tough one. But it was the only database that warned me about data corruption when an SSD started to fail, while the other databases (Oracle, PostgreSQL and MySQL) were silently returning corrupted data. To be fair to the other databases, you can at least [[https://www.postgresql.org/docs/9.5/static/wal-reliability.html|configure]] PostgreSQL to be more resistant to silent errors.

MySQL
You can rely on the order of the data in the tables, i.e. the tables are not sets, as dictated by relational algebra, but lists! That makes working with the database more intuitive. Of course, some other databases have this property as well (Oracle, MSSQL, SAS) but others do not (PostgreSQL, Teradata, Netezza) - generally, all distributed databases use sets.

Another nice property is that you can change the order of the columns in a table any time you want to (in PostgreSQL, for example, you can only append new columns at the end of the tables).

PostgreSQL
PostgreSQL has the nicest installer on OS X I have ever seen for a database. It's just drag and drop like any other normal app. And after starting the database it tells you the connection parameters. And that's it! No configuration needed! In comparison with an installation of SAS or a full-blooded Oracle, it is Heaven versus Hell. Also, PostgreSQL does not need any configuration fine-tuning to be usable. Once, I installed MySQL and PostgreSQL on the same server and mirrored their content. While PostgreSQL worked untouched for a year, I had to change the configuration of MySQL multiple times, because some default limit (each time a different one) was too tight.

PostgreSQL, in comparison to MySQL (at least to MariaDB v10), allows renaming of schemas.

Also, PostgreSQL has, in comparison to MySQL, a sane behavior of temporal data types (whenever you are working with timestamps in MySQL, read the documentation - the behavior can be insane, but is documented).

Oracle
Hands down, Oracle has the best execution planner I have ever used. Plus, it provides a rich set of commands.

SAS
The best part of SAS's SQL is that it allows you to use some of the data step conventions in SQL. Do you need to limit the count of rows read (not output!) from a table? No problem, just use the inobs parameter! Or is SQL too inconvenient for your task? Just use SAS code!

Saturday, February 25, 2017

TPC benchmarks

Since I repeatedly stumble upon the problem of how to generate the benchmark databases, here are the references:
  1. HammerDB (for TPC-C and TPC-H databases).
  2. Benchmark Factory (for TPC-C, TPC-D, TPC-E, TPC-H and AS3AP databases).

Wednesday, February 22, 2017

LaTeX

Sometimes I find LaTeX to be frustrating because of the following reasons:
  1. It is difficult to parse TeX code. This occasionally leads to unclear error messages and inconsistent treatment of white space characters.
  2. It is sensitive to the order, in which packages are loaded. If you are loading more than a few packages, finding the correct order can be a non-trivial problem.
  3. Inconsistent quality of typesetting. While TeX is prized for its typography, kerning is not applied to equations by default.
For further discussion of the topic see 25 Years of TEX and METAFONT: Looking Back and Looking Forward.