Monday, 4 September 2017

Keyboard shortcuts on macOS

macOS does some things right, and some not so much.

Positives:
  1. A dedicated (and simple!) shortcut for opening an application's settings. On Windows, you generally have to fish for "Settings" or "Preferences" in the application menu. Not so on macOS: it is always command+",".
  2. A variety of shortcuts for different kinds of screenshots.
Negatives:
  1. No dedicated keyboard shortcut for find and replace. On Windows, it is almost always ctrl+"h". On macOS, it is application-specific: sometimes it is command+alt+"f", sometimes command+shift+"f", and sometimes there is no keyboard shortcut at all and you have to use the mouse to tell the app to perform find and replace rather than just find! In case you wonder, both F5 and command+"r" perform a refresh.
  2. No keyboard shortcut for displaying the context menu: you have to right-click with the mouse.

Wednesday, 30 August 2017

Not-invented-here syndrome

Whenever I find that my code can be replaced with a system call, I go out of my way to make the replacement, because I generally trust the system libraries more than my own code. When it comes to third-party libraries, I am conservative. Sometimes they are outright buggy; sometimes they are correct but get deprecated and removed. And sometimes the whole system becomes so complex, due to the sheer amount of third-party code, that no one wants to deal with the mess anymore and the project dies off. Hence, I like to try new third-party libraries and study them, but I commit to using them extensively only after a thorough consideration of alternatives and testing.

Friday, 11 August 2017

Proper handling of nulls

I am familiar with two approaches to dealing with nulls in programming languages: either avoid them completely or embrace them. Logic-programming languages generally avoid nulls (a great apologia for this choice is given in Design and Implementation of the LogicBlox System). But languages that permit nulls should provide the following facilities:

1) meta information about why the value is missing
2) nullable & non-nullable variables

For example, SAS got it right 40 years ago. In SAS, a missing value is represented by a dot. That by itself is not great whenever you need to print out the code or the data, because you never know whether a dot really represents a missing value or is just an imperfection of the paper or the printer. But it makes it easy to record the reason why a value is missing: besides the generic ".", SAS offers the special missing values .A to .Z and ._, and each of them can be given its own meaning:
    .     /* generic missing value */
    .R    /* special missing value, used here for "refused to answer" */
    .D    /* special missing value, used here for "didn't know" */
Hence, generic algorithms can treat all missing values the same way. But if you want to treat them differently, for example because "refused to answer" can have a vastly different meaning in a questionnaire than "didn't know", you can.
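
Java has no special missing values, but the idea of a missing value that carries its reason can be sketched as a small value type. This is a minimal illustration under stated assumptions: the names MissingReason, Answer and MissingValues are hypothetical, and the reasons mirror the questionnaire example above. The generic algorithm ignores why a value is missing, while the specific one looks only at one kind of missing value.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical reasons for a missing survey answer, mirroring SAS's generic
// "." and its special missing values.
enum MissingReason { GENERIC, REFUSED_TO_ANSWER, DIDNT_KNOW }

// A survey answer that is either a number or a missing value with a reason.
final class Answer {
    private final Double value;          // null exactly when the answer is missing
    private final MissingReason reason;  // null exactly when the answer is present

    private Answer(Double value, MissingReason reason) {
        this.value = value;
        this.reason = reason;
    }

    static Answer of(double value) {
        return new Answer(value, null);
    }

    static Answer missing(MissingReason reason) {
        return new Answer(null, Objects.requireNonNull(reason));
    }

    boolean isMissing() {
        return value == null;
    }

    double value() {
        if (isMissing()) {
            throw new IllegalStateException("value is missing: " + reason);
        }
        return value;
    }

    MissingReason reason() {
        return reason;
    }
}

final class MissingValues {
    // Generic algorithm: every missing value is treated the same way (skipped).
    static double meanOfPresent(List<Answer> answers) {
        return answers.stream()
                .filter(a -> !a.isMissing())
                .mapToDouble(Answer::value)
                .average()
                .orElse(Double.NaN);
    }

    // Specific algorithm: only one kind of missing value is of interest.
    static long count(List<Answer> answers, MissingReason reason) {
        return answers.stream()
                .filter(a -> a.isMissing() && a.reason() == reason)
                .count();
    }

    public static void main(String[] args) {
        List<Answer> answers = Arrays.asList(
                Answer.of(4), Answer.of(2),
                Answer.missing(MissingReason.REFUSED_TO_ANSWER),
                Answer.missing(MissingReason.DIDNT_KNOW),
                Answer.missing(MissingReason.GENERIC));
        System.out.println(meanOfPresent(answers));                          // 3.0
        System.out.println(count(answers, MissingReason.REFUSED_TO_ANSWER)); // 1
    }
}
```

Reading value() on a missing answer throws instead of silently yielding a number, so a missing value cannot leak into a computation unnoticed.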

Furthermore, SAS provides optional non-null constraints on attributes, just like SQL. The only flaw in SAS's implementation is that violations are detected only at run time, not at compile time as, for example, in Kotlin. Note also that nullability must be configurable for all variables. For example, in C# nullability is configurable only for value types, not for class types (at the time of writing; C# 8.0 later added nullable reference types). And this omission is a source of many null-reference exceptions.

There is just one thing where I am not sure which approach is better. Consider a function sum. It can:
  1. Accept nullable variables and apply some default strategy for dealing with nulls (as SQL does: SUM simply ignores NULLs).
  2. Accept nullable variables and fail loudly at run time when a null is encountered, unless a strategy is specified (R takes a similar approach: sum() returns NA when the data contain NA, unless na.rm = TRUE is passed).
  3. Have a dedicated function, say sumnan, which accepts nullable variables and takes a parameter that determines how nulls should be treated, while the plain sum function accepts only non-nullable variables (MATLAB uses something similar with sum and nansum, minus the type checking).
The first approach is convenient but potentially dangerous, because the user may never realize that a null leaked into the data and that the conclusions are wrong.

The second approach is safer, because the user learns that something is wrong as soon as a null is passed to the function without a strategy for dealing with it. The disadvantage is that it is a run-time check, not a static check. Nevertheless, programmers who are concerned about safety may use a linter to flag calls to functions with nullable arguments that do not specify a null-handling strategy.

The third approach is easier for the compiler to validate than the second one. But the naming convention can be difficult to enforce.
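
The three approaches can be contrasted in Java, with one caveat: Java only distinguishes nullable from non-nullable for numbers through boxing (a Double may be null, a primitive double cannot). The sketch below relies on that; the classes Approach1 to Approach3, the NullStrategy enum and the helper NullAwareSum are all hypothetical names chosen for illustration, and PROPAGATE returning NaN is an assumed stand-in for R's NA.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical strategies for handling nulls in an aggregate.
enum NullStrategy { SKIP, PROPAGATE }

final class NullAwareSum {
    // Shared helper: sums boxed values, which may be null, using a strategy.
    static double sumWith(Double[] xs, NullStrategy strategy) {
        Objects.requireNonNull(strategy, "strategy");
        double total = 0;
        for (Double x : xs) {
            if (x == null) {
                if (strategy == NullStrategy.PROPAGATE) {
                    return Double.NaN; // the whole result becomes "missing"
                }
                continue; // SKIP: ignore the null
            }
            total += x;
        }
        return total;
    }
}

// Approach 1: nullable input, silent default strategy (nulls are skipped,
// the way SQL's SUM ignores NULLs).
final class Approach1 {
    static double sum(Double[] xs) {
        return NullAwareSum.sumWith(xs, NullStrategy.SKIP);
    }
}

// Approach 2: nullable input, but a null is a run-time error unless the
// caller explicitly chooses a strategy.
final class Approach2 {
    static double sum(Double[] xs) {
        for (Double x : xs) {
            if (x == null) {
                throw new IllegalArgumentException("null encountered; pass a NullStrategy");
            }
        }
        return NullAwareSum.sumWith(xs, NullStrategy.SKIP); // no nulls left to skip
    }

    static double sum(Double[] xs, NullStrategy strategy) {
        return NullAwareSum.sumWith(xs, strategy);
    }
}

// Approach 3: the plain sum accepts only primitive doubles, which can never
// be null, so the compiler rejects nullable input; nullable data must go
// through the separately named sumNan.
final class Approach3 {
    static double sum(double[] xs) {
        return Arrays.stream(xs).sum();
    }

    static double sumNan(Double[] xs, NullStrategy strategy) {
        return NullAwareSum.sumWith(xs, strategy);
    }
}
```

With these definitions, Approach3.sum(new Double[]{1.0, null}) does not even compile, which is the static guarantee the third approach buys; the price is that callers must know to reach for sumNan.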

Do you have any thoughts on this issue?

Monday, 31 July 2017

International Software Testing Contest - experience

Why I write about the contest: Since I am one of the winners of the first ISTC, held in 2017, it is in my best interest to promote this competition and make it famous.

The assignment description: The contest consisted of writing tests for two Java projects: a Naive Bayes classifier and Prim's algorithm for computing minimum spanning trees. A copy of the assignment: download.

Strategy: In 2017, we were evaluated only on branch coverage (line coverage was ignored, contrary to what was written in the invitation) and mutation score. Only in the case of a tie would the number of test cases have been taken into consideration. Since no tie happened in 2017, the recommended strategy for future years (assuming the rules do not change) is simple: maximize branch coverage, even at the expense of the number of test cases.

Furthermore, to maximize the mutation score, your tests have to contain asserts. My strategy was to print the result of the method to the console:
    System.out.println(someMethod());
and then paste the printed output into the assert:
    assertEquals("textOutput".trim(), someMethod().toString().trim());
I used trim() because I did not want to deal with failures caused by copy-pasting too many or too few whitespace characters. Would I use this test strategy outside the contest? No way: the toString() format can change at any time, the string comparison may not catch all deviations, and not all classes override toString() with something meaningful. But at ISTC 2017 it worked reasonably well.

Idea: Use a unit-test generator like EvoSuite. However, it is critical to make sure that the generated unit tests work in the evaluation framework, because a single non-compiling unit test results in a zero mutation score. If EvoSuite does not work with the evaluation framework, consider Randoop, which is possibly less sophisticated than EvoSuite but generates clean unit tests without any non-standard dependencies.

Mutation testing: To measure the mutation score, you can use PITest. If you use IntelliJ IDEA, Zester is a nice plugin that integrates PITest into the IDE. The following mutators were used to calculate the mutation score:
  1. Return Values Mutator
  2. Negate Conditionals Mutator
  3. Void Method Call Mutator
  4. Conditionals Boundary Mutator
  5. Increments Mutator
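
To see why asserts and boundary values matter for the mutation score, here is a minimal, JUnit-free sketch. The mutants are written by hand to mimic what PIT's Conditionals Boundary and Negate Conditionals mutators would produce from a hypothetical pricing method (PIT itself mutates bytecode); MutantDemo, price and the test methods are illustrative names, not contest code.

```java
import java.util.function.IntPredicate;

// Hand-written mutants of one condition, mimicking what PIT's mutators
// would generate from the bytecode of a hypothetical pricing method.
final class MutantDemo {
    // Original: orders of at least 100 units get a 10 % discount.
    static final IntPredicate ORIGINAL = units -> units >= 100;
    // Conditionals Boundary Mutator: >= becomes >.
    static final IntPredicate BOUNDARY_MUTANT = units -> units > 100;
    // Negate Conditionals Mutator: >= becomes <.
    static final IntPredicate NEGATED_MUTANT = units -> units < 100;

    // Price in cents; the discount condition is injected so that the same
    // tests can be run against the original and against each mutant.
    static int price(int units, int unitPriceCents, IntPredicate discounted) {
        int total = units * unitPriceCents;
        return discounted.test(units) ? total * 9 / 10 : total;
    }

    // Each "test" returns true if it passes. A mutant is killed when a test
    // passes on the original but fails on the mutant.
    static boolean testAtBoundary(IntPredicate cond) {
        return price(100, 1, cond) == 90;
    }

    static boolean testFarAboveBoundary(IntPredicate cond) {
        return price(200, 1, cond) == 180;
    }

    // Executes the code but asserts nothing, so it can never kill a mutant.
    static boolean testWithoutAssert(IntPredicate cond) {
        price(100, 1, cond);
        return true;
    }

    public static void main(String[] args) {
        IntPredicate[] mutants = {BOUNDARY_MUTANT, NEGATED_MUTANT};
        for (IntPredicate m : mutants) {
            System.out.println(
                    "killed at boundary: " + !testAtBoundary(m)
                    + ", killed far above: " + !testFarAboveBoundary(m)
                    + ", killed without assert: " + !testWithoutAssert(m));
        }
    }
}
```

The boundary mutant survives the 200-unit test and dies only at exactly 100 units, and the assert-free test kills nothing. Note also that both asserting tests take the discounted branch, so branch coverage stays incomplete until a test with fewer than 100 units is added.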
Warning: The branch coverage and the mutation score that you obtain from your favourite tools may not agree 100% with the scores reported by MoocTest, the tool used to evaluate these two metrics in the competition. Hence, if you can, practice with the framework used at the contest.

Another trap I ran into: JDK 7 was required, but because I also had JDK 6 and JDK 8 installed on my computer, I ran into unpleasant clashes during the competition that cost me 7 minutes of debugging. If you can, have just a single JDK installed on your computer.
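
If several JDKs cannot be avoided, a cheap sanity check is to confirm which runtime actually executes your code; on the command line, java -version and javac -version report the runtime and compiler found on the PATH, while an IDE may use yet another JDK configured in the project settings. The class below is a hypothetical helper that reads the standard java.version and java.home system properties; the "1.7" prefix reflects how JDK 7 reports its version.

```java
// Reports which Java runtime actually executes the code. Running it from
// the IDE and from the build tool quickly reveals a clash between JDKs.
final class JvmCheck {
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    // JDK 7 and older report versions such as "1.7.0_80";
    // JDK 9 and newer report "9", "11.0.2", "17.0.1", ...
    static boolean isVersion(String expectedPrefix) {
        return System.getProperty("java.version").startsWith(expectedPrefix);
    }

    public static void main(String[] args) {
        System.out.println(describe());
        if (!isVersion("1.7")) {
            System.err.println("Warning: the contest required JDK 7");
        }
    }
}
```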

Finally, the source code was in the default package, which interfered with PITest (only after the competition did I realize that Zester does NOT work if either the source code or the test code is in the default package). Hence, if you devise an intricate plan to win the competition, prepare a simple fallback plan as well.

Friday, 23 June 2017

Each database has its own charm; it just takes a while to discover it

Microsoft SQL Server
This is a tough one. But it was the only database that warned me about data corruption when my SSD started to fail, while the other databases (Oracle, PostgreSQL and MySQL) silently returned corrupted data. To be fair to the other databases, you can at least [[https://www.postgresql.org/docs/9.5/static/wal-reliability.html|configure]] PostgreSQL to be more resistant to silent errors.
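
Warnings like this typically come from page checksums: the database stores a checksum with each data page when writing it and verifies it when reading the page back. In SQL Server this is governed by the PAGE_VERIFY CHECKSUM database option; in PostgreSQL, data checksums have to be enabled when the cluster is created (initdb --data-checksums). The sketch below only illustrates the principle: it assumes CRC-32 from java.util.zip as a stand-in for the real algorithms, and PageChecksum with its methods are hypothetical names.

```java
import java.util.zip.CRC32;

// Illustrative page checksum, assuming CRC-32; SQL Server and PostgreSQL
// use their own checksum algorithms and on-disk layouts.
final class PageChecksum {
    // On write: compute a checksum and store it alongside the page.
    static long checksum(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        return crc.getValue();
    }

    // On read: recompute and compare. A mismatch means the disk returned
    // different bytes than were written, so an error is raised instead of
    // silently returning corrupted data.
    static byte[] verifiedRead(byte[] page, long storedChecksum) {
        if (checksum(page) != storedChecksum) {
            throw new IllegalStateException("page checksum mismatch: data is corrupted");
        }
        return page;
    }

    public static void main(String[] args) {
        byte[] page = new byte[8192];   // 8 KB page
        page[0] = 42;
        long stored = checksum(page);   // written together with the page
        page[100] ^= 1;                 // simulate a single flipped bit on the disk
        try {
            verifiedRead(page, stored);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```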

MySQL
In practice, you can rely on the order of the rows in a table: a SELECT without ORDER BY returns them in a predictable order, so the tables behave not like sets, as dictated by relational algebra, but like lists! That makes working with the database more intuitive, even though SQL itself guarantees an order only with ORDER BY. Of course, some other databases behave this way as well (Oracle, MSSQL, SAS), while others do not (PostgreSQL, Teradata, Netezza); in general, distributed databases treat tables as sets.

Another nice property is that you can change the order of the columns in a table any time you want (in PostgreSQL, for example, you can only append new columns at the end of a table).

PostgreSQL
PostgreSQL has the nicest installer on OS X I have ever seen for a database. It's just drag and drop, like any other normal app. After you start the database, it tells you the connection parameters. And that's it! No configuration needed! Compared to installing SAS or a full-blown Oracle, it is Heaven versus Hell. PostgreSQL also does not need any configuration fine-tuning to be usable. Once, I installed MySQL and PostgreSQL on the same server and mirrored their content. While PostgreSQL ran for a year without any intervention, I had to change the MySQL configuration multiple times, because some default limit (a different one each time) was too tight.

Unlike MySQL (at least MariaDB 10), PostgreSQL allows renaming schemas.

Oracle
Hands down, Oracle has the best execution planner I have ever used. Plus, it provides a rich set of commands.

SAS
The best part of SAS's SQL is that it lets you use some DATA step conventions in SQL. Do you need to limit the number of rows read (not output!) from a table? No problem, just use the inobs option! Or is SQL too inconvenient for your task? Just use SAS code!

Saturday, 25 February 2017

TPC benchmarks

Since I repeatedly stumble upon the problem of how to generate benchmark databases, here are the references:
  1. HammerDB (for TPC-C and TPC-H databases).
  2. Benchmark Factory (for TPC-C, TPC-D, TPC-E, TPC-H and AS3AP databases).

Wednesday, 22 February 2017

LaTeX

Sometimes I find LaTeX frustrating for the following reasons:
  1. TeX code is difficult to parse. This occasionally leads to unclear error messages and inconsistent treatment of whitespace characters.
  2. It is sensitive to the order in which packages are loaded. If you load more than a few packages, finding the correct order can be a non-trivial problem.
  3. Inconsistent quality of typesetting. While TeX is prized for its typography, kerning is not applied to equations by default.
For further discussion of the topic, see 25 Years of TeX and METAFONT: Looking Back and Looking Forward.