Monday, April 23, 2018

Bugs in code

It is said that there are just two hard problems in computer science: cache invalidation, naming things, and off-by-1 errors. Nevertheless, the most memorable bugs for me are race conditions and numerical stability problems.

Race conditions are memorable because they are difficult to reproduce, while numerical stability issues are memorable because they are difficult to fix correctly.
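Numerical instability is easy to demonstrate, even if it is hard to fix in general. The sketch below (Python, standard library only, with made-up data) compares the textbook one-pass variance formula E[x^2] - E[x]^2 with Welford's online algorithm, a standard numerically stable alternative. With values around 10^9, the squares exceed the 53-bit precision of a double, so the naive formula suffers catastrophic cancellation, while Welford's updates stay exact for this particular input.

```python
def naive_variance(xs):
    """Population variance via E[x^2] - E[x]^2: cheap, but numerically unstable."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean * mean


def welford_variance(xs):
    """Population variance via Welford's online algorithm: numerically stable."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n


# A small spread (deviations -6, -3, 3, 6) on top of a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))    # wrong: the spread is lost to cancellation
print(welford_variance(data))  # 22.5
```

The bug is not in the arithmetic of the naive formula but in the order of operations: subtracting the mean before squaring keeps the numbers small, which is exactly what Welford's update does incrementally.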

Monday, March 26, 2018

Type systems in programming languages

There are three approaches to data types:
  1. Have very few basic data types. This is the philosophy of Scheme.
  2. Have a rich type system that is organized into a hierarchy. Haskell is famous for this approach.
  3. Have a rich type system that uses union types. This is the Ceylon way.
Languages that strive for simplicity generally favour having just a few data types. The advantage of this approach is that you need to write just a few different functions to handle all possible data types. However, there are two disadvantages. First, a specialized data type can sometimes dramatically decrease runtime and memory requirements. Second, you may sometimes want to deliberately limit the domain of a variable to be sure that it cannot contain illegal values.

But once you have many data types, you frequently find yourself needing to write the same function for multiple data types. An illustrative example is a sum function that should work on both integer vectors and floating-point vectors. Maintaining multiple copies of the same code quickly becomes tiresome, so languages with a rich type system have to provide some way to avoid plain duplication of code. Haskell solves it by organizing the types into a hierarchy: if you need to write a function that works on numbers, just write it for the common ancestor of all numbers (the Num type class).
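Python's standard numbers module offers a hierarchy in the same spirit (Integral, Rational, Real, Complex, Number, from the most to the least specific), and int and float are both registered as subtypes of Real. A minimal sketch of the sum example, written once against the common ancestor Real; the function name total is made up for illustration:

```python
from collections.abc import Iterable
from numbers import Real


def total(values: Iterable[Real]) -> Real:
    """Sum a vector of any real-number type, written once for the common ancestor Real."""
    result = 0
    for value in values:
        # int, float and fractions.Fraction pass this check; complex and str do not.
        if not isinstance(value, Real):
            raise TypeError(f"expected a real number, got {type(value).__name__}")
        result += value
    return result


print(total([1, 2, 3]))   # 6
print(total([1.5, 2.5]))  # 4.0
```

The same function body serves both the integer vector and the floating-point vector, which is the duplication the hierarchy is meant to remove.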

The hierarchical type system is, however, quite restrictive. Suppose that some types fulfill requirement α and some types fulfill requirement β. We can represent them in a Venn diagram, where set A contains the types that fulfill α and set B contains the types that fulfill β.

But if all three regions (A only, B only, and A∩B) are non-empty, then neither α⊆β nor β⊆α holds. In other words, a hierarchical system cannot use both characteristics, α and β, to describe the types. At best, we can pick just one of them to describe the data types.

A concrete example of this dilemma is integers in databases. Should we treat them as additive or as "groupable"? Integers are both. Continuous numbers, however, are arguably just additive, while characters are arguably only groupable. In a hierarchical system, we generally choose the lesser evil and accept that we can test exact equality even on floating-point numbers. Union type systems let you tackle this dilemma (protocols in Swift and Python take a similar approach). By the way, if you use tags to organize your bookmarks, files or photos, you are already using a union system. A nice critique of hierarchical systems is given in "Goodbye, Object Oriented Programming".
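Python has no union type system like Ceylon's, but its abc module lets you attach independent capabilities to existing types, much like tags, without forcing them into a single hierarchy. A minimal sketch of the integer/float/character example; the capability names Additive and Groupable and the helpers add_all and group_counts are hypothetical:

```python
from abc import ABC


class Additive(ABC):
    """Capability tag: values can be meaningfully summed."""


class Groupable(ABC):
    """Capability tag: values can be meaningfully grouped by exact equality."""


# Tag the built-in types independently; neither tag is an ancestor of the other.
Additive.register(int)
Groupable.register(int)
Additive.register(float)  # floats are additive, but exact equality is unreliable
Groupable.register(str)   # characters are groupable, but not additive


def add_all(values):
    """Sum values only if every one of them carries the Additive tag."""
    if not all(isinstance(v, Additive) for v in values):
        raise TypeError("all values must be Additive")
    return sum(values)


def group_counts(values):
    """Count occurrences only if every value carries the Groupable tag."""
    if not all(isinstance(v, Groupable) for v in values):
        raise TypeError("all values must be Groupable")
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


print(add_all([1, 2, 3]))             # 6
print(add_all([0.5, 0.25]))           # 0.75
print(group_counts(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```

Integers land in both tag sets, floats and strings in only one, so a float column now fails loudly in group_counts instead of being silently grouped by exact equality.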

How to return multiple values from a function

There are three approaches to returning multiple values from a function:
  1. Return some object. This is the canonical solution in object-oriented and functional languages (the latter return "tuples").
  2. Use function parameters for the output. This is the canonical solution in Prolog.
  3. Allow functions to return multiple values. This is the canonical solution in Matlab.
Matlab's approach is beautiful as long as you return at most four values. But it does not scale to a higher number of return values, because you have to address the outputs by position; addressing them by name does not work. Just the mental picture of me trying to read the 100th return value by typing the correct number of commas, and getting it wrong by one comma, is painful.

Prolog's approach requires a convention to distinguish input parameters from output parameters (Prolog documentation conventionally marks them with mode indicators such as +Input and -Output).

Hence, the nicest approach, in my opinion, is to return a "tuple" whose elements can be addressed both by position and by name. This is the approach taken by R, where a function can return a named list.
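Python's collections.namedtuple gives a result similar to R's named list: the returned value can be read by position, by name, or unpacked all at once. A minimal sketch with a made-up function describe:

```python
from collections import namedtuple

# A tuple type whose fields can be read by position or by name.
Summary = namedtuple("Summary", ["minimum", "maximum", "mean"])


def describe(values):
    """Return several statistics at once as a single named tuple."""
    return Summary(minimum=min(values), maximum=max(values), mean=sum(values) / len(values))


s = describe([1, 2, 3, 6])
print(s.mean)  # 3.0, by name
print(s[2])    # 3.0, by position
low, high, avg = s  # or unpacked positionally, Matlab style
```

Reading a field by name stays readable even when the function returns many values, which is exactly where purely positional addressing breaks down.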

Monday, March 19, 2018

Dream castle in Heroes III by unit levels

1) Skeletons
While skeletons are only an average level 1 unit, the Necromancy skill makes them a great choice for large maps with many wandering creatures that can be harvested.

Disadvantage: On tiny maps occupied only by castles, where the majority of battles happen during the first two weeks, upgraded gremlins or sprites would be a better choice, since they are better suited than skeletons for conquering castles during the first weeks.

2) Harpy
Upgraded harpies are masters of guerrilla attacks: the enemy does not retaliate, and the harpies return to their original position unharmed. Combined with Mass Slow, this makes harpies tough for melee units to take down. If you dislike losing units while cleaning the map, this is the creature for you. It was voted the second most favourite level 2 unit.

Disadvantage: While harpies are awesome units for map cleaning and for castle defense against weak opponents because of their low attrition, they are lousy units for final battles, where big unit losses have to be expected. If you expect early battles between the main heroes in the open field, wolves with their impressive damage output would be a better choice.

3) Elves
Each castle should have at least one shooting unit. Upgraded elves shoot twice per turn, which makes the upgrade highly desirable. Elves were voted the favourite level 3 unit.

4) Vampires
Once you have about 20 upgraded vampires, they are awesome beasts for map cleaning, as you can expect minimal losses thanks to their ability to resurrect from the flesh of their enemies. They were rated the favourite level 4 unit.

Disadvantage: Just like harpies, vampires are not the best units for the final battles. Furthermore, it takes two weeks to grow the necessary number of units from a single castle.

5) Pits
Upgraded pits can (or rather could, if their ability also worked on the undead) raise ridiculously expensive vampires in vast quantities from slaughtered skeletons. Finally, you would not have to worry about skeleton losses during map cleaning, as the lost skeletons would get converted into vampires!

Disadvantage: Just like harpies, elves and vampires, this unit has to be upgraded to be fully useful.

6) Wyvern
To compensate for the need to upgrade almost all the lower-level units, wyverns are useful even without an upgrade. Furthermore, the wyvern dwelling has very low prerequisites (just the level 2 dwelling), making it possible to recruit wyverns on the first day. Since they are also fliers, they can be of great use for early castle sieges.

Disadvantage: Champions, black knights and nagas are all beautiful creatures. But no other level 6 dwelling can be built on day 1. If we went with any other unit, we would not be able to recruit level 7 units on day 2 of the first week.

7) Firebird
While the firebird is not the best level 7 unit, several things make it a pretty decent choice: its dwelling is one of the cheapest (second cheapest, after the titan's), the unit itself is inexpensive, its weekly growth is above average, it is one of the fastest units in the game (which gives you the opportunity to cast Mass Haste/Slow and decimate the opponent before they get a turn), and its fire immunity lets you embrace the Armageddon strategy (with the help of a support hero to store raised skeletons). It is the only level 7 unit whose parameters were down-tuned in Horn of the Abyss. Furthermore, the firebird dwelling has only one prerequisite: the level 6 dwelling. Hence, if you play on normal difficulty, you just need to get 5 wood (or better, set the starting bonus to resources, wood and stone, and you do not have to worry about collecting any additional resources or gold), and on the second day you will have 2 firebirds, 2 wyverns and 3900 gold in your pocket to spend on whatever you want (depending on the situation, you may want to save it for building the Capitol, or spend it on another hero, 8 harpies and 6 skeletons).

Since all the units would be from the same castle, there would be no morale penalty for mixing creatures from different towns. Thanks to wyverns and firebirds, the castle would be usable for a blitzkrieg strategy during the early phase of the game when played on normal difficulty. But the castle would also scale to long games due to the ability to harvest skeletons, which would be continuously converted into vampires. And because upgraded harpies and vampires simply do not die during map cleaning, you may go into the final battles with vast numbers of skeletons, harpies and vampires. Furthermore, firebirds and phoenixes are, unlike many dragons, immune to the Armageddon spell. And because phoenixes are the fastest units in the game, you are almost guaranteed to get the opportunity to cast Armageddon before the opponent can do anything. Hence, in some situations, phoenixes can be used to slay dragons without any losses.

Thursday, December 14, 2017

When do I write unit tests

  1. Whenever I am updating some piece of code that already works. By writing unit tests, I make sure that the new code is at least as good as the old code and that it does not break things.
  2. Whenever I am fixing a bug. The unit test demonstrates that the bug was fixed, and it makes sure that the bug does not return in the future due to refactoring.
  3. Whenever I am not able to write working code on the first attempt, as that is a sign of complexity. And since debugging is said to be harder than writing the code, I want to make sure that no bug passed unnoticed.
  4. Whenever I am assigning a programming task to someone. A set of unit tests helps to communicate what I want to get, and it nudges the assignee to use my interface, which simplifies integrating the delivered code on my side.
  5. Whenever I get the code from the assignee. Reasoning: when I am assigning a task, I generally provide just a very limited set of examples that the code has to pass, because:
    1. Writing a comprehensive set of tests takes a lot of effort, frequently more than writing the function itself.
    2. The assignee may find a much better solution to the problem that is incompatible with the original unit tests. When this happens, I genuinely want to use the better solution, but I do not want to throw away a lot of my own work.
    Unfortunately, when people write code that passes all my tests, they think that the code cannot contain any more bugs. I enjoy proving them wrong.
  6. Before deploying the code. It has happened to me that my code passed all the tests I used during development. To enjoy the victory, I threw some newly generated data at the code, expecting a beautiful result. But the code failed. Just like assignees tend to overfit the "training" set of unit tests, so do I.
  7. Before publishing the code. A good unit test works as a nice illustration of how the code can be used. 
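A minimal sketch of items 2 and 7 in Python, written as plain pytest-style test functions (pytest collects functions whose names start with test_, so no import is needed). The function parse_percentage and the whitespace bug it fixes are hypothetical, made up for illustration:

```python
def parse_percentage(text: str) -> float:
    """Hypothetical function: convert a string such as "42%" to the fraction 0.42."""
    # Hypothetical bug fix: surrounding whitespace, as in " 42 % ", used to raise ValueError,
    # so whitespace is now stripped both before and after removing the percent sign.
    cleaned = text.strip().removesuffix("%").strip()
    return float(cleaned) / 100


def test_trivial_example():
    # Item 7: documents how the function is meant to be used.
    assert abs(parse_percentage("42%") - 0.42) < 1e-12


def test_regression_surrounding_whitespace():
    # Item 2: pins the fixed bug so that a later refactoring cannot bring it back.
    assert abs(parse_percentage(" 42 % ") - 0.42) < 1e-12
```

Running pytest on the file executes both tests; because they are plain functions, they can also be called directly.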

What to test in data processing code

In my experience, data are dirty. If the current data are not dirty, the next batch will be. Hence, it is reasonable to test the code against such scenarios. My checklist follows:
  1. Does the code do what it is supposed to do on a trivial example? The purpose of this unit test is twofold: it documents how the code can be used, and if it fails on the client's computer but passes on your own, it signals an integration problem.
  2. If the code accepts numeric data, check the following scenarios:
    1. Zero (a smoke test for division by zero error)
    2. One
    3. Negative number (not all algorithms accept negative values)
    4. Decimal number (not all algorithms accept decimals)
    5. NaN (not a number; permissible, for example, in doubles)
    6. null (missing value)
    7. Empty set (no data at the input at all)
    8. Plus/minus infinity
    9. Constant vector (for example, when the variance is calculated and used in a denominator, we get division by zero)
    10. Vector of extremely large values and extremely small values (test of numerical stability)
  3. If the code accepts character data, check the following scenarios:
    1. Plain ASCII
    2. Accented characters
    3. null (missing value)
    4. Empty set (no data at the input at all)
    5. Ridiculously long text
    6. Empty value ("")
    7. White space character (e.g. " ")
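The numeric part of the checklist translates directly into a handful of pytest-style tests. Below is a sketch in Python around a hypothetical function zscore (standardize to mean 0 and standard deviation 1). How it treats missing values, empty and constant input, and infinity are assumptions made for illustration; the point is that each checklist item gets its own explicit test, whatever behaviour you choose.

```python
import math
from typing import Optional, Sequence


def zscore(values: Sequence[Optional[float]]) -> list[Optional[float]]:
    """Hypothetical function under test: standardize values to mean 0 and SD 1.

    Assumed behaviour: None and NaN count as missing and come back as None,
    input with no present values or with zero variance yields None everywhere
    instead of dividing by zero, and infinite input raises ValueError.
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    if any(math.isinf(v) for v in present):
        raise ValueError("infinite values are not supported")
    if not present:
        return [None] * len(values)
    mean = math.fsum(present) / len(present)
    # Subtract the mean before squaring (two-pass) to keep the variance numerically stable.
    variance = math.fsum((v - mean) ** 2 for v in present) / len(present)
    if variance == 0:
        return [None] * len(values)
    sd = math.sqrt(variance)
    return [None if v is None or math.isnan(v) else (v - mean) / sd for v in values]


def test_trivial_example():
    assert zscore([1.0, 3.0]) == [-1.0, 1.0]


def test_zero_one_negative_and_decimal():
    result = zscore([0, 1, -1.5])
    assert math.isclose(math.fsum(result), 0.0, abs_tol=1e-12)
    assert math.isclose(math.fsum(r * r for r in result) / 3, 1.0)


def test_nan_and_null_are_passed_through():
    assert zscore([1.0, None, math.nan, 3.0]) == [-1.0, None, None, 1.0]


def test_empty_input():
    assert zscore([]) == []


def test_infinity_is_rejected():
    for bad in (math.inf, -math.inf):
        try:
            zscore([1.0, bad])
        except ValueError:
            continue
        raise AssertionError("expected ValueError for infinite input")


def test_constant_vector_does_not_divide_by_zero():
    assert zscore([5.0, 5.0, 5.0]) == [None, None, None]


def test_extreme_magnitudes():
    # A large offset with a small spread, and tiny values.
    assert zscore([1e9 + 1, 1e9 + 3]) == [-1.0, 1.0]
    small = zscore([1e-9, 3e-9])
    assert all(math.isclose(a, b) for a, b in zip(small, [-1.0, 1.0]))
```

Whether a constant vector should yield None, zeros or an error is a design decision; the test's job is to make that decision explicit so that it cannot silently change.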

Monday, September 4, 2017

Keyboard shortcuts on macOS

macOS does some things right:
  1. A dedicated (and simple!) shortcut for displaying the settings of an application. On Windows, you generally have to hunt for "Settings" or "Preferences" in the application menu. Not so on macOS: it is always command+",".
  2. A variety of shortcuts for different kinds of screenshots.
And some not so much:
  1. No dedicated keyboard shortcut for find and replace. On Windows, it is always ctrl+"r". On macOS, it is application specific. Sometimes it is command+alt+"f", sometimes command+shift+"f", and sometimes there is simply no keyboard shortcut at all and you have to use the mouse to tell the app to perform find and replace, not just find! In case you wonder, both F5 and command+"r" perform refresh.
  2. No keyboard shortcut for displaying the context menu: you have to right-click with the mouse.