pondělí 26. března 2018

Type systems in programming languages

There are three approaches to data types:
  1. Have a very few basic data structures. This is the philosophy of Scheme.
  2. Have a rich type system that is organized into a hierarchy. Haskel is famous for this approach.
  3. Have a rich type system that is using union types. This is the Ceylon way.
Languages that desire simplicity generally favour having just a few data types. The advantage of this approach is that you need to write just a few different functions to handle all possible data types. However, there are two disadvantages. First, sometimes a specialized data type can dramatically decrease runtime and memory requirements. Second, sometimes you may want to deliberately limit the domain that a variable can take to be sure that a variable may not contain illegal values.

But once you have many data types, you frequently find yourself in a need to write the same function but for multiple data types. An illustrative example would be a sum function that should be able to work on integer vector and floating point vector. But it can quickly become tiresome to maintain multiple copies of the same code. Languages with the rich data type system have to provide some way how to avoid plain duplication of the code. Haskell solves it by providing data types in a hierarchy - if you need to write a function that works on numbers, just write it for the common ancestor of all numbers.

The hierarchical type system is, however, quite restrictive. If there are types that fulfill requirement α and some types fulfill requirement β, then we can represent such types in a Venn diagram:

But if all 3 sections, A, B and A∩B are non-empty, we cannot say that α⊆β or β⊆α. In other words, with a hierarchical system we cannot use both, α and β characteristics, to describe the types. We can, at best, just pick one of these characteristics to describe the data types.

A concrete example of this dilemma are integers in databases. Should we treat them as additive or as "groupable"? Integers are both. But continuous numbers are, arguably, just additive. While characters are, arguably, only groupable. In a hierarchical system, we generally take the smaller of the evils and accept that we can test exact equality even on floating point numbers. Union type systems allow you to tackle this dilemma (protocols in Swift or traits in Python take similar approach). Btw., if you are using tags to organize your bookmarks, files or photos, you are already using a union system. A nice critique of hierarchical systems is given in Goodbye, Object Oriented Programming.

Žádné komentáře: