Datafilos: Reserved words. Or no reserved words.

Recently I heard that Prolog is awesome because it uses only 6 reserved words. But is fewer really better?

I argue that there is a sweet spot. You can write your code in raw binary. But sooner or later you will realize that you are repeating the same sequences again and again, so you start giving names to the repeating sequences. Congratulations, you have just reinvented assembler. Later on you realize that it would be nice to simplify some constructs, and you end up with something like C.

At the other end there are languages with many keywords, like SAS. There are so many keywords to remember that you have to consult the documentation all the time. Hence this extreme is not ideal either.

OK. Too few or too many reserved words is bad. But where is the sweet spot? Let's draw a parallel between computer and human languages. Both serve communication, and some computer languages, like Prolog or SQL, were modeled after human languages. Hence we can translate the problem from "what is the optimal count of reserved words in a computer language" to "what is the count of unique sound atoms in a human language". And in many cases we can approximate this count with the length of the alphabet. Here is an (incomplete) list of alphabets:

27 Hebrew
27 Latin
28 Arabic
28 Hindi
29 Cyrillic
48 Kana (Japanese)

There are some extremes in the coding. For example, Chinese uses a very large number of characters. However, one character does not always represent an atomic sound. At the other end, we can represent a language with Morse code. However, it doesn't appear to be a preferred way of communication. Otherwise we would not bother to convert the human voice to bits, send it through the air and transform it back to sound when we talk over cellphones, given that cellphones have buttons that generate a signal directly in bit form.

Since the distribution of alphabet lengths is bounded on the left, it is asymmetric. Hence it is better to use the median rather than the mean as the typical value. And based on millennia of evolution, it is safe to say that the optimal count of reserved words in a programming language is 28.
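As a quick check of the arithmetic, the median and the mean of the six alphabet lengths listed above can be computed with Python's standard statistics module. This is only a sketch using the figures from the list; the variable names are illustrative.

```python
from statistics import mean, median

# Alphabet lengths exactly as given in the list above.
alphabet_lengths = [27, 27, 28, 28, 29, 48]

typical = median(alphabet_lengths)  # middle of the sorted values
average = mean(alphabet_lengths)    # pulled upward by the single large value

print(f"median = {typical}")
print(f"mean   = {average:.1f}")
```

The median comes out at 28, while the mean is about 31.2, because the one long alphabet drags it up.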

Of course, you can argue that there are accents, or changes in pronunciation when one character follows another. But you could say something similar about programming languages, so let's ignore that for simplicity.

How do computer languages compare with each other? Let's see:

ANSI COBOL 85: 357
SystemVerilog: 250 + 73 reserved system functions = 323
C#: 79 + 23 contextual = 102
F#: 64 + 8 from OCaml + 26 future = 98
C++: 82
Java: 50
PHP: 49
Ruby: 42
Python 3.x: 33
C: 32
Python 2.7: 31
Go: 25
Smalltalk: 6 pseudo-variables
Iota: 2
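The Python figures are easy to verify on whatever interpreter is at hand, since the standard keyword module exposes the reserved words as a list. The sketch below just counts them; the result depends on the Python version you run, so it may not match the numbers in the list above.

```python
import keyword
import sys

# keyword.kwlist holds the reserved words of the running interpreter.
reserved = keyword.kwlist
version = f"{sys.version_info.major}.{sys.version_info.minor}"

print(f"Python {version}: {len(reserved)} reserved words")
```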

Based on this comparison, Python got it right.
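To make the comparison explicit, the languages can be ranked by how far their count lies from 28. This is a minimal sketch using the totals from the list above; the names keyword_counts and TARGET are made up for the illustration.

```python
# Reserved-word counts as listed above (totals where a sum was given).
keyword_counts = [
    ("ANSI COBOL 85", 357),
    ("SystemVerilog", 323),
    ("C#", 102),
    ("F#", 98),
    ("C++", 82),
    ("Java", 50),
    ("PHP", 49),
    ("Ruby", 42),
    ("Python 3.x", 33),
    ("C", 32),
    ("Python 2.7", 31),
    ("Go", 25),
    ("Smalltalk", 6),
    ("Iota", 2),
]

TARGET = 28  # the median alphabet length proposed in the post

# Sort by absolute distance from the target; ties keep their listed order.
ranked = sorted(keyword_counts, key=lambda item: abs(item[1] - TARGET))

for name, count in ranked[:4]:
    print(f"{name}: {count} (distance {abs(count - TARGET)})")
```

On these figures Python 2.7 and Go are both three words away from 28, followed by C and Python 3.x.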

For discussion about the topic see http://lambda-the-ultimate.org/node/4295.

Thursday, 15 January 2015
