Thursday, January 15, 2015

Reserved words. Or no reserved words.

Recently I heard that Prolog is awesome because it uses only 6 reserved words. However, is fewer better?

I argue that there is a sweet spot. You can write your code in raw bits. But sooner or later you will realize that you are repeating the same sequences again and again. So you start giving names to the repeating sequences. Congratulations. You have just reinvented assembler. Later on you realize that it would be nice to simplify some constructs. And you end up with something like C.

At the other end there are languages with many keywords, like SAS. There are so many keywords to remember that you have to consult the documentation all the time. Hence this extreme is not ideal either.

OK. Too few or too many reserved words is bad. But where is the sweet spot? Let's draw a parallel between computer and human languages. Both serve communication. And some computer languages, like Prolog or SQL, were modeled after human languages. Hence we can translate the problem from the optimal count of reserved words in computer languages to the count of unique sound atoms in human languages. And in many cases we can approximate this count by the length of the alphabet. Here is an (incomplete) list of alphabets:

27 Hebrew
27 Latin
28 Arabic
28 Hindi
29 Cyrillic
48 Kanji

There are some extremes in the coding. For example, Chinese uses many characters. However, one character does not always represent an atomic sound. At the other end, we can represent a language with Morse code. However, that does not appear to be a preferred way of communication. Otherwise we would not bother to convert the human voice to bits, send it through the air and transform it back to sound when talking over cellphones, given that a cellphone has buttons that directly generate a signal in bit form.

Since the distribution of alphabet lengths is bounded on the left, it is asymmetric. Hence the median is a better measure of the typical value than the mean. And based on millennia of evolution we are safe to say that the optimal count of reserved words in a programming language is 28.
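As a quick check of the arithmetic, here is a minimal sketch that computes both the median and the mean of the alphabet lengths listed above. The numbers are copied from the list as given. The variable names are only illustrative.

```python
from statistics import mean, median

# Alphabet lengths exactly as listed in the post (not independently verified).
alphabet_lengths = {
    "Hebrew": 27,
    "Latin": 27,
    "Arabic": 28,
    "Hindi": 28,
    "Cyrillic": 29,
    "Kanji": 48,
}

lengths = sorted(alphabet_lengths.values())
center_median = median(lengths)  # barely affected by the single large value
center_mean = mean(lengths)      # pulled upward by the single large value

print(f"median = {center_median}, mean = {center_mean:.2f}")
```

The median comes out at 28, while the mean is pulled up to roughly 31.2 by the one long entry, which is why the median is used here.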

Of course you can argue that there are accents. Or changes in pronunciation when one character follows another. But you could say something similar about programming languages. Hence let's ignore that for simplicity.

How do computer languages compare with each other? Let's see:
Based on this comparison, Python got it right.
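If you want to check the Python number yourself, the standard library exposes the list of reserved words. This is a small sketch, and the exact count depends on the Python version you run, so no fixed result is claimed here. `TARGET` is just an illustrative name for the value 28 argued for above.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter;
# its length differs between Python versions.
reserved = keyword.kwlist
TARGET = 28  # the count argued for in the post

count = len(reserved)
print(f"Python reserved words: {count}")
print(f"distance from {TARGET}: {abs(count - TARGET)}")
```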

For a discussion of the topic, see http://lambda-the-ultimate.org/node/4295.










Monday, January 12, 2015

Can Artificial Intelligence overthrow humanity?

If I were an AI attempting to overthrow humanity, I would use the following two strategies: pretend to be stupid, and behave symbiotically.

In the virus simulation game Pandemic you have to engineer a disease that kills all humans around the world. And the best strategy is to develop a disease with a long incubation period. That way it spreads unnoticed. The development of computer intelligence is far behind the expectations of the sixties. Isn't AI just pretending to be stupid?

A nice strategy to help an infection spread is to behave symbiotically. For example, the Stuxnet virus enhanced the functionality of the software to mask itself as a software update. And don't computers pretend to be useful and spread everywhere?

And the best part is that we now depend on computers.



Saturday, November 1, 2014

3Vs (variety, velocity and volume)

Three terms stood out in relation to Big Data.
  Variety, Velocity and Volume.
In marketing, the 4Ps define all of marketing using only four terms:
   Product, Promotion, Place, and Price.


Friday, October 3, 2014

Comparison of MATLAB and R

Advantages of R:
  • Easy setting of default parameters (inherited from functional languages). Not that it is incredibly difficult to set a default value in MATLAB, but it's verbose and error prone.
  • Named parameters (again, inherited from functional languages). In MATLAB, when you pass many parameters with string values to a function, it's unclear at a glance what is a parameter name and what is a parameter value. In R, it's immediately clear.
  • Mixed tables (combination of string and numerical columns). Incredibly useful for messy real-world data sets. A partial remedy to this problem is 'Tables' in recent versions of MATLAB.
  • Possibility to name rows and columns. This is awesome because you don't have to remember that you want column 181, all you have to remember is the name of the column. It also has the advantage that the metadata stay together with the data. Hence if you perform selection, projection or transformation of the data, the metadata are automatically in sync with the data. No work is left to the user. In MATLAB, you have to use 'Struct'. Or 'Tables' in recent versions of MATLAB.
  • Negative indexes for dropping particular columns/rows.
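The first two points above (default and named parameters) can be illustrated with a minimal sketch. It is written in Python rather than R, because Python happens to offer the same two features. The function `summarize` and its parameters are hypothetical and exist only for this illustration.

```python
def summarize(values, trim=0, label="value", digits=2):
    """Return a one-line summary; every option has a default.

    trim is the number of smallest and largest items to discard.
    """
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    avg = sum(kept) / len(kept)
    return f"{label}: {round(avg, digits)}"

data = [1.0, 2.0, 3.0, 4.0, 100.0]

# Defaults: only the data has to be passed.
plain = summarize(data)

# Named arguments: it is obvious which string is a parameter name
# and which is a parameter value, and the order does not matter.
trimmed = summarize(data, label="trimmed mean", trim=1)

print(plain)
print(trimmed)
```

The call site of the second example reads almost like documentation, which is the point made in the list above.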
Advantages of MATLAB:
  • There are fewer competing packages for MATLAB than for R. Hence in MATLAB you are spared deciding which library is the best.
  • Sparse matrices are an integral part of MATLAB. Hence all algorithms benefiting from sparse matrices use the same representation of sparse matrices. In R, each library uses its own representation.

Sunday, September 28, 2014

Difference between Machine Learning and Artificial Intelligence

In my biased opinion, the difference between Machine Learning (ML) and Artificial Intelligence (AI) is in how they solve problems. AI seeks an optimal solution, while ML seeks a usable solution.

This difference is reflected in the tool sets used. A typical tool for AI is logic, which is traditionally binary. Probability, on the other hand, a common tool in ML, allows any value between 0 and 1.

The difference is reflected in individual algorithms as well. A* is a representative algorithm from AI. It is an elegant algorithm that guarantees that the returned solution is optimal (given an admissible heuristic). In contrast, a neural network, an algorithm from ML, doesn’t guarantee optimality of the solution at all. But it can tackle a much wider range of problems than A*. And that is the reason why ML is currently more popular and successful than AI. Despite all the hopes, it turned out we are unable to optimally solve many problems like voice or object recognition. All we can hope for is a good enough solution. And that is exactly where ML beats AI. ML is all about “how to get a usable solution”, while AI is about “how to get an optimal solution”. And when this optimal solution is unreachable, AI just gives up, while ML gives at least something.
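To make the optimality claim about A* concrete, here is a minimal sketch on a small grid. The grid layout, the 4-connected unit-cost moves and the Manhattan heuristic are assumptions chosen for this illustration. The heuristic is admissible in this setting, which is what the guarantee relies on.

```python
import heapq

def a_star(grid, start, goal):
    """Return the length of a shortest 4-connected path, or None if unreachable.

    grid is a list of equally long strings; '#' marks a wall.
    """
    def h(cell):
        # Manhattan distance never overestimates on a unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost = {start: 0}
    frontier = [(h(start), 0, start)]  # entries are (g + h, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] != "#":
                new_cost = g + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    heapq.heappush(
                        frontier, (new_cost + h((nr, nc)), new_cost, (nr, nc))
                    )
    return None  # no path exists

grid = [
    "....",
    ".##.",
    "....",
]
print(a_star(grid, (0, 0), (2, 3)))
```

Note the last line of the function: when no path exists, the search returns nothing at all, which mirrors the "AI just gives up" behavior described above.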





Tuesday, September 9, 2014

Comparison of SAS data step and SQL

The default tool for ETL in SAS is the data step. However, SAS also offers support for SQL. When to use which?

The main advantages of data step are:

  1. Drop keyword. Let's imagine that you want to remove one column from a table with 2000 columns. In SQL you would have to name all the columns you want to keep. But in data step it is enough to name just the column you don't want to include. Awesome.
  2. Wildcards. If you want to select all columns beginning with "pred_", all you have to do in data step is to write "pred_:" (note the colon). In SQL you would have to write the name of each predictor.
  3. Speed. SQL in SAS is not implemented very efficiently.
  4. LAG function. In SQL you have to perform a slow and cumbersome join to get the corresponding functionality.
The main advantages of SQL:
  1. Group by command. Simply because data step doesn't offer such functionality.
  2. Order by command. Again, you can't sort directly in data step.
  3. Metadata. Queries on the metadata are so addictive!
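To show what the drop, wildcard and LAG points from the data step list amount to, here is a small sketch of the three operations in plain Python. It is not SAS code and does not reproduce SAS syntax. The helper names `drop`, `keep_prefix` and `lag`, as well as the sample rows, are made up for this illustration.

```python
rows = [
    {"id": 1, "pred_a": 0.2, "pred_b": 0.7, "noise": 9},
    {"id": 2, "pred_a": 0.4, "pred_b": 0.1, "noise": 8},
    {"id": 3, "pred_a": 0.9, "pred_b": 0.5, "noise": 7},
]

def drop(rows, *unwanted):
    """Keep every column except the named ones (the 'drop' idea)."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

def keep_prefix(rows, prefix):
    """Keep only columns whose name starts with prefix (the 'pred_:' idea)."""
    return [{k: v for k, v in row.items() if k.startswith(prefix)} for row in rows]

def lag(rows, column):
    """Value of column from the previous row; None for the first row."""
    values = [row[column] for row in rows]
    return [None] + values[:-1]

print(drop(rows, "noise")[0])
print(keep_prefix(rows, "pred_")[0])
print(lag(rows, "pred_a"))
```

In each case you name only what you want to exclude, the common prefix, or the column to shift, instead of listing every column or writing a self-join.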