Sunday, September 28, 2014

Difference between Machine Learning and Artificial Intelligence

In my biased opinion, the difference between Machine Learning (ML) and Artificial Intelligence (AI) lies in how they solve problems: AI seeks an optimal solution, while ML seeks a usable one.

This difference is reflected in the tools each field uses. A typical AI tool is logic, which is traditionally binary. At the other end, probability, a common ML tool, allows any value between 0 and 1.

The difference is reflected in individual algorithms as well. A* is a representative AI algorithm: an elegant algorithm that guarantees the returned path is optimal (provided its heuristic never overestimates the remaining cost). In contrast, a neural network, an ML algorithm, doesn’t guarantee optimality of the solution at all. But it can tackle a much wider range of problems than A*, and that is why ML is currently more popular and successful than AI. Despite all the hopes, it turned out we are unable to optimally solve many problems, such as speech or object recognition. All we can hope for is a good-enough solution, and that is exactly where ML beats AI. ML is all about “how to get a usable solution”, while AI is about “how to get an optimal solution”. And when the optimal solution is unreachable, AI just gives up, while ML at least gives something.
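
To make the optimality guarantee of A* concrete, here is a minimal sketch in Python, assuming a 4-connected grid where every step costs 1 and Manhattan distance is the heuristic. Because that heuristic never overestimates the remaining cost, the first time the goal is popped from the priority queue its path is a shortest one. The grid encoding ('#' for walls) and the names a_star and manhattan are illustrative choices, not taken from any particular library.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def manhattan(a: Cell, b: Cell) -> int:
    """Admissible heuristic for a 4-connected grid where every step costs 1."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest path from start to goal as a list of cells, or None.

    '#' marks a wall; any other character is walkable.
    """
    rows, cols = len(grid), len(grid[0])
    best_g: Dict[Cell, int] = {start: 0}  # cheapest known cost to reach each cell
    came_from: Dict[Cell, Cell] = {}      # back-pointers for path reconstruction
    open_heap: List[Tuple[int, int, Cell]] = [(manhattan(start, goal), 0, start)]

    while open_heap:
        _f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # With an admissible heuristic, the first pop of the goal has optimal cost.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale entry: a cheaper route to this cell was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_g = g + 1
                if new_g < best_g.get(nxt, new_g + 1):
                    best_g[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + manhattan(nxt, goal), new_g, nxt))
    return None  # goal unreachable
```

On the grid ["....", ".##.", "...."], a_star(grid, (0, 0), (2, 3)) returns a 6-cell path (5 steps), which is provably the shortest possible; a neural network offers no comparable guarantee.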

Tuesday, September 9, 2014

Comparison of the SAS data step and SQL

The default ETL tool in SAS is the data step. However, SAS also supports SQL (through PROC SQL). When should you use which?

The main advantages of data step are:

  1. DROP keyword. Imagine you want to remove one column from a table with 2000 columns. In SQL you would have to name all the columns you want to keep, but in a data step it is enough to name just the column you don't want. Awesome.
  2. Wildcards. To select all columns beginning with "pred_", all you have to write in a data step is "pred_:" (note the colon). In SQL you would have to write out the name of each predictor.
  3. Speed. SQL in SAS is not implemented very efficiently, so the equivalent data step is often faster.
  4. LAG function. In SQL you have to perform a slow and cumbersome self-join to get the equivalent functionality.
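
To illustrate point 4 without a SAS installation, the sketch below is an analogy in Python rather than SAS code. lag_sequential mimics the data step's single pass, carrying the previous row's value forward the way a simple, unconditional call to LAG does (it does not model LAG's queue behaviour when called conditionally). lag_self_join gets the same result the classic SQL way, joining the table to itself on row_id = row_id - 1, using Python's built-in sqlite3 module. The table t and columns row_id and x are made up for illustration.

```python
import sqlite3
from typing import List, Optional


def lag_sequential(values: List[float]) -> List[Optional[float]]:
    """Data-step style: one pass over the rows, remembering the previous value."""
    result: List[Optional[float]] = []
    previous: Optional[float] = None  # the first row has no predecessor
    for v in values:
        result.append(previous)
        previous = v
    return result


def lag_self_join(values: List[float]) -> List[Optional[float]]:
    """Classic SQL style: join the table to itself on the previous row number."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (row_id INTEGER PRIMARY KEY, x REAL)")
    con.executemany(
        "INSERT INTO t (row_id, x) VALUES (?, ?)", list(enumerate(values, start=1))
    )
    rows = con.execute(
        """
        SELECT cur.row_id, prev.x
        FROM t AS cur
        LEFT JOIN t AS prev ON prev.row_id = cur.row_id - 1
        ORDER BY cur.row_id
        """
    ).fetchall()
    con.close()
    return [prev_x for _, prev_x in rows]  # NULL for the first row becomes None
```

Both functions return [None, 10.0, 12.0] for [10.0, 12.0, 9.0]. The sequential version needs nothing but the current row and one remembered value, while the SQL version needs an explicit row number and a second pass over the table via the join.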

The main advantages of SQL are:

  1. GROUP BY clause. The data step offers no direct equivalent; its BY-group processing requires sorted input and manual accumulation.
  2. ORDER BY clause. Again, you can't sort directly in a data step (you need PROC SORT).
  3. Metadata. Queries on the metadata (for example, the DICTIONARY.COLUMNS table) are so addictive!
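
As an analogy for the three SQL advantages above, here is a sketch using Python's built-in sqlite3 module instead of SAS PROC SQL: one query aggregates with GROUP BY and sorts with ORDER BY at the same time, and a second query reads the table's column metadata. The table sales and its columns are invented for illustration; in SAS the metadata query would target the DICTIONARY tables rather than SQLite's PRAGMA table_info.

```python
import sqlite3
from typing import List, Tuple


def summarize(rows: List[Tuple[str, float]]) -> Tuple[List[Tuple[str, int, float]], List[str]]:
    """Aggregate and sort in one query, then list the table's columns from metadata."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)

    # GROUP BY aggregates and ORDER BY sorts, both in a single statement.
    totals = con.execute(
        """
        SELECT region, COUNT(*) AS n, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY total DESC
        """
    ).fetchall()

    # Metadata: each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in con.execute("PRAGMA table_info(sales)")]

    con.close()
    return totals, columns
```

For [("north", 10.0), ("south", 5.0), ("north", 7.5)] this returns the totals [("north", 2, 17.5), ("south", 1, 5.0)] and the column list ["region", "amount"].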