Wednesday, 26 December 2018

Impressions of machine learning in R

A professor once said that whenever a student asks him a question, he immediately knows whether the student studies economics, statistics, or computer science. If the student asks about the return on investment, the student studies economics. If the student asks about the sample size, the student studies statistics. And if the student asks about edge cases, the student is a computer scientist.
Judging from the current state of algorithms in R, many package authors are statisticians rather than programmers. On the bright side, whenever I need an implementation of some obscure statistical algorithm, I know that either I will find it in R or I am out of luck. On the downside, R package authors frequently forget to check for common edge cases like division by zero.

I have been burned so many times in the past that I have assembled a table of my prior belief that an implementation will not blow up when I throw my dirty data at it:
  1. Core R functions like c(): 100%
  2. Core library functions like chisq.test(): 98%
  3. A function in a library implementing a single algorithm: 70%
  4. A function in a library implementing many algorithms: 40%

