Monday, July 16, 2018

Impression of R

The following text is structured like a sonnet: thesis, antithesis, and synthesis.

The philosophy of R is beautiful. Everything is a function call (even + and [ are functions). Hence the syntax is highly regular (predictable). R also provides the syntactic sugar you would expect from a functional language, such as named and default function parameters. This is a great improvement over Matlab, where you have to fake this functionality. Also, R passes arguments by value (environments being the notable exception). Hence it behaves more predictably than Java, which passes primitives by value and objects, effectively, by reference (strictly speaking, Java passes a reference to the object by value). At the same time, R uses copy-on-write, so passing by value does not mean copying data needlessly: a real copy is made only when a shared object is actually modified.
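To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is not how R implements it internally (R tracks sharing in its own C-level bookkeeping), and the class names CowVector and _Buffer are hypothetical, made up for illustration. The point it shows: an assignment like y <- x only shares the data, and the actual copy happens at the first modification.

```python
class _Buffer:
    """Shared storage plus a count of the handles that point at it.

    Simplification: counts are never decremented when a handle is
    garbage-collected, so the sketch may copy more often than needed.
    """

    def __init__(self, items):
        self.items = list(items)
        self.refs = 1


class CowVector:
    """Toy copy-on-write vector (hypothetical class, for illustration only)."""

    def __init__(self, items=()):
        self._buf = _Buffer(items)

    def copy(self):
        # "Copying" only shares the buffer and bumps its count;
        # no element is duplicated yet.
        other = CowVector.__new__(CowVector)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def __getitem__(self, index):
        return self._buf.items[index]

    def __setitem__(self, index, value):
        if self._buf.refs > 1:
            # The buffer is shared: detach by taking a private copy
            # before writing, so the other handles keep the old values.
            self._buf.refs -= 1
            self._buf = _Buffer(self._buf.items)
        self._buf.items[index] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf

    def to_list(self):
        return list(self._buf.items)


# Roughly what `y <- x; y[1] <- 99` does in R (R indexes from 1).
x = CowVector([1, 2, 3])
y = x.copy()                      # cheap: x and y share one buffer
print(y.shares_buffer_with(x))    # True
y[0] = 99                         # first write triggers the real copy
print(x.to_list(), y.to_list())   # [1, 2, 3] [99, 2, 3]
```

The trade-off mirrors the one described above: value semantics keep functions free of surprising side effects on their arguments, while the deferred copy keeps read-only use of large objects cheap.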

But there are also disadvantages. In R, you can do the same thing in many ways, and that can quickly become overwhelming. You search for a way to do X, and the search returns ten different ways to do it. Which of the methods is the best? You don't know. So you test the first one. It works, but it does not allow you to do Y, which is closely related to X. So you test the second approach. It works: it allows you to do both X and Y. But it is too slow. So you test the third approach. There is a bug that prevents you from doing X. You test the fourth approach. It does not work at all... I think you get the idea. Finding a way to do something in R is much more time-consuming than in Python, where there is commonly only a single dominant way to do something. The R community realizes that this is a problem. CRAN introduced the possibility to assess packages with stars, but it got phased out because only a few people were contributing their evaluations.

Another disadvantage of R is that when you look at the source code of a function, you never know in advance what you will find. Is the function written in R, S, or C? Sometimes it really feels like unwrapping a Christmas present: at the beginning you only see the function API. You unwrap it just to find out there is a thin R wrapper inside. So you remove another layer. It is in S. But not pure S: it is FORTRAN code ported into S. So you look into the FORTRAN code. But the FORTRAN code is not typical FORTRAN code either; it was already ported from something else... I have to admit that I am impressed that so many levels of indirection actually work. But debugging and understanding such code is a nightmare.

My conclusion: R is awesome if all you need is to use it as a scripting language. Whenever I need to run some exotic statistical method, I can be pretty confident that it was ported to or written in R and distributed as a package. There are situations when I simply have no choice but to use R, unless I want to implement the method from scratch. But whenever I have to read R code written by my colleagues, it is hell: each of my colleagues uses a different set of packages, different data containers (data frame vs. data.table vs. tibble vs. ...), and a different approach to writing loops (for loop vs. apply vs. tidyverse vs. ...). In the end, the code is not difficult to understand. But it feels like reading text riddled with typos: you understand it, but it diverts mental capacity from the content to the representation layer.
