To vent my anger, I am publishing a shame list of bugs in RapidMiner.
The list of bugs in RapidMiner 7.1 that I am aware of:
- The first step of the ROC (Receiver Operating Characteristic) curve is not drawn correctly: it is always horizontal. Sometimes the error is negligible. Sometimes it is the whole difference between a perfect model with AUC=1 and a random model with AUC=0.5. The bug is most visible on binary estimates of the label (i.e. all estimates are either p=0 or p=1).
- AUC (Area Under Curve) is similarly way off (like 0.5 instead of 1.0). The reported value differs from both the expected AUC and the area under the returned (and flawed) ROC curve.
- DBI (Davies-Bouldin index) reported by the performance operator is negative. By definition, it cannot be negative.
- The returned correlation matrix sometimes contains values outside the valid range of -1 to 1 (like -67). The error is caused by a numerically unstable calculation of variance. Since the correlation matrix is at the heart of many algorithms, this is worrisome.
- The operator for declaring missing values causes trouble because the missing values often propagate back into other branches (this is because the operator uses on-the-fly processing, which is buggy in RapidMiner). EDIT: fixed in version 7.4.
- Evolutionary optimization often crashes, even on toy datasets like Iris.
- Weight by Chi Squared Statistic does not work with date attributes, while other weighting operators (like Gini or Information Gain Ratio) do.
- Whenever I make a plot and re-run the schema, the plot settings are reset.
- They removed the "invert" checkbox from the dictionary filter in the text mining extension.
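
To make the ROC and AUC items above concrete, here is a minimal Python sketch of the behavior one would expect on binary estimates. It is not RapidMiner's code: the function names `roc_points` and `auc` are made up for illustration, and the sketch assumes labels are encoded as 1 (positive) and 0 (negative). The key point is that all examples sharing the same score are consumed as one step, so a perfect binary predictor starts with a vertical step, not a horizontal one.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate).

    Examples with tied scores are consumed together, so they form a
    single (possibly diagonal) segment instead of an arbitrary staircase.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example that shares this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


labels = [1, 1, 0, 0]
print(auc(roc_points(labels, [1.0, 1.0, 0.0, 0.0])))  # perfect binary estimates
print(auc(roc_points(labels, [0.5, 0.5, 0.5, 0.5])))  # constant, uninformative estimates
```

With this tie handling, the perfect binary predictor gets the points (0, 0), (0, 1), (1, 1) and an area of 1.0, while the constant predictor gets the diagonal and an area of 0.5.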
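
The correlation item above blames an unstable variance calculation. The sketch below shows one common way such instability arises, the one-pass "sum of squares minus squared sum" formula, next to a two-pass alternative. Whether RapidMiner uses exactly this one-pass formula is an assumption on my part; the function names are hypothetical and only illustrate the numerical issue.

```python
import math


def naive_variance(xs):
    """One-pass sample variance: prone to catastrophic cancellation
    when the mean is large compared to the spread of the data."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(xs):
    """Two-pass sample variance: subtract the mean first, then square."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation computed from centered data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    r = sxy / math.sqrt(sxx * syy)
    # Guard against rounding pushing the result slightly outside [-1, 1].
    return max(-1.0, min(1.0, r))


# Data with a large offset and a small spread stresses the naive formula.
xs = [1e9 + i for i in range(10)]
ys = [2 * x + 1 for x in xs]
print(naive_variance(xs), two_pass_variance(xs))
print(pearson(xs, ys))
```

The two-pass version costs a second scan of the data, but the squared terms stay small, so the result cannot drift far from the true value the way the one-pass difference of two huge numbers can.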