Friday, October 21, 2011

Datamining

Statistics is good for describing a population: with just two parameters, the mean and the standard deviation, you can describe an amazing number of things in the world. More surprisingly, the description is often accurate, because by the central limit theorem the sum or average of many independent effects is approximately normally distributed. Classification, on the other hand, is good for describing individuals. Statistics tells you: you will die with a 60% probability, because this illness is lethal in 60% of cases on average. Classification can give you a more accurate prediction: you will die with a 58% probability, because people with parameters similar to yours die with a 58% probability. The disadvantage of this approach is that classification needs much more data than statistics. And a large amount of data requires datamining.
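To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the patient records are invented, the names `population_rate` and `neighbour_rate` are hypothetical, and the "similar people" rule is a plain k-nearest-neighbours vote, which is only one of many possible classifiers.

```python
from statistics import mean

# Hypothetical records: (age, severity score, died). Invented for illustration only.
patients = [
    (30, 2, 0), (35, 3, 0), (40, 4, 1), (45, 5, 0), (50, 5, 1),
    (55, 6, 1), (60, 7, 1), (65, 8, 1), (70, 8, 0), (75, 9, 1),
]

def population_rate(records):
    """Statistics view: one mortality rate for the whole population."""
    return mean(died for _, _, died in records)

def neighbour_rate(records, age, severity, k=3):
    """Classification view: mortality rate among the k most similar patients."""
    def distance(rec):
        # Age is divided by 10 so that both parameters have a comparable scale
        # (an arbitrary choice made for this sketch).
        return ((rec[0] - age) / 10) ** 2 + (rec[1] - severity) ** 2
    nearest = sorted(records, key=distance)[:k]
    return mean(died for _, _, died in nearest)

print(population_rate(patients))        # the same answer for everybody
print(neighbour_rate(patients, 32, 2))  # answer for a young patient with a mild case
print(neighbour_rate(patients, 68, 8))  # answer for an older patient with a severe case
```

The population rate is the same for every patient, while the neighbour-based estimate changes with the patient's own parameters. With only ten records, though, each neighbourhood holds three people, so the individual estimates are very rough. That is exactly why this approach needs much more data.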


Thus, if you need the highest possible accuracy, you need datamining. If you are not an expert in datamining, then you need a datamining expert. You need me. Leave a comment and I will contact you.
