Monday, January 20, 2014

Comparison of variable selection methods

Since I didn't know which variable selection method to use, I ran a simple test on the Sonar dataset. Sonar has 60 attributes, and I arbitrarily decided to reduce that number to 10. I then measured classification accuracy with ten-fold cross-validation. To get an idea of how dependent the feature selection methods are on the classifier, I tried three different models: naive Bayes, k-NN, and a classification tree:
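The protocol above can be sketched with scikit-learn, here using Chi2 as the selection method. This is a minimal sketch, not the original experiment: the data is a synthetic stand-in for Sonar (`make_classification`), and selection runs inside the cross-validation pipeline so each fold picks its own 10 attributes.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with Sonar-like dimensions (208 samples, 60 attributes).
X, y = make_classification(n_samples=208, n_features=60,
                           n_informative=10, random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "k-NN": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    # Scale to [0, 1] because chi2 requires non-negative features.
    pipe = make_pipeline(MinMaxScaler(),
                         SelectKBest(chi2, k=10),
                         model)
    scores = cross_val_score(pipe, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f}")
```

Swapping in another scorer (e.g. `mutual_info_classif` for information gain) only changes the `SelectKBest` argument, which makes the per-classifier comparison easy to repeat.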


Based on this comparison, the best method is SVM attribute selection. However, this method requires parameter tuning. The next best variable selection method is Chi2. Its disadvantage is that it favors attributes with many levels, so its performance could be severely hindered on a dataset with attributes of varying cardinality, unlike Sonar. The last method in the top three is information gain ratio, whose advantage is that, unlike Chi2, it can handle attributes with differing numbers of levels.
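SVM attribute selection is commonly realized as recursive feature elimination driven by linear-SVM weights; the sketch below shows the parameter tuning it requires, with the SVM's regularization strength C searched over a small grid. This assumes scikit-learn's `RFE`/`GridSearchCV` rather than whatever tool produced the original numbers, and again uses a synthetic stand-in for Sonar.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in with Sonar-like dimensions.
X, y = make_classification(n_samples=208, n_features=60,
                           n_informative=10, random_state=0)

# Eliminate attributes recursively, ranked by linear-SVM weights,
# until 10 remain.
rfe = RFE(LinearSVC(dual=False, max_iter=10000),
          n_features_to_select=10)

# The tuning step: search over the SVM's C inside the selector.
grid = GridSearchCV(rfe, {"estimator__C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X, y)

print("best C:", grid.best_params_["estimator__C"])
print("kept attributes:",
      [i for i, keep in enumerate(grid.best_estimator_.support_) if keep])
```

The grid search is what makes this method heavier than Chi2 or gain ratio: every candidate C re-runs the whole elimination loop.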
