Datafilos: May 2015

The problem with percentage errors that is often overlooked is that they assume a meaningful zero. For example, a percentage error makes no sense when measuring the accuracy of temperature forecasts on the Fahrenheit or Celsius scales.

In other words, percentage errors are only meaningful when changing the scale leaves the percentage error unchanged. That is why the percentage error is not useful for temperatures in Celsius or Fahrenheit. Suppose the forecast is 10 °C and the actual value is 11 °C: the percentage error is |11 - 10| / 11 = 9.09%. Converted to Fahrenheit, 10 °C is 50 °F and 11 °C is 51.8 °F, and the same forecast now has a percentage error of 1.8 / 51.8 = 3.47%.

Compare this with a length measured in metres or inches, where zero is meaningful. If the forecast is 10 m and the actual value is 11 m, the percentage error is 9.09%. Converted, 10 m is 393.70 inches and 11 m is 433.07 inches, and the percentage error is still 9.09%.
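A minimal sketch of both examples, assuming the percentage error is defined as |actual - forecast| / |actual| × 100; the helper names are illustrative, not from any library. A pure change of scale (metres to inches) leaves the percentage error untouched, while the Celsius-to-Fahrenheit conversion, which also moves the zero point, changes it.

```python
def percentage_error(forecast: float, actual: float) -> float:
    """Absolute percentage error relative to the actual value, in percent."""
    return abs(actual - forecast) / abs(actual) * 100


def celsius_to_fahrenheit(c: float) -> float:
    # Affine conversion: rescales AND shifts the zero point.
    return c * 9 / 5 + 32


def metres_to_inches(m: float) -> float:
    # Pure rescaling: 0 m is 0 inches.
    return m / 0.0254


temp_forecast, temp_actual = 10.0, 11.0      # degrees Celsius
length_forecast, length_actual = 10.0, 11.0  # metres

pe_celsius = percentage_error(temp_forecast, temp_actual)
pe_fahrenheit = percentage_error(celsius_to_fahrenheit(temp_forecast),
                                 celsius_to_fahrenheit(temp_actual))
pe_metres = percentage_error(length_forecast, length_actual)
pe_inches = percentage_error(metres_to_inches(length_forecast),
                             metres_to_inches(length_actual))

print(f"Celsius:    {pe_celsius:.2f}%")     # 9.09%
print(f"Fahrenheit: {pe_fahrenheit:.2f}%")  # 3.47%
print(f"Metres:     {pe_metres:.2f}%")      # 9.09%
print(f"Inches:     {pe_inches:.2f}%")      # 9.09%
```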

Percentage errors (and hence MAPE) also have the disadvantage that they put a heavier penalty on negative errors than on positive errors.

This means that, for the same absolute error, a negative error (taking the error as actual minus forecast, i.e. the forecast is too high) produces a higher MAPE than a positive one. For example, if the forecast is 27 and the actual value is 23, the MAPE is 4/23 = 17.4%; if the forecast is 23 and the actual value is 27, the MAPE is 4/27 = 14.8%. The absolute difference is 4 in both cases, yet the negative error yields the higher percentage error.
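A small sketch of MAPE under the same definition (absolute error divided by the actual value, averaged, in percent); `mape` is a hypothetical helper, not a library function. The two calls reproduce the 27/23 example.

```python
def mape(actuals: list[float], forecasts: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    errors = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100 * sum(errors) / len(errors)


# error = actual - forecast = 23 - 27 = -4 (negative error)
over = mape(actuals=[23], forecasts=[27])
# error = actual - forecast = 27 - 23 = +4 (positive error)
under = mape(actuals=[27], forecasts=[23])

print(f"forecast 27, actual 23: {over:.1f}%")   # 17.4%
print(f"forecast 23, actual 27: {under:.1f}%")  # 14.8%
```

The asymmetry comes entirely from dividing by the actual value: the same error of 4 is divided by the smaller number when the forecast overshoots.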

I will start with a story and then we will transfer the knowledge from the story to the F-measure.

Once upon a time, a friend of mine shared a problem with me. He ran a website about cars, used equally by Americans and Canadians, and he wanted to calculate the average fuel consumption for each type of car. But Americans use miles per gallon while Canadians use litres per 100 kilometres. My friend knew enough to convert all the measurements to the same units before averaging them. To his surprise, however, when he calculated the arithmetic mean of the measurements in the American system and converted the result to the Canadian system, he got a different result from the arithmetic mean of the measurements in the Canadian system. So which unit system is the right one to use?

If he had chosen the American system without any justification, he would have upset the Canadians. If he had chosen the Canadian system without any justification, he would have upset the Americans. And because Americans and Canadians visited his web page equally, he didn't feel comfortable picking either system without a good reason.

I suggested a diplomatic solution: the geometric mean, because the geometric mean computed in one system converts directly into the geometric mean computed in the other. Hence the documentation doesn't have to specify the units; all it has to say is that the geometric mean is used. An example follows:

Let's imagine that my friend wanted to average two cars, one consuming 5 l/100 km and the other 10 l/100 km. The arithmetic mean of these two values is 7.5 l/100 km. But if he first converted the values into (US) miles per gallon, 47.04 and 23.52 respectively (for example with Google, using the query: 5 l/100 km in miles/gallon), and then calculated the arithmetic mean, he would get 35.28 miles/gallon. And 35.28 miles/gallon is approximately 6.67 l/100 km, which is quite different from 7.5!

If he used the geometric mean instead, he would get 7.07 l/100 km, which is 33.26 miles/gallon: exactly the geometric mean of the two values in miles/gallon. Everything works nicely.
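A small sketch that reproduces the comparison, assuming US gallons, so that mpg = K / (l/100 km) with K ≈ 235.215 and the same formula converts back. `to_other_system` and `K` are illustrative names; the means come from Python's standard `statistics` module.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

# Assumption: US gallons and statute miles. Then mpg = K / (l/100 km) and
# l/100 km = K / mpg, so one function converts in both directions.
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344
K = 100 * LITRES_PER_US_GALLON / KM_PER_MILE  # about 235.215


def to_other_system(value: float) -> float:
    """Convert l/100 km to mpg, or mpg to l/100 km."""
    return K / value


cars_l100km = [5.0, 10.0]
cars_mpg = [to_other_system(x) for x in cars_l100km]  # about [47.04, 23.52]

means = [("arithmetic", fmean), ("geometric", geometric_mean), ("harmonic", harmonic_mean)]
for name, mean in means:
    canadian = mean(cars_l100km)                    # averaged in l/100 km
    via_american = to_other_system(mean(cars_mpg))  # averaged in mpg, converted back
    agree = math.isclose(canadian, via_american)
    print(f"{name:>10}: {canadian:.2f} vs {via_american:.2f} l/100 km, agree={agree}")
```

Only the geometric mean gives the same answer (7.07) in both systems. The arithmetic and harmonic rows swap values (7.50 vs 6.67 and 6.67 vs 7.50), which is the complementarity between the two means.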

But was it correct to use the geometric mean? Not quite. If we want to calculate the average consumption for a single user, it is appropriate to use the arithmetic mean in the Canadian system. But if we want to calculate the average consumption for a state, it is appropriate to use the arithmetic mean in the American system. Why?

Let's imagine that our hypothetical user has to drive exactly 100 km every day to his job and back, and that he alternates daily between his two cars (5 and 10 l/100 km, as above). What is his average consumption? It is 7.5 l/100 km, because the distance is fixed: over two days he burns 5 + 10 = 15 litres on 200 km.

Now let's consider a state that can afford to provide exactly 1 gallon of gasoline per day, and that the two cars alternate in consuming it: on the first day the first car consumes the gallon, on the second day the second car does. What is the average consumption? In this scenario the correct answer is 35.28 miles/gallon (~6.67 l/100 km), because the amount of fuel (1 gallon) is fixed: the two cars cover 47.04 + 23.52 = 70.56 miles on 2 gallons (reference).
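A sketch that simulates both scenarios directly as total fuel over total distance, so no choice of mean is made up front. It assumes US gallons, as in the fuel example; `K` and the per-day quantities are illustrative names.

```python
# Assumption: US gallons, so mpg = K / (l/100 km).
K = 100 * 3.785411784 / 1.609344  # about 235.215

cars_l100km = [5.0, 10.0]  # the two cars, used on alternating days

# Commuter: each day covers the same 100 km, so distance is the fixed quantity.
km_per_day = 100.0
litres = [c * km_per_day / 100 for c in cars_l100km]       # 5 and 10 litres
commuter = 100 * sum(litres) / (km_per_day * len(litres))  # l/100 km

# State: each day burns the same 1 gallon, so fuel is the fixed quantity.
gallons_per_day = 1.0
miles = [K / c * gallons_per_day for c in cars_l100km]     # about 47.04 and 23.52
state = sum(miles) / (gallons_per_day * len(miles))        # mpg

print(f"commuter: {commuter:.2f} l/100 km")                 # 7.50 l/100 km
print(f"state: {state:.2f} mpg = {K / state:.2f} l/100 km") # 35.28 mpg = 6.67 l/100 km
```

Whichever quantity is held fixed ends up in the denominator, and the arithmetic mean is the right one in the unit system that has that quantity in its denominator.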

The geometric mean always returns the same consumption (~7.07 l/100 km), regardless of the units in which the values are averaged. But it doesn't answer either of the hypothetical scenarios above.

Now, how do we transfer this knowledge about fuel consumption to the F-measure? Let's first define the elements that are averaged in the F-measure:
    Precision = TP/(TP+FP)
    Recall = TP/(TP+FN)
If we took the arithmetic mean of precision and recall, we would implicitly treat the denominator as constant (recall the fuel consumption examples). Unfortunately, precision and recall have different denominators. However, they share the same numerator, TP. Couldn't we just take the arithmetic mean of the inverted precision and recall, (TP+FP)/TP and (TP+FN)/TP, which do share a denominator, and then invert the result to get an average that is formally acceptable? The answer is yes, we can! And this process has a special name: the harmonic mean.
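A minimal sketch of exactly that procedure, using exact fractions from Python's standard `fractions` module; `f_measure` is an illustrative name and the counts (TP=8, FP=2, FN=4) are made up for the example.

```python
from fractions import Fraction


def f_measure(tp: int, fp: int, fn: int) -> Fraction:
    """Harmonic mean of precision and recall, computed as the inverted
    arithmetic mean of the inverted precision and recall."""
    if tp <= 0:
        raise ValueError("tp must be positive for precision and recall to be inverted")
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)
    # 1/precision = (TP+FP)/TP and 1/recall = (TP+FN)/TP share the
    # denominator TP, so their arithmetic mean is well defined.
    mean_of_inverses = (1 / precision + 1 / recall) / 2
    # Inverting that mean gives the harmonic mean.
    return 1 / mean_of_inverses


print(f_measure(tp=8, fp=2, fn=4))  # 8/11
```

Here precision is 4/5 and recall is 2/3; the result agrees with the usual form 2·P·R / (P + R).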

You may have noticed in the fuel example that the arithmetic mean in the American system corresponds to the harmonic mean in the Canadian system, and vice versa. This is a general property of the arithmetic and harmonic means: they are complementary whenever one unit is the reciprocal of the other, as miles per gallon and litres per 100 km are (up to a constant factor). The same duality shows up elsewhere. For example, if you have two different resistors in series and want to replace them with two resistors of equal resistance without changing the total resistance, you use the arithmetic mean. If the resistors were in parallel, you would use the harmonic mean (reference).
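A short sketch of the resistor example; the resistances (100 Ω and 400 Ω) are arbitrary illustrative values, and `series`/`parallel` are hypothetical helpers implementing the standard rules.

```python
import math
from statistics import fmean, harmonic_mean


def series(*resistances: float) -> float:
    # Resistances in series add up.
    return sum(resistances)


def parallel(*resistances: float) -> float:
    # Conductances (1/R) in parallel add up.
    return 1 / sum(1 / r for r in resistances)


r1, r2 = 100.0, 400.0  # illustrative values, in ohms

r_eq_series = fmean([r1, r2])            # 250 ohms
r_eq_parallel = harmonic_mean([r1, r2])  # 160 ohms

print(math.isclose(series(r1, r2), series(r_eq_series, r_eq_series)))          # True
print(math.isclose(parallel(r1, r2), parallel(r_eq_parallel, r_eq_parallel)))  # True
```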

TL;DR:
Arithmetic mean is applicable if the averaged values have the same denominator.
Harmonic mean is applicable if the averaged values have the same numerator.
Geometric mean is the last resort if neither the numerators nor the denominators are the same (reference).

In the F-measure, the denominators differ but the numerators are the same. Hence the harmonic mean was the best choice for the average.

PS:
Harmonic mean of precision and recall can be written as:
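$$
\begin{aligned}
\text{Harmonic mean} &= 2 \cdot \frac{1}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}} \\
&= 2 \cdot \frac{1}{\frac{1}{TP/(TP+FP)} + \frac{1}{TP/(TP+FN)}} \\
&= 2 \cdot \frac{1}{\frac{TP+FP}{TP} + \frac{TP+FN}{TP}} \\
&= 2 \cdot \frac{1}{\frac{TP+FP+TP+FN}{TP}} \\
&= \frac{2\,TP}{2\,TP+FP+FN}
\end{aligned}
$$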

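While the arithmetic mean of precision and recall can hardly be simplified further than:

$$
\begin{aligned}
\text{Arithmetic mean} &= \frac{\text{Precision} + \text{Recall}}{2} \\
&= \frac{1}{2}\left(\frac{TP}{TP+FP} + \frac{TP}{TP+FN}\right) \\
&= \frac{1}{2} \cdot \frac{TP\,(TP+FN) + TP\,(TP+FP)}{(TP+FP)(TP+FN)} \\
&= \frac{1}{2} \cdot \frac{TP\,(TP+FN+TP+FP)}{(TP+FP)(TP+FN)}
\end{aligned}
$$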
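A quick exact check of the two closed forms, harmonic mean = 2·TP / (2·TP + FP + FN) and arithmetic mean = TP·(2·TP + FP + FN) / (2·(TP+FP)·(TP+FN)), assuming TP ≥ 1 so both ratios are defined. It uses `fractions.Fraction` to avoid floating-point noise; the ranges of counts tested are arbitrary.

```python
from fractions import Fraction
from itertools import product


def harmonic(p: Fraction, r: Fraction) -> Fraction:
    return 2 / (1 / p + 1 / r)


def arithmetic(p: Fraction, r: Fraction) -> Fraction:
    return (p + r) / 2


# Exhaustively check small confusion-matrix counts (TP >= 1 keeps p and r defined).
for tp, fp, fn in product(range(1, 8), range(8), range(8)):
    p = Fraction(tp, tp + fp)  # precision
    r = Fraction(tp, tp + fn)  # recall
    assert harmonic(p, r) == Fraction(2 * tp, 2 * tp + fp + fn)
    assert arithmetic(p, r) == Fraction(tp * (2 * tp + fp + fn),
                                        2 * (tp + fp) * (tp + fn))

print("both closed forms hold")
```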

When the numerators are shared but the denominators differ, the harmonic mean truly is nicer than the arithmetic mean.

Edit:
Another look at the problem can be found in "Introduction to Information Retrieval" by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze:

Why do we use a harmonic mean rather than the simpler average (arithmetic mean)? Recall that we can always get 100% recall by just returning all documents, and therefore we can always get a 50% arithmetic mean by the same process. This strongly suggests that the arithmetic mean is an unsuitable measure to use. In contrast, if we assume that 1 document in 10,000 is relevant to the query, the harmonic mean score of this strategy is 0.02%. The harmonic mean is always less than or equal to the arithmetic mean and the geometric mean. When the values of two numbers differ greatly, the harmonic mean is closer to their minimum than to their arithmetic mean.
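A small sketch of the book's example: a system that returns every document of a collection in which 1 document in 10,000 is relevant. The helper names are illustrative; `harmonic_mean` here is the two-value F1 formula, not the `statistics` function.

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    # F1; taken as 0 when precision and recall are both 0.
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


# Return every document when 1 in 10,000 is relevant.
precision = 1 / 10_000  # only 1 in 10,000 returned documents is relevant
recall = 1.0            # but every relevant document is returned

print(f"arithmetic mean: {arithmetic_mean(precision, recall):.3%}")  # 50.005%
print(f"harmonic mean:   {harmonic_mean(precision, recall):.3%}")    # 0.020%
```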

Also, a quote from "A survey of named entity recognition and classification" by D. Nadeau:

The harmonic mean of two numbers is never higher than the geometrical mean. It also tends towards the least number, minimizing the impact of large outliers and maximizing the impact of small ones. The F-measure therefore tends to privilege balanced systems.


Monday, 25 May 2015

MAPE - Mean Absolute Percent Error

Thursday, 21 May 2015

Why to use cosine distance instead of Euclidean distance in text mining?

Wednesday, 6 May 2015

Why F-measure uses harmonic mean?