Friday, August 16, 2013

Accuracy versus F score: Machine Learning for the RNA Polymerases

Hello! Today I'm going to show you the difference between two common performance measures, accuracy and the F score (useful not only in Machine Learning, but in every scientific field). So far, I have seen accuracy reported more often than the F score when measuring the performance of methods ranging from metaheuristics (Genetic Algorithm fitness functions) to promoter recognition programs, diagnostic methods and so on.

However, I would really recommend avoiding the accuracy measure. The reason is shown below with a nice example in the R programming language (all the functions used in the simulation are included; you can download them by clicking here).

Case study 1:

Imagine that you are working on a Computer Vision project and your task is to "teach" a program to distinguish between electric guitars and acoustic guitars by showing it pictures of different guitars.

Suppose that you've already developed that program and now you want to measure the performance of this Boolean (binary) classifier. That is, you show the program a picture of, for example, an electric guitar, and the program has to decide whether to "classify" it as an electric or as an acoustic guitar.

For the purposes of this post, let's write down some useful concepts.

Consider the following:

TP: a true positive is when the program classifies an electric guitar as an electric guitar. We will use the letter "E" to denote the electric guitar "class".

FP: a false positive is when the program classifies an acoustic guitar as an electric guitar. We will use the letter "A" to denote the acoustic guitar "class".

FN: a false negative is when the program classifies an electric guitar as an acoustic guitar.

TN: a true negative is when the program classifies an acoustic guitar as an acoustic guitar.
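
With these four counts, the measures used in the rest of this post can be written down explicitly. These are the standard textbook definitions (the F score here is the balanced F1 score, the harmonic mean of precision and recall), and I am assuming they match what the simulation functions compute:

```latex
\[
\text{accuracy} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
\]
\[
F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
  = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

Note that TN appears in the accuracy but not in the precision, the recall or the F score.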

Now that we are ready, let's begin with the calculations.

In R, I have simulated the results of the program for, say, 1,000 electric guitar pictures and 1,000 acoustic guitar pictures.
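
To give an idea of what such a simulation might look like, here is a minimal sketch. It is written in Python, not R, and it is not the code behind the numbers in this post. The function name `simulate_confusion` and the two probabilities (0.5 and 0.01) are assumptions chosen only for illustration, so the counts it prints will differ from the ones reported here.

```python
import random


def simulate_confusion(n_electric, n_acoustic, p_hit, p_false_alarm, seed=0):
    """Simulate a binary classifier and return the counts (TP, FN, FP, TN).

    p_hit         : assumed chance that an electric guitar is labeled electric
    p_false_alarm : assumed chance that an acoustic guitar is labeled electric
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each picture is an independent draw against its class probability.
    tp = sum(rng.random() < p_hit for _ in range(n_electric))
    fp = sum(rng.random() < p_false_alarm for _ in range(n_acoustic))
    return tp, n_electric - tp, fp, n_acoustic - fp


# Hypothetical settings: 1,000 pictures per class, assumed probabilities.
tp, fn, fp, tn = simulate_confusion(1000, 1000, p_hit=0.5, p_false_alarm=0.01)

# Print the counts laid out as a confusion matrix.
print(f"{'':8}{'PREDICTED.E':>12}{'PREDICTED.A':>12}")
print(f"{'TRUE.E':8}{tp:>12}{fn:>12}")
print(f"{'TRUE.A':8}{fp:>12}{tn:>12}")
```

Each row adds up to the number of pictures of that class, which is a quick sanity check on any confusion matrix.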

The program produced the following results:

        PREDICTED.E PREDICTED.A
TRUE.E         485         515
TRUE.A           9         991

Notice that, of the 1,000 electric guitar pictures, only 485 were labeled as electric (TP=485); the rest were labeled as acoustic (FN=515). I feel bad for the hypothetical programmer of this hypothetical example.

On the other hand, of the 1,000 acoustic guitars, 991 were labeled as acoustic (TN=991) and only 9 were labeled as electric (FP=9). Well, not bad! ... Or is it?

The accuracy of this program is 0.738.

To compute the F score, it is necessary to compute the precision and the recall first:

precision = 0.9817814 and recall = 0.485

Then, the F score is equal to 0.6492637
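
These values can be reproduced directly from the four counts of the confusion matrix. The sketch below is in Python rather than R, and the function name `scores` is my own choice for illustration. It assumes the F score is the balanced F1 score and does not guard against division by zero (for example, when nothing is labeled electric).

```python
def scores(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F score from the four counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)  # of everything labeled electric, the share that was right
    recall = tp / (tp + fn)     # of all electric guitars, the share that was found
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f_score


# Counts from the confusion matrix of case study 1.
accuracy, precision, recall, f_score = scores(tp=485, fn=515, fp=9, tn=991)

print(f"accuracy  = {accuracy:.7f}")   # 0.7380000
print(f"precision = {precision:.7f}")  # 0.9817814
print(f"recall    = {recall:.7f}")     # 0.4850000
print(f"F score   = {f_score:.7f}")    # 0.6492637
```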


Well, the F score (0.65) seems to be more "strict" than the accuracy (0.74), and in fact it is. But this example is not very "cool". Let's move on to case study 2.

Case study 2:

Now we have 1,000 electric guitar pictures and 100,000 acoustic guitar pictures. The confusion matrix of the results is:

        PREDICTED.E PREDICTED.A
TRUE.E         493         507
TRUE.A        1017       98983

Notice that, of the 1,000 electric guitar pictures, only 493 were labeled as electric (TP=493); the rest were labeled as acoustic (FN=507).

On the other hand, of the 100,000 acoustic guitars, 98,983 were labeled as acoustic (TN=98983) and only 1,017 were labeled as electric (FP=1017).

Now (cha cha chan!), the performance values are:

Accuracy: 0.9849109
Precision: 0.3264901
Recall: 0.493
F score: 0.3928287
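
For anyone who wants to check these by hand, here is the arithmetic worked out from the confusion matrix, assuming the standard definitions of the four measures (with the F score taken as the balanced F1 score):

```latex
\begin{align*}
\text{accuracy}  &= \frac{TP + TN}{TP + FN + FP + TN}
                  = \frac{493 + 98983}{101000} = \frac{99476}{101000} \approx 0.9849109 \\
\text{precision} &= \frac{TP}{TP + FP} = \frac{493}{493 + 1017} = \frac{493}{1510} \approx 0.3264901 \\
\text{recall}    &= \frac{TP}{TP + FN} = \frac{493}{493 + 507} = \frac{493}{1000} = 0.493 \\
F                &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{986}{986 + 1017 + 507}
                  = \frac{986}{2510} \approx 0.3928287
\end{align*}
```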

Do you see it now? How is it possible that, while mislabeling about half of the electric guitars, the program reaches an accuracy of almost 0.99, despite having a precision and a recall no greater than 0.50? The 98,983 true negatives from the much larger acoustic class dominate the accuracy, while the F score does not count true negatives at all. So we have a winner, and it is the F score.
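
To see how the number of acoustic pictures alone drives this effect, here is a small Python sketch (not part of the R simulation). The function name `accuracy_and_f` is hypothetical. It keeps the per-class rates of case study 2 fixed (493/1,000 electric guitars found, 1,017/100,000 acoustic guitars mislabeled) and uses expected counts rather than simulated ones, which is a simplifying assumption.

```python
def accuracy_and_f(n_electric, n_acoustic, recall, false_alarm_rate):
    """Expected accuracy and F score when the per-class rates stay fixed."""
    tp = recall * n_electric
    fn = n_electric - tp
    fp = false_alarm_rate * n_acoustic
    tn = n_acoustic - fp
    accuracy = (tp + tn) / (n_electric + n_acoustic)
    f_score = 2 * tp / (2 * tp + fp + fn)  # TN does not appear in this formula
    return accuracy, f_score


# Rates taken from case study 2; only the size of the acoustic class changes.
results = []
for n_acoustic in (1_000, 10_000, 100_000):
    acc, f = accuracy_and_f(1_000, n_acoustic, recall=0.493, false_alarm_rate=0.01017)
    results.append((n_acoustic, acc, f))
    print(f"{n_acoustic:>7} acoustic pictures: accuracy = {acc:.3f}, F score = {f:.3f}")
```

Under these assumptions the classifier itself never changes, yet the accuracy climbs from about 0.74 to about 0.98 as the acoustic class grows, while the F score falls from about 0.66 to about 0.39.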

For references, visit the following pages:

http://en.wikipedia.org/wiki/Accuracy
http://en.wikipedia.org/wiki/F1_score
http://en.wikipedia.org/wiki/Precision_and_recall