Evaluation Metrics for Machine Learning or Data Models

Table of contents

  • Confusion Matrix
  • Classification Accuracy
  • Precision/Specificity
  • Recall
  • F-1 Score
  • AUC-ROC
  • Root Mean Square Error(RMSE)
  • Cross-entropy Loss
  • Gini Coefficient
  • Jaccard Score

Confusion Matrix

It is a square matrix of size (a x a), where ‘a’ is the number of classes in the classification data. The x-axis of this matrix can hold the actual values and the y-axis the predicted values, or vice versa. If the dataset has only two classes, i.e. a binary classification problem, the matrix is 2 x 2.

  • True Positive(TP): Correct Positive Predictions
  • True Negative(TN): Correct Negative Predictions
  • False Positive(FP): Incorrect Positive Predictions
  • False Negative(FN): Incorrect Negative Predictions

Using a worked example with TP = 45, FN = 25, TN = 30 and FP = 5:

  • False Negative Rate(FNR) = FN/Actual Positive = FN/(TP + FN) = 25/(45 + 25) ≈ 0.36
  • True Negative Rate(TNR) = TN/Actual Negative = TN/(TN + FP) = 30/(30 + 5) ≈ 0.86
  • False Positive Rate(FPR) = FP/Actual Negative = FP/(TN + FP) = 5/(30 + 5) ≈ 0.14
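As a minimal sketch, the counts and rates above can be tallied by hand from paired label lists (the 45/25/30/5 counts below are assumed to match the worked example):

```python
# Build a 2x2 confusion matrix by counting paired (actual, predicted) labels.
# Labels: 1 = positive, 0 = negative. Counts chosen to match the worked example.
y_true = [1] * 70 + [0] * 35                        # 70 actual positives, 35 actual negatives
y_pred = [1] * 45 + [0] * 25 + [0] * 30 + [1] * 5   # 45 TP, 25 FN, 30 TN, 5 FP

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))

fnr = fn / (tp + fn)   # 25/70 ≈ 0.36
tnr = tn / (tn + fp)   # 30/35 ≈ 0.86
fpr = fp / (tn + fp)   # 5/35  ≈ 0.14
```

In practice, scikit-learn's `confusion_matrix` returns the same counts as a 2 x 2 array.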

Classification Accuracy

Using the above interpretation, classification accuracy is the fraction of all predictions that are correct:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
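A one-line sketch, assuming the counts from the confusion-matrix example above:

```python
tp, tn, fp, fn = 45, 30, 5, 25               # counts from the worked example (assumed)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 75/105 ≈ 0.71
```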

Precision/Specificity

With imbalanced data, classification accuracy is a poor indicator of model performance, because a model that always predicts the majority class can still score highly. In such conditions we need a class-specific metric. Precision measures how many of the predicted positives are actually positive: it is the true positives divided by the sum of true positives and false positives, i.e. Precision = TP/(TP + FP). Note that precision is distinct from specificity, which is the true negative rate TN/(TN + FP) computed above.
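Both quantities can be sketched from the assumed worked-example counts:

```python
tp, fp, tn = 45, 5, 30           # assumed worked-example counts
precision = tp / (tp + fp)       # 45/50 = 0.9
specificity = tn / (tn + fp)     # 30/35 ≈ 0.86 (same as the true negative rate)
```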

Recall

Recall quantifies how many of the actual positive cases the model correctly identified, so it indicates how many positives were missed. Unlike precision, which is computed over the predicted positives, recall is computed over the actual positives: Recall = TP/(TP + FN).
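A minimal sketch with the same assumed counts:

```python
tp, fn = 45, 25           # assumed worked-example counts
recall = tp / (tp + fn)   # 45/70 ≈ 0.64
```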

F-1 Score

The F1 score is the harmonic mean of precision and recall, F1 = 2 x (Precision x Recall)/(Precision + Recall). It is an excellent metric to use when the data is imbalanced, because it only rewards models that keep both precision and recall high.
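Combining the two values computed in the sections above:

```python
precision = 45 / 50   # from the precision example (assumed counts)
recall = 45 / 70      # from the recall example (assumed counts)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, ≈ 0.75
```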

AUC-ROC

The ROC (Receiver Operating Characteristic) curve plots the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds, showing how well the model separates the signal from the noise. The AUC (Area Under the Curve) summarises this curve as a single number representing the model's ability to distinguish between classes: the higher the AUC, the better the separation.
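AUC also has a useful probabilistic reading: it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal sketch using that definition (the toy labels and scores are made up for illustration):

```python
def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

scikit-learn's `roc_auc_score` computes the same quantity from labels and scores.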

Root Mean Square Error(RMSE)

This metric is used to measure the performance of a regression model, and it assumes the errors are normally distributed and unbiased. It is the standard deviation of the prediction errors, which measure how far the data points fall from the regression line. It is calculated as:

RMSE = √( Σ(yᵢ − ŷᵢ)² / n )
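A minimal sketch of this formula (the target and prediction values are illustrative only):

```python
import math

# Toy regression targets and predictions (illustrative values only)
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

# Mean of the squared prediction errors, then square root
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)   # sqrt(0.875) ≈ 0.94
```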

Cross-entropy Loss

It is also known as Log Loss and is widely used for evaluating neural networks because, combined with sigmoid or softmax outputs, it helps mitigate the vanishing gradient problem that squared-error loss suffers from. It is calculated as the negative average of the log-probability the model assigns to the true class, so confident incorrect predictions are penalised heavily:

Log Loss = −(1/n) Σ [ yᵢ log(pᵢ) + (1 − yᵢ) log(1 − pᵢ) ]
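A minimal sketch of the binary case (the labels and probabilities are illustrative only):

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Binary cross-entropy: negative mean log-probability of the true class."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

print(log_loss([1, 0], [0.9, 0.1]))   # -log(0.9) ≈ 0.105
```

scikit-learn's `log_loss` implements the same computation, including the clipping.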

Gini Coefficient

This can be calculated from the AUC-ROC number: it measures the area between the ROC curve and the diagonal (random-guess) line, rescaled so that Gini = 2 x AUC − 1. If the value of this coefficient is more than 60%, the model performance is considered good. One important thing to note is that it is used only with classification models.
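The rescaling is a one-liner (the AUC value below is a hypothetical example, not from the source):

```python
auc = 0.85           # hypothetical AUC from some classifier (assumed)
gini = 2 * auc - 1   # 0.70, i.e. above the 60% threshold mentioned above
```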

Jaccard Score

This score represents the similarity between two sets. It takes a value between 0 and 1, where 1 indicates identical sets. To calculate it, we divide the number of observations in both sets (the intersection) by the number of observations in either set (the union): J(A, B) = |A ∩ B| / |A ∪ B|.
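A minimal sketch with two toy sets (the elements are illustrative only):

```python
a = {1, 2, 3, 4}   # toy sets of observations (illustrative)
b = {3, 4, 5}
jaccard = len(a & b) / len(a | b)   # |{3, 4}| / |{1, 2, 3, 4, 5}| = 2/5 = 0.4
```

For classification labels rather than raw sets, scikit-learn's `jaccard_score` computes the analogous quantity.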

Final words

Above, we discussed some of the most important metrics used to evaluate data models in real life. Since models and datasets differ in their conditions and characteristics, no single metric suits every problem. Model performance evaluation therefore needs to be done carefully, by matching the characteristics of each evaluation metric to the task at hand.