
Understanding Loss Functions for Classification

0. MOTIVATION

Even now, I still feel overwhelmed by the wide variety of options available for each hyperparameter of a model. What is a loss function? Which activation function should I use for the last layer? How do I select a loss function for binary vs. multi-class classification? I realized that those questions would keep coming up if I didn't establish a solid understanding of each option, so I created this notebook as a reference point to answer them.

What are loss functions?

Loss functions (or error functions) gauge the error between the model's predicted output and the provided target value. A loss function measures the model's performance by the distance between the predictions and the targets: the smaller the loss, the closer the model's predictions are to the target values.

Let's define a loss function (J) that takes the following two parameters:

  • Predicted output (y_pred)
  • Target value (y_true)

This function will determine your model’s performance by comparing its predicted output with the actual target values. If the deviation between y_pred and y_true is very large, the loss value will be very high.

If the deviation is small or the values are nearly identical, it’ll output a very low loss value. Therefore, a proper loss function is necessary if you want to penalize a model properly during training.

Loss functions change based on the problem statement that your model is trying to solve. The goal of training your model is to minimize the error between the target and predicted value by minimizing the loss function.
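To make the idea concrete, here is a minimal sketch of a loss function J(y_pred, y_true), using mean squared error as the example (the function name and values are just for illustration):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error: the average squared deviation
    between the predicted output and the target value."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# A prediction far from the target yields a large loss...
print(mse_loss([10.0], [1.0]))   # 81.0
# ...while a near-perfect prediction yields a loss close to zero.
print(mse_loss([1.1], [1.0]))
```

Training then amounts to adjusting the model's parameters to make this number as small as possible.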

In other words, a small loss (error) means the model is accurate.

How to select a loss function for your task?

Based on the nature of your task, loss functions are classified as follows:

1. Regression Loss:

  • Mean Square Error or L2 Loss
  • Mean Absolute Error or L1 Loss
  • Huber Loss

2. Classification Loss:

Binary Classification:

  • Hinge Loss
  • Sigmoid Cross Entropy Loss
  • Weighted Cross Entropy Loss
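The binary classification losses above can also be sketched directly. These are simplified forms for illustration (the sigmoid cross entropy uses the numerically stable formulation, while the weighted version is written naively for readability):

```python
import numpy as np

def hinge_loss(scores, labels):
    # Labels in {-1, +1}, raw scores (no sigmoid).
    # Zero loss once the prediction is on the correct side with margin >= 1.
    return np.mean(np.maximum(0.0, 1.0 - labels * scores))

def sigmoid_cross_entropy(logits, labels):
    # Labels in {0, 1}. Numerically stable form of
    # -[y*log(sigmoid(x)) + (1-y)*log(1 - sigmoid(x))].
    return np.mean(np.maximum(logits, 0.0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

def weighted_cross_entropy(logits, labels, pos_weight):
    # Same as sigmoid cross entropy, but positive examples are
    # up- or down-weighted by pos_weight (useful for class imbalance).
    # Naive (non-stable) form for clarity.
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.mean(-(pos_weight * labels * np.log(p)
                     + (1.0 - labels) * np.log(1.0 - p)))

# Correctly classified with margin: hinge loss is zero.
print(hinge_loss(np.array([2.0, -2.0]), np.array([1.0, -1.0])))
# A completely uncertain logit (0) gives a cross entropy of log(2).
print(sigmoid_cross_entropy(np.array([0.0]), np.array([1.0])))
```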

Multi-Class Classification:

  • Softmax Cross Entropy Loss
  • Sparse Categorical Cross Entropy Loss
  • Kullback-Leibler Divergence Loss
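A sketch of the multi-class losses above, again simplified for illustration. The key difference between the first two is only the target format: one-hot vectors vs. integer class indices.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    # logits: (n, k); one-hot targets: (n, k).
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean((onehot * log_softmax).sum(axis=1))

def sparse_softmax_cross_entropy(logits, class_ids):
    # Same loss, but targets are integer class indices instead of one-hot vectors.
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_softmax[np.arange(len(class_ids)), class_ids])

def kl_divergence(p, q):
    # KL(p || q) between two probability distributions (assumes p, q > 0).
    return np.sum(p * np.log(p / q))

# Uniform logits over 3 classes: both variants give a loss of log(3).
logits = np.zeros((1, 3))
print(softmax_cross_entropy(logits, np.array([[1.0, 0.0, 0.0]])))
print(sparse_softmax_cross_entropy(logits, np.array([0])))
```

The sparse variant is convenient when labels are stored as class IDs, since it avoids materializing one-hot vectors.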

Let's go to Kaggle.