Cross-Entropy: A Measure of Surprise
Cross-entropy loss is fundamental for classification tasks. Let's build some intuition for its formulation.
- Definition: For a binary classification problem, the binary cross-entropy (BCE) loss for a single sample is given by $\mathcal{L}(y, \hat{y}) = -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$, where $y$ is the true label (0 or 1) and $\hat{y}$ is the predicted probability of the positive class.
- Case Analysis (a numeric sweep of both cases follows the code block below):
  - Assume $y = 1$. How does $\mathcal{L}$ behave as $\hat{y}$ approaches 1? As $\hat{y}$ approaches 0?
  - Assume $y = 0$. How does $\mathcal{L}$ behave as $\hat{y}$ approaches 1? As $\hat{y}$ approaches 0?
- Information Theory Connection: Briefly explain how cross-entropy relates to self-information and entropy. Why might a model be "surprised" when its prediction for the true class is very low? (The relevant definitions are sketched after the code block below.)
- Verification: You can write a small Python function for BCE and test it with different $(y, \hat{y})$ pairs to confirm your understanding of its behavior.
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 to avoid log(0)
    epsilon = 1e-10
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # BCE = -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
# Test cases: confident predictions for each true label
print(binary_cross_entropy(1, 0.99))  # small loss (~0.01): confident and correct
print(binary_cross_entropy(1, 0.01))  # large loss (~4.61): confident and wrong
print(binary_cross_entropy(0, 0.99))  # large loss (~4.61): confident and wrong
print(binary_cross_entropy(0, 0.01))  # small loss (~0.01): confident and correct
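As a quick numeric check of the case analysis, the sketch below sweeps the predicted probability toward each extreme for both true labels, assuming the binary_cross_entropy implementation above; the probe values are arbitrary and chosen only to expose the limiting behavior.

# Case-analysis sweep (probe values are arbitrary)
for y in (1, 0):
    for p in (0.999, 0.9, 0.5, 0.1, 0.001):
        print(f"y={y}, y_hat={p} -> loss={binary_cross_entropy(y, p):.4f}")
# For y = 1 the loss shrinks toward 0 as y_hat -> 1 and grows without bound as y_hat -> 0;
# for y = 0 the pattern is mirrored.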
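For the information-theory question above, the standard definitions are worth having at hand; this is only a reference sketch, not a full answer:

$$
I(x) = -\log p(x), \qquad H(p) = -\sum_x p(x) \log p(x), \qquad H(p, q) = -\sum_x p(x) \log q(x)
$$

When the true distribution $p$ places all of its mass on the correct class, $H(p, q)$ collapses to $-\log q(\text{true class})$, i.e. the self-information ("surprise") of the probability the model assigned to what actually happened; the BCE above is this quantity for two classes.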