ML Katas

Cross-Entropy: A Measure of Surprise

medium (<30 mins) · Loss Functions · Cross-Entropy · Information Theory

Cross-entropy loss is fundamental for classification tasks. Let's build some intuition for its formulation.

  1. Definition: For a binary classification problem, the binary cross-entropy (BCE) loss for a single sample is given by $L = -\left[\, y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \,\right]$, where $y$ is the true label (0 or 1) and $\hat{y}$ is the predicted probability of the positive class.
  2. Case Analysis:
    • Assume $y = 1$. How does $L$ behave as $\hat{y}$ approaches 1? As $\hat{y}$ approaches 0?
    • Assume $y = 0$. How does $L$ behave as $\hat{y}$ approaches 1? As $\hat{y}$ approaches 0? (The numerical sweep after the code below illustrates both cases.)
  3. Information Theory Connection: Briefly explain how cross-entropy relates to self-information and entropy. Why might a model be "surprised" when its prediction for the true class is very low? (A short note after the code below sketches this connection.)
  4. Verification: You can write a small Python function for BCE and test it with different $(y, \hat{y})$ pairs to confirm your understanding of its behavior.
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Clip predictions away from 0 and 1 so np.log never receives 0.
    epsilon = 1e-10
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # BCE for a single sample: -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Test cases: confident correct predictions give a small loss,
# confident wrong predictions give a large loss.
print(binary_cross_entropy(1, 0.99))  # ~0.01  (y=1, confident and correct)
print(binary_cross_entropy(1, 0.01))  # ~4.61  (y=1, confident and wrong)
print(binary_cross_entropy(0, 0.99))  # ~4.61  (y=0, confident and wrong)
print(binary_cross_entropy(0, 0.01))  # ~0.01  (y=0, confident and correct)
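
To make the case analysis in step 2 concrete, here is a minimal sketch, assuming the binary_cross_entropy implementation above, that sweeps the predicted probability across (0, 1) for both true labels and prints the loss:

# Sweep y_hat over (0, 1) for both true labels. The loss stays near 0 when
# the prediction agrees with the label and grows without bound as the
# prediction approaches the wrong extreme.
for y in (0, 1):
    for y_hat in (0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999):
        print(f"y={y}  y_hat={y_hat:<6}  BCE={binary_cross_entropy(y, y_hat):.4f}")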
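
For step 3, a standard framing, offered here as a hint rather than a full answer: the self-information, or "surprise", of an outcome $x$ under a model $q$ is $-\log q(x)$, and the cross-entropy between the true distribution $p$ and the model $q$ is the expected surprise when outcomes are drawn from $p$:

$H(p, q) = -\sum_x p(x)\,\log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$

For a single labelled sample, $p$ is a point mass on the true class, so the sum collapses to $-\log q(y)$ (which is $-\log \hat{y}$ when $y = 1$), exactly the BCE above; a confident wrong prediction makes $q(y)$ tiny, so the model's surprise $-\log q(y)$ is very large.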