ML Katas

The Gradients of Activation Functions

medium (<1 hr) · Activation Functions · Backpropagation · Derivatives

Activation functions introduce non-linearity into neural networks, and their derivatives are what backpropagation relies on to propagate error signals through the layers.

  1. Sigmoid: Given $\sigma(x) = \frac{1}{1 + e^{-x}}$, derive $\frac{d\sigma}{dx}$ in terms of $\sigma(x)$.
  2. ReLU: Given $\mathrm{ReLU}(x) = \max(0, x)$, derive $\frac{d\,\mathrm{ReLU}}{dx}$. What happens at x=0? Why is this a practical issue and how is it often handled in implementations?
  3. Tanh: Given $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, derive $\frac{d\tanh}{dx}$ in terms of $\tanh(x)$.
  4. Vanishing Gradients: For Sigmoid and Tanh, sketch their derivatives. Explain how the properties of these derivatives (especially for values far from 0) can contribute to the "vanishing gradient problem" during backpropagation in deep networks.
  5. Verification: You can implement these functions and their derivatives in Python and plot them to visually verify your derivations (a reference sketch for checking your work follows the starter skeleton below).
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Your implementation here
    pass

# ... similar for relu, tanh

x = np.linspace(-5, 5, 100)
plt.plot(x, sigmoid(x), label='Sigmoid')
plt.plot(x, sigmoid_derivative(x), label='Sigmoid Derivative')
plt.legend()
plt.show()
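
Once you have attempted the derivations, the following is a minimal reference sketch for checking your work, assuming NumPy and Matplotlib as in the skeleton above. It implements the three derivatives in closed form, compares each against a central finite difference, and plots them side by side. The helper names (relu_derivative, tanh_derivative, numerical_derivative) are illustrative, not prescribed by the kata.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)              # dσ/dx = σ(x)(1 - σ(x))

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # Undefined at x = 0; implementations commonly just use 0 (or 1) there.
    return (x > 0).astype(float)

def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2      # d tanh/dx = 1 - tanh^2(x)

def numerical_derivative(f, x, h=1e-5):
    # Central finite difference, used here only as a sanity check.
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.linspace(-5, 5, 100)

# Analytical and numerical derivatives should agree closely away from x = 0.
for name, f, df in [("sigmoid", sigmoid, sigmoid_derivative),
                    ("tanh", np.tanh, tanh_derivative),
                    ("relu", relu, relu_derivative)]:
    max_err = np.max(np.abs(df(x) - numerical_derivative(f, x)))
    print(f"{name}: max |analytic - numeric| = {max_err:.2e}")

# Both saturating derivatives shrink toward 0 for inputs far from 0,
# which is the behaviour behind question 4 on vanishing gradients.
plt.plot(x, sigmoid_derivative(x), label="Sigmoid derivative")
plt.plot(x, tanh_derivative(x), label="Tanh derivative")
plt.plot(x, relu_derivative(x), label="ReLU derivative")
plt.legend()
plt.show()

The finite-difference comparison is a quick way to catch sign or algebra mistakes, and the plot makes the vanishing-gradient behaviour visible: the sigmoid derivative never exceeds 0.25 and the tanh derivative never exceeds 1, with both decaying toward zero for large |x|.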