The Gradients of Activation Functions
Activation functions introduce non-linearity into neural networks, and their derivatives determine how gradients flow during backpropagation.
- Sigmoid: Given $\sigma(x) = \frac{1}{1 + e^{-x}}$, derive $\sigma'(x)$ in terms of $\sigma(x)$.
- ReLU: Given $\mathrm{ReLU}(x) = \max(0, x)$, derive $\mathrm{ReLU}'(x)$. What happens at $x = 0$? Why is this a practical issue, and how is it often handled in implementations?
- Tanh: Given $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$, derive $\tanh'(x)$ in terms of $\tanh(x)$.
- Vanishing Gradients: For Sigmoid and Tanh, sketch their derivatives. Explain how the properties of these derivatives (especially for inputs far from 0) can contribute to the "vanishing gradient problem" during backpropagation in deep networks.
- Verification: You can implement these functions and their derivatives in Python and plot them to visually verify your derivations.
```python
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    # Your implementation here
    raise NotImplementedError("replace with your derived expression")


# ... similar for relu, tanh

# Plot the function and its derivative on the same axes
x = np.linspace(-5, 5, 100)
plt.plot(x, sigmoid(x), label='Sigmoid')
plt.plot(x, sigmoid_derivative(x), label='Sigmoid Derivative')
plt.xlabel('x')
plt.legend()
plt.show()
```
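
Besides plotting, a derivation can be checked numerically by comparing it against a central-difference estimate. The sketch below is only an illustration and not part of the exercise: the helper names `numerical_derivative` and `max_abs_error` and the step size `h=1e-5` are hypothetical choices, and `np.sin` / `np.cos` stand in for an activation and its hand-derived derivative so that no answer is given away.

```python
import numpy as np


def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x), evaluated elementwise."""
    return (f(x + h) - f(x - h)) / (2 * h)


def max_abs_error(f, f_prime, x, h=1e-5):
    """Largest gap between a hand-derived derivative and the numerical estimate."""
    return np.max(np.abs(f_prime(x) - numerical_derivative(f, x, h)))


# Stand-in pair with a well-known derivative: d/dx sin(x) = cos(x).
x = np.linspace(-5, 5, 100)
err = max_abs_error(np.sin, np.cos, x)
print(f"max abs error: {err:.2e}")  # should be very small if the derivative is right
```

Once `sigmoid_derivative` is implemented, `max_abs_error(sigmoid, sigmoid_derivative, x)` should be similarly small, and the same check applies to tanh. For ReLU, evaluate the check only at points away from $x = 0$.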