ML Katas

The Implicit Higher Dimension of Kernels

Difficulty: medium (<30 mins) · Tags: SVM, Kernel Methods, Feature Engineering

Support Vector Machines (SVMs) are powerful, and the "kernel trick" allows them to find non-linear decision boundaries without explicitly mapping data to high-dimensional spaces.
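As a quick illustration of that claim, here is a minimal sketch (assuming scikit-learn is installed; the concentric-circles dataset and kernel parameters are illustrative choices, not prescribed by the kata) comparing a linear SVM with a degree-2 polynomial-kernel SVM on data that is not linearly separable:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes in 2D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
# gamma=1, coef0=0, degree=2 gives the homogeneous quadratic kernel (x_i . x_j)^2.
poly_svm = SVC(kernel="poly", degree=2, gamma=1, coef0=0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))          # roughly chance level
print("degree-2 poly kernel accuracy:", poly_svm.score(X, y))     # close to 1.0
```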

  1. Linear Separability: Consider a 2D dataset where points are not linearly separable in their original feature space. Sketch an example of such a dataset (e.g., concentric circles or an XOR-like pattern).
  2. Feature Mapping: Imagine a simple mapping function ϕ(𝐱) = [x₁², √2·x₁x₂, x₂²] that transforms a 2D input 𝐱 = [x₁, x₂] into a 3D feature space. Apply this mapping to two sample points, say 𝐱_A = [1, 0] and 𝐱_B = [0, 1].
  3. The Kernel Trick: The "kernel trick" avoids explicit mapping by computing the dot product in the higher-dimensional space directly, using a kernel function K(𝐱ᵢ, 𝐱ⱼ) = ϕ(𝐱ᵢ)ᵀϕ(𝐱ⱼ). For the mapping ϕ(𝐱) above, show that K(𝐱ᵢ, 𝐱ⱼ) = (𝐱ᵢᵀ𝐱ⱼ)². This is the quadratic (degree-2 polynomial) kernel.
  4. Intuition: Explain in simple terms how the kernel trick lets SVMs find non-linear boundaries in the original space. Why is explicitly computing ϕ(𝐱) often computationally expensive, or even intractable, for very high-dimensional feature spaces? (The feature-count comparison after this list makes the scaling concrete.)
  5. Verification: Compute ϕ(𝐱_A)ᵀϕ(𝐱_B) directly and compare it to (𝐱_Aᵀ𝐱_B)² for the points you picked in step 2 to confirm your derivation of the kernel function; the NumPy sketch after this list works through steps 2, 3, and 5.
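
Once you have attempted steps 2, 3, and 5 on paper, the following minimal NumPy sketch (the helper names phi and quadratic_kernel are illustrative, not part of the exercise) applies the mapping and checks that the explicit dot product matches the kernel value:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def quadratic_kernel(xi, xj):
    """Kernel trick: (xi . xj)^2, computed without leaving the 2D input space."""
    return float(np.dot(xi, xj)) ** 2

xA = np.array([1.0, 0.0])
xB = np.array([0.0, 1.0])

# Step 2: images of the two sample points in the 3D feature space.
print("phi(xA) =", phi(xA))   # [1. 0. 0.]
print("phi(xB) =", phi(xB))   # [0. 0. 1.]

# Steps 3 and 5: explicit dot product vs. kernel value (both 0 for this orthogonal pair).
print(np.dot(phi(xA), phi(xB)))
print(quadratic_kernel(xA, xB))

# The identity holds for arbitrary points, not just this pair.
rng = np.random.default_rng(0)
for _ in range(5):
    u, v = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(np.dot(phi(u), phi(v)), quadratic_kernel(u, v))
```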
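
On the cost question in step 4: for n input features, the homogeneous degree-d feature map has C(n+d−1, d) coordinates (one per monomial of degree d), whereas the kernel K(𝐱ᵢ, 𝐱ⱼ) = (𝐱ᵢᵀ𝐱ⱼ)ᵈ only ever needs a single n-dimensional dot product. A rough back-of-the-envelope comparison (feature_dim is an illustrative helper name):

```python
from math import comb

def feature_dim(n, d):
    """Number of monomials of degree exactly d in n variables,
    i.e. the dimension of the explicit feature space behind (x_i . x_j)^d."""
    return comb(n + d - 1, d)

for n, d in [(2, 2), (100, 2), (1000, 3), (10_000, 5)]:
    print(f"n={n:>6}, degree={d}: explicit features = {feature_dim(n, d):,}; "
          f"the kernel needs only one {n}-dimensional dot product")
```

For the kata's own case (n=2, d=2) the explicit space is just 3-dimensional, but the count grows combinatorially with n and d, which is why the explicit mapping quickly becomes impractical while the kernel computation does not.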