ML Katas

The Implicit Higher Dimension of Kernels

Difficulty: medium (<30 mins) · Tags: SVM, Kernel Methods, Feature Engineering

Support Vector Machines (SVMs) are powerful, and the "kernel trick" allows them to find non-linear decision boundaries without explicitly mapping data to high-dimensional spaces.
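As a quick illustration of that claim, here is a minimal sketch (assuming scikit-learn is installed; the concentric-circles dataset and kernel parameters are illustrative choices, not prescribed by the kata) comparing a linear SVM with a degree-2 polynomial-kernel SVM on data that is not linearly separable:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes in 2D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
# gamma=1, coef0=0, degree=2 gives the homogeneous quadratic kernel (x_i . x_j)^2.
poly_svm = SVC(kernel="poly", degree=2, gamma=1, coef0=0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))          # roughly chance level
print("degree-2 poly kernel accuracy:", poly_svm.score(X, y))     # close to 1.0
```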

  1. Linear Separability: Consider a 2D dataset where points are not linearly separable in their original feature space. Sketch an example of such a dataset (e.g., concentric circles or an XOR-like pattern).
  2. Feature Mapping: Imagine a simple mapping function ϕ(𝐱) = [x₁², √2·x₁x₂, x₂²] that transforms a 2D input 𝐱 = [x₁, x₂] into a 3D feature space. Apply this mapping to two sample points, say 𝐱_A = [1, 0] and 𝐱_B = [0, 1].
  3. The Kernel Trick: The "kernel trick" avoids explicit mapping by computing the dot product in the higher-dimensional space directly, using a kernel function K(𝐱ᵢ, 𝐱ⱼ) = ϕ(𝐱ᵢ)ᵀϕ(𝐱ⱼ). For the mapping ϕ(𝐱) above, show that K(𝐱ᵢ, 𝐱ⱼ) = (𝐱ᵢᵀ𝐱ⱼ)². This is the quadratic (degree-2 polynomial) kernel.
  4. Intuition: Explain in simple terms how the kernel trick lets SVMs find non-linear boundaries in the original space. Why is explicitly computing ϕ(𝐱) often computationally expensive, or even intractable, for very high-dimensional feature spaces? (The feature-count comparison after this list makes the scaling concrete.)
  5. Verification: Compute ϕ(𝐱_A)ᵀϕ(𝐱_B) directly and compare it to (𝐱_Aᵀ𝐱_B)² for the points you picked in step 2 to confirm your derivation of the kernel function; the NumPy sketch after this list works through steps 2, 3, and 5.
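
Once you have attempted steps 2, 3, and 5 on paper, the following minimal NumPy sketch (the helper names phi and quadratic_kernel are illustrative, not part of the exercise) applies the mapping and checks that the explicit dot product matches the kernel value:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def quadratic_kernel(xi, xj):
    """Kernel trick: (xi . xj)^2, computed without leaving the 2D input space."""
    return float(np.dot(xi, xj)) ** 2

xA = np.array([1.0, 0.0])
xB = np.array([0.0, 1.0])

# Step 2: images of the two sample points in the 3D feature space.
print("phi(xA) =", phi(xA))   # [1. 0. 0.]
print("phi(xB) =", phi(xB))   # [0. 0. 1.]

# Steps 3 and 5: explicit dot product vs. kernel value (both 0 for this orthogonal pair).
print(np.dot(phi(xA), phi(xB)))
print(quadratic_kernel(xA, xB))

# The identity holds for arbitrary points, not just this pair.
rng = np.random.default_rng(0)
for _ in range(5):
    u, v = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(np.dot(phi(u), phi(v)), quadratic_kernel(u, v))
```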
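
On the cost question in step 4: for n input features, the homogeneous degree-d feature map has C(n+d−1, d) coordinates (one per monomial of degree d), whereas the kernel K(𝐱ᵢ, 𝐱ⱼ) = (𝐱ᵢᵀ𝐱ⱼ)ᵈ only ever needs a single n-dimensional dot product. A rough back-of-the-envelope comparison (feature_dim is an illustrative helper name):

```python
from math import comb

def feature_dim(n, d):
    """Number of monomials of degree exactly d in n variables,
    i.e. the dimension of the explicit feature space behind (x_i . x_j)^d."""
    return comb(n + d - 1, d)

for n, d in [(2, 2), (100, 2), (1000, 3), (10_000, 5)]:
    print(f"n={n:>6}, degree={d}: explicit features = {feature_dim(n, d):,}; "
          f"the kernel needs only one {n}-dimensional dot product")
```

For the kata's own case (n=2, d=2) the explicit space is just 3-dimensional, but the count grows combinatorially with n and d, which is why the explicit mapping quickly becomes impractical while the kernel computation does not.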