Dissecting the Variational Autoencoder's ELBO
Variational Autoencoders (VAEs) are powerful generative models that optimize a lower bound on the data log-likelihood, known as the Evidence Lower Bound (ELBO). The ELBO for a single data point $x$ is given by:

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$

where $q_\phi(z \mid x)$ is the approximate posterior (encoder), $p_\theta(x \mid z)$ is the likelihood (decoder), and $p(z)$ is the prior over the latent space (often a standard normal distribution).
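For reference, the following is a minimal sketch (written in PyTorch) of how these two terms are typically computed for a minibatch, assuming a diagonal-Gaussian encoder, a Bernoulli decoder, and a standard-normal prior; the function name `elbo_terms` and the tensors `x`, `x_recon`, `mu`, and `log_var` are illustrative, not part of any particular library:

```python
import torch
import torch.nn.functional as F

def elbo_terms(x, x_recon, mu, log_var):
    """Compute the two ELBO terms for one minibatch.

    Assumes a Bernoulli decoder (x_recon holds output probabilities in [0, 1],
    so the reconstruction term is a negative binary cross-entropy) and a
    diagonal-Gaussian posterior q(z|x) with a standard-normal prior p(z),
    for which the KL divergence has a closed form.
    """
    # Reconstruction term: E_q[log p(x|z)], approximated with the single
    # Monte Carlo sample of z that produced x_recon.
    recon = -F.binary_cross_entropy(x_recon, x, reduction="sum")

    # KL term: D_KL(q(z|x) || N(0, I)), closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

    # ELBO = reconstruction - KL; training minimizes the negative ELBO.
    return recon, kl, recon - kl
```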
- Reconstruction Term: Identify and explain the first term, $\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]$. What is its intuitive role in the VAE? Why is it an expectation? How is it typically approximated in practice (e.g., using Monte Carlo sampling)?
- KL Divergence Term: Identify and explain the second term, $D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)$. What does the KL divergence measure? What is the purpose of this term in the ELBO, and what prior is commonly chosen? (A closed-form expression for the common Gaussian case is given after this list.)
- Trade-off: Explain the inherent trade-off between these two terms during VAE training. What happens if one term dominates the other?
- Reparameterization Trick: Briefly describe why the reparameterization trick is necessary for training VAEs, specifically for backpropagating through the sampling step. You don't need to derive it; just explain its purpose. (A code sketch of the trick follows this list.)
- Verification: Conceptual understanding is key here. Verify your understanding by explaining how changing the weight of the KL divergence term (a common practice, as in the β-VAE) would affect the generated samples and the latent-space structure. (The sketch after this list shows where such a weight enters the loss.)
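For the KL Divergence question above, it may help to recall the closed form that applies in the common case where $q_\phi(z \mid x) = \mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right)$ and $p(z) = \mathcal{N}(0, I)$, with $d$ the latent dimensionality:

$$D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right) \,\|\, \mathcal{N}(0, I)\right) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)$$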
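For the Reparameterization Trick and Verification questions, here is a correspondingly minimal sketch (again illustrative PyTorch; the `beta` weight is a hypothetical hyperparameter in the spirit of the β-VAE) showing the reparameterized sampling step and where a KL weight enters the training loss:

```python
import torch

def reparameterize(mu, log_var):
    """Sample z ~ q(z|x) = N(mu, sigma^2) as a deterministic function of
    (mu, log_var) plus parameter-free noise, so gradients can flow back
    through mu and log_var despite the sampling step."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)  # eps ~ N(0, I); carries no parameters
    return mu + eps * std

def neg_elbo(recon, kl, beta=1.0):
    """Negative ELBO with a weighted KL term.

    beta = 1 recovers the standard ELBO; beta > 1 pushes q(z|x) toward the
    prior (a smoother, more regularized latent space, typically blurrier
    samples); beta < 1 favors reconstruction at the cost of latent structure.
    """
    return -recon + beta * kl
```

With `beta = 1.0` this is exactly the negative of the ELBO defined above; raising or lowering it is one concrete way to explore the trade-off asked about in these questions.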