VAE Tutorial


Posted by Tab on October 8, 2017

Formulation

Maximum Likelihood — Find $\theta$ to maximize $P(X) = \int P(X|z; \theta) P(z)\, dz$, where $X$ is the data and $z$ is a latent variable with prior $P(z)$, which we can sample from.

Naive solution

Approximate $P(X)$ with samples of $z$: draw $z_1, \dots, z_n \sim P(z)$ and estimate $P(X) \approx \frac{1}{n} \sum_i P(X|z_i)$, as in the sketch below.
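
As a rough illustration, here is a minimal NumPy sketch of this naive estimator; the toy decoder $f$, the 2-D data point, and the Gaussian likelihood are illustrative assumptions, not part of the original post:

```python
import numpy as np

# Naive Monte Carlo estimate: P(X) ~= (1/n) * sum_i P(X|z_i), z_i ~ P(z).
# Toy setup: 2-D data, linear "decoder" f, Gaussian likelihood N(X; f(z), I).

def f(z):
    """Toy decoder mapping latent z to data space (illustrative)."""
    return 2.0 * z + 1.0

def p_x_given_z(x, z):
    """Isotropic Gaussian likelihood N(x; f(z), I)."""
    d = x.shape[-1]
    diff = x - f(z)
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1)) / (2 * np.pi) ** (d / 2)

rng = np.random.default_rng(0)
x = np.array([1.5, -0.5])               # one data point
z = rng.standard_normal((100_000, 2))   # samples from the prior P(z) = N(0, I)

likelihoods = p_x_given_z(x, z)
print("P(X) estimate:", likelihoods.mean())
# Most samples contribute almost nothing, which is why this is impractical:
print("fraction with P(X|z) < 1e-6:", (likelihoods < 1e-6).mean())
```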

Problems

  • Needs a huge number of samples of $z$, since for most of them $P(X|z) \approx 0$
  • Not practical computationally

Key question 1

Is it possible to know in advance which $z$ will give $P(X|z) \gg 0$? Solution: learn a distribution $Q(z)$ such that $z \sim Q(z)$ tends to give $P(X|z) \gg 0$.

Detail

Assume we can learn a distribution $Q(z)$ such that $z \sim Q(z)$ tends to give $P(X|z) \gg 0$. We want $P(X) = E_{z \sim P(z)}[P(X|z)]$, but that is not practical to compute; $E_{z \sim Q(z)}[P(X|z)]$ is. How do $E_{z \sim Q(z)}[P(X|z)]$ and $P(X)$ relate? Expanding the KL divergence $D[Q(z) \| P(z|X)]$ and applying Bayes' rule to $P(z|X)$, we finally get

$$\log P(X) - D[Q(z) \| P(z|X)] = E_{z \sim Q(z)}[\log P(X|z)] - D[Q(z) \| P(z)]$$

where the right-hand side is the variational lower bound. Note that:

  • KL divergence is always $\geq 0$ (checked numerically below).
  • $\log P(X) \geq \log P(X) - D[Q(z) \| P(z|X)]$, so the right-hand side above is a lower bound on $\log P(X)$.
  • Maximize the lower bound instead of $\log P(X)$ itself.
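
As a quick numerical sanity check on the first point, the following NumPy snippet (an illustrative addition, not from the original post) verifies that $D[Q \| P] \geq 0$ for random discrete distributions:

```python
import numpy as np

# Verify D[Q || P] >= 0 on random discrete distributions.
rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.random(10); q /= q.sum()
    p = rng.random(10); p /= p.sum()
    kl = np.sum(q * np.log(q / p))
    assert kl >= 0.0
    print(f"D[Q || P] = {kl:.4f}")
```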

Key question 2

How do we get $Q(z)$?

  1. Use $Q(z|X)$ instead, conditioning on the observed $X$
  2. Model $Q(z|X)$ with a neural network
  3. Assume $Q(z|X)$ to be Gaussian, $N(\mu, c \cdot I)$
    • The neural network outputs the mean $\mu$ and the diagonal covariance matrix $c \cdot I$
    • Input: image; output: distribution parameters (see the sketch after this list)
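
Below is a minimal PyTorch sketch of such an encoder. The layer sizes, the flattened 784-dimensional input (e.g. MNIST), the 20-dimensional latent, and the choice of predicting a full diagonal log-variance (a common generalization of $c \cdot I$) are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image X to the parameters of Q(z|X) = N(mu, diag(sigma^2)).

    Predicting log(sigma^2) keeps the raw network output unconstrained
    while guaranteeing a positive variance.
    """
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # mean of Q(z|X)
        self.logvar = nn.Linear(h_dim, z_dim)    # log of diagonal covariance

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.logvar(h)

# Input: image; output: distribution parameters.
x = torch.randn(8, 784)          # dummy batch of flattened images
mu, logvar = Encoder()(x)        # each of shape (8, 20)
```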

Loss

Convert the lower bound to a loss function

  1. Model $P(X|z)$ with a neural network, and let $f(z)$ be the network output.
  2. Assume $P(X|z)$ to be i.i.d. Gaussian
    • $X = f(z) + \epsilon$, where $\epsilon \sim N(0, I)$
    • The negative log-likelihood simplifies to an L2 loss: $\|X - f(z)\|^2$
  3. Assume $P(z) = N(0, I)$; then $D[Q(z|X) \| P(z)]$ has a closed-form solution (used in the sketch after this list)
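
Combining the two terms gives the loss below, a hedged PyTorch sketch; for a diagonal Gaussian $Q(z|X) = N(\mu, \text{diag}(\sigma^2))$ against the $N(0, I)$ prior, the closed form is $D = \frac{1}{2} \sum_k (\sigma_k^2 + \mu_k^2 - 1 - \log \sigma_k^2)$:

```python
import torch

def vae_loss(x, x_recon, mu, logvar):
    """Negative lower bound: L2 reconstruction + KL[Q(z|X) || N(0, I)]."""
    # -log P(X|z) up to constants, for the Gaussian decoder: ||X - f(z)||^2
    recon = torch.sum((x - x_recon) ** 2)
    # Closed-form KL of a diagonal Gaussian against the N(0, I) prior:
    # 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2))
    kl = 0.5 * torch.sum(logvar.exp() + mu ** 2 - 1.0 - logvar)
    return recon + kl
```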

Framework

Reparametrization Trick: enables backpropagation through the sampling of $z$, back to the encoder.

Sampling $z \sim N(\mu, \sigma^2)$ is equivalent to computing $z = \mu + \sigma \cdot \epsilon$, where $\epsilon \sim N(0, 1)$.
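
A minimal PyTorch sketch of the trick, assuming the encoder predicts $\log \sigma^2$ as in the earlier sketch:

```python
import torch

def reparametrize(mu, logvar):
    """Draw z ~ N(mu, diag(sigma^2)) as mu + sigma * eps, eps ~ N(0, I).

    The randomness lives entirely in eps, so the path from z back to
    (mu, logvar) is deterministic and gradients reach the encoder.
    """
    sigma = torch.exp(0.5 * logvar)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(sigma)     # external noise, no gradient needed
    return mu + sigma * eps
```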

Training

Repeat until convergence:

  1. $X^M$ ← random minibatch of $M$ examples from $X$
  2. $\epsilon$ ← sample of $M$ noise vectors from $N(0, I)$
  3. Compute the loss $L$ (i.e., run a forward pass through the neural network)
  4. Gradient descent on $L$ to update the Encoder and Decoder
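
The loop below is a hedged PyTorch sketch of this procedure, reusing the `Encoder`, `vae_loss`, and `reparametrize` sketches above; the decoder architecture, optimizer, learning rate, and stand-in data are illustrative assumptions:

```python
import torch

# Stand-in data and hyperparameters (illustrative).
dataloader = [torch.randn(32, 784) for _ in range(100)]  # minibatches X^M
num_epochs = 10

encoder = Encoder()
decoder = torch.nn.Sequential(                 # models P(X|z); output is f(z)
    torch.nn.Linear(20, 400), torch.nn.ReLU(), torch.nn.Linear(400, 784))
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for epoch in range(num_epochs):                # repeat until convergence
    for x in dataloader:                       # X^M: minibatch of M examples
        mu, logvar = encoder(x)                # parameters of Q(z|X)
        z = reparametrize(mu, logvar)          # uses M noise vectors ~ N(0, I)
        loss = vae_loss(x, decoder(z), mu, logvar)  # forward pass, loss L
        opt.zero_grad()
        loss.backward()                        # gradients of L
        opt.step()                             # update Encoder and Decoder
```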

Testing

Sample $z \sim N(0,I)$ and pass it through the Decoder
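
A short sketch of this step, reusing the `decoder` from the training sketch above; the sample count and latent size are illustrative:

```python
import torch

# Generation at test time: no encoder involved, only the prior and decoder.
with torch.no_grad():
    z = torch.randn(16, 20)     # z ~ N(0, I): 16 samples, 20-dim latent
    samples = decoder(z)        # decoder from the training sketch above
```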