Outline
Generative Model
- Density estimation
- Sample generation
Motivation
- High-dimensional probability distribution
- incorporated into reinforcement learning
- predict missing data (semi-supervised learning)
- multi-modal output
- realistic generation of samples
Figure 1: In this example, a model is trained to predict the next frame in a video sequence. Here 'multi-modal' means that there is no single correct answer. The second image, trained with MSE, averages over many slightly different plausible futures and is therefore blurry. With an additional GAN loss, the third image commits to one of the many possible outputs, so it is sharp, recognizable, and realistic.
Related Work
- FVBN: In parallel or not?
- ICA: Does the generator function have few restrictions?
- Boltzmann machine: Markov chains needed?
- VAE: Variational bound needed?
- Quality of samples?
Figure 2: An illustration of the alternative generative modeling methods
FVBN
Figure 3: Fully visible belief nets have a high generation cost, since samples are produced one dimension at a time and generation cannot be parallelized
ICA
Figure 4: ICA places too many restrictions on the generator function
VAE
Figure 5: VAE is not asymptotically consistent unless q is perfect
Algorithm
The GAN framework
Figure 6: Generator vs. discriminator
Cost function
The discriminator’s cost
$J^{(D)}(\theta^{(D)},\theta^{(G)})=-\cfrac{1}{2}E_{x\sim p_{data}}\log D(x)-\cfrac{1}{2}E_{z}\log(1-D(G(z)))$
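A minimal PyTorch sketch of this cost (the names `D`, `G`, `x_real`, and `z` are illustrative, and `D` is assumed to end in a sigmoid so it outputs probabilities):

```python
import torch

def discriminator_cost(D, G, x_real, z):
    """J^(D): cross-entropy cost for the discriminator.

    Real data should be scored as 1, generated samples as 0.
    D is assumed to output probabilities in (0, 1).
    """
    eps = 1e-7                        # numerical stability for the logs
    d_real = D(x_real)                # D(x) with x ~ p_data
    d_fake = D(G(z).detach())         # D(G(z)); detach so only D is updated here
    return (-0.5 * torch.log(d_real + eps).mean()
            - 0.5 * torch.log(1.0 - d_fake + eps).mean())
```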
The generator’s cost
- Minimax (JS divergence)
- Maximum likelihood game (KL divergence)
- Heuristic, non-saturating game
Minimax (JS divergence)
Figure 7: The generator's minimax cost: $J^{(G)}=-J^{(D)}$
Maximum likelihood game (KL Divergence)
$J^{(G)}=-\cfrac{1}{2}E_{z}\exp\left(\sigma^{-1}(D(G(z)))\right)$
Heuristic, non-saturating game
$J^{(G)}=-\cfrac{1}{2}E_{z}\log D(G(z))$
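Under the same assumptions as the sketch above (a probability-output `D`), the three generator costs differ only in how $D(G(z))$ enters the expectation; a hedged sketch:

```python
import torch

def generator_cost(D, G, z, variant="non_saturating"):
    """The three generator costs, written for a discriminator that outputs probabilities."""
    eps = 1e-7
    d_fake = D(G(z))
    if variant == "minimax":
        # J^(G) = -J^(D): minimize log(1 - D(G(z))); saturates when D(G(z)) is near 0
        return 0.5 * torch.log(1.0 - d_fake + eps).mean()
    if variant == "non_saturating":
        # heuristic game: maximize log D(G(z)) instead
        return -0.5 * torch.log(d_fake + eps).mean()
    if variant == "maximum_likelihood":
        # exp(sigma^{-1}(D)) = D / (1 - D), matching -1/2 E_z exp(sigma^{-1}(D(G(z))))
        return -0.5 * (d_fake / (1.0 - d_fake + eps)).mean()
    raise ValueError(f"unknown variant: {variant}")
```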
A significant problem
Non-convergence
This problem usually occurs when the discriminator becomes very strong, so the gradient of V vanishes: with the minimax generator cost, when D(G(z)) is near zero the gradient flowing back to the generator is also near zero, and the generator stops learning.
Figure 8: Convergence performance of the different methods
Figure 9: Convexity of the different methods
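A quick numeric check of the vanishing-gradient claim above, as a sketch assuming `D` applies a sigmoid to a logit $a$ (so $D(G(z))=\sigma(a)$) and using the gradient with respect to $a$ as a proxy for the gradient reaching the generator:

```python
import torch

# A confident discriminator: logit a = -5 gives D(G(z)) = sigmoid(-5) ≈ 0.0067
a = torch.tensor(-5.0, requires_grad=True)
minimax = 0.5 * torch.log(1.0 - torch.sigmoid(a))      # minimax generator cost term
(g_minimax,) = torch.autograd.grad(minimax, a)

a2 = torch.tensor(-5.0, requires_grad=True)
non_sat = -0.5 * torch.log(torch.sigmoid(a2))          # non-saturating cost
(g_non_sat,) = torch.autograd.grad(non_sat, a2)

print(g_minimax.item())   # ≈ -0.003: vanishes as D(G(z)) -> 0
print(g_non_sat.item())   # ≈ -0.497: stays large, so the generator keeps learning
```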
Mode collapse
Figure 10: Example of mode collapse, which can be mitigated by unrolled GANs
Figure 11: Different results depending on the order of p and q in the KL divergence: $D_{KL}(p_{data}\|p_{model})$ spreads the model over all modes, while $D_{KL}(p_{model}\|p_{data})$ concentrates it on a single mode
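To make the point of Figure 11 concrete, here is a tiny numeric sketch with made-up discrete distributions: the forward KL $D_{KL}(p_{data}\|p_{model})$ favors a model that covers every mode, while the reverse KL favors a model that sits on a single mode, which is what mode collapse looks like.

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D_KL(p || q)."""
    return float(np.sum(p * np.log(p / q)))

# A bimodal "data" distribution p and two candidate models q over 3 bins
p       = np.array([0.49, 0.02, 0.49])   # two modes
q_cover = np.array([0.25, 0.50, 0.25])   # spreads mass so both modes are covered
q_seek  = np.array([0.96, 0.03, 0.01])   # concentrates on a single mode

print(kl(p, q_cover), kl(p, q_seek))     # ≈ 0.60 < 1.57: KL(p||q) prefers mode covering
print(kl(q_cover, p), kl(q_seek, p))     # ≈ 1.27 > 0.62: KL(q||p) prefers a single mode
```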
Experiment
Evaluation of generative models
Figure 12: Evaluation of generative models. The two plots correspond to an MLP generator and a DCGAN, and show that the loss value is not an accurate indicator of sample quality.
Discrete outputs
Frontier
- Semi-supervised learning
- Using the code (the latent code $z$)
Discussion
Tips and Tricks
Train with labels
- Generating samples conditioned on labels: $p(x \mid y)$
- Training with labels: $p(x, y)$
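As an illustrative sketch of label conditioning (the layer sizes and architecture here are assumptions, not the tutorial's), a conditional generator can simply take a one-hot label alongside the noise vector:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    """Models p(x | y) by feeding the class label y into the generator together with z."""
    def __init__(self, z_dim=100, n_classes=10, x_dim=784):
        super().__init__()
        self.n_classes = n_classes
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256),
            nn.ReLU(),
            nn.Linear(256, x_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        # condition on the label by concatenating a one-hot encoding of y to the noise z
        y_onehot = F.one_hot(y, num_classes=self.n_classes).float()
        return self.net(torch.cat([z, y_onehot], dim=1))
```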
One-sided label smoothing
(to reduce vulnerability to adversarial examples)
- Default discriminator cost
cross_entropy(1., discriminator(data))+cross_entropy(0.,discriminator(samples))
- One-sided label smooth cost
cross_entropy(.9, discriminator(data))+cross_entropy(0.,discriminator(samples))
- Do not smooth negative labels
$D^*(x)=\cfrac{(1-\alpha)p_{data}(x)+\beta p_{model}(x)}{p_{data}(x)+p_{model}(x)}$
According to the discriminator's loss, the optimal discriminator is the function $D^*(x)$ above. If we smooth only the positive targets (shrinking the coefficient of $p_{data}$ while keeping $\beta=0$), the effect is simply that the discriminator's confidence declines. With a positive $\beta$, however, $p_{model}$ enters the numerator, so in regions where $p_{model}$ is large and $p_{data}$ is small, $D^*(x)$ develops spurious peaks (local optima) that encourage the generator to keep producing those wrong samples. This is why negative labels are not smoothed.
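A minimal sketch of the smoothed cost in PyTorch, following the pseudocode above (the 0.9 target is the smoothed positive label; `d_real` and `d_fake` are assumed to be discriminator probabilities):

```python
import torch
import torch.nn.functional as F

def smoothed_discriminator_cost(d_real, d_fake, smooth=0.9):
    """One-sided label smoothing: soften only the real-data target.

    d_real, d_fake: discriminator outputs in (0, 1) for data and samples.
    The fake target stays at exactly 0 -- negative labels are not smoothed.
    """
    real_loss = F.binary_cross_entropy(d_real, torch.full_like(d_real, smooth))
    fake_loss = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return real_loss + fake_loss
```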
Virtual Batch Normalization
Figure 13: Correlation between samples within a batch caused by batch normalization
- Batch Norm (causes correlation between the samples within a single batch)
- Reference Norm (fix a reference batch R and use its statistics for normalization)
- Virtual Batch Norm (avoids overfitting to the reference batch that plain Reference Norm suffers from)
Figure 14: Steps of Virtual Batch Normalization
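A rough sketch of the idea, assuming 2-D tensors of shape (batch, features) and omitting the learnable scale and shift that a real implementation would add:

```python
import torch

def virtual_batch_norm(x, ref_batch, eps=1e-5):
    """Normalize each example with statistics of the fixed reference batch plus
    that example alone, so examples in the current batch do not affect each other."""
    R = ref_batch.shape[0]
    ref_mean = ref_batch.mean(dim=0)            # statistics of the reference batch
    ref_sq_mean = (ref_batch ** 2).mean(dim=0)
    w = 1.0 / (R + 1)                           # weight of the single new example
    mean = w * x + (1.0 - w) * ref_mean         # per-example "virtual batch" mean
    sq_mean = w * x ** 2 + (1.0 - w) * ref_sq_mean
    var = sq_mean - mean ** 2                   # E[x^2] - E[x]^2 over the virtual batch
    return (x - mean) / torch.sqrt(var + eps)
```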
Balance G and D?
- D usually performs better than G (consistent with the theory)
- D's network is typically bigger and deeper
- Do not restrict D when it becomes too accurate