DRAW: A Recurrent Neural Network For Image Generation

Posted by Tab on January 6, 2018

Outline

1.png 2.png 3.png

  • As the figure shows, images are generated by an iterative method. This is a more reasonable approach, particularly when we need to generate a large image. Besides, one of its motivations is that people tend to draw a picture gradually rather than in one shot, as the sketch below illustrates.
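To make the iteration concrete, here is a minimal sketch of the canvas-accumulation loop; `decoder_step` and `write_step` are hypothetical stand-ins for the learned decoder RNN and write operation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate_iteratively(decoder_step, write_step, T, canvas_shape, h_dim=256):
    """Build the image up on a canvas over T steps instead of
    emitting it in one shot -- the core idea behind DRAW."""
    canvas = np.zeros(canvas_shape)   # blank canvas c_0
    h_dec = np.zeros(h_dim)           # decoder RNN hidden state (size arbitrary here)
    for _ in range(T):
        h_dec = decoder_step(h_dec)              # update the decoder state
        canvas = canvas + write_step(h_dec)      # add one "stroke" to the canvas
    return sigmoid(canvas)            # pixel means of the final image
```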

Related Work

4.png

  • As for related work, some generative models, especially variational autoencoders (VAEs), share the same idea of image generation with this paper. However, this work implements an iterative construction instead of the common one-shot approach.
  • In addition, there have also been sequential attention models, such as those trained with policy gradients. Compared with those works, this paper uses a Gaussian kernel to mimic attention. Perhaps the motivation comes from the common heatmap representation of human keypoints.

Architecture

5.png

  • As the architecture shows, there are mainly three differences. First, the encoder and the reader take information from the decoder RNN, which means the iterative process can effectively generate new data. Second, the writer is an iterative process instead of a single step. Third, the dynamically updated attention model restricts the reading and writing processes.
  • Note that the mean and variance of the latent distribution are shown in the picture. In addition, some important variables and equations are shown in the next figure; a sketch of one timestep follows below.
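Putting the pieces together, one timestep of the model can be sketched as follows; the callables in `net` are hypothetical stand-ins for the learned read/write operations, the encoder and decoder RNNs, and the linear layers that produce the Gaussian parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def draw_step(x, canvas, h_enc, h_dec, net):
    """One DRAW timestep, following the paper's equations."""
    x_hat = x - sigmoid(canvas)           # error image: what is still missing
    r = net["read"](x, x_hat, h_dec)      # the reader sees the previous decoder state
    h_enc = net["rnn_enc"](h_enc, np.concatenate([r, h_dec]))
    mu = net["mu"](h_enc)                 # posterior mean
    log_sigma = net["log_sigma"](h_enc)   # posterior log std
    z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)  # reparameterized sample
    h_dec = net["rnn_dec"](h_dec, z)
    canvas = canvas + net["write"](h_dec)  # iterative write onto the canvas
    return canvas, h_enc, h_dec, mu, log_sigma
```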

6.png 7.png

  • The loss function is also easy to understand. One term is the reconstruction term (the conditional log-likelihood of the data given the final canvas) and the other is the KL divergence. The first can be interpreted as the information captured by the decoder, while the second is the information cost measured against the prior distribution over the latent variables.
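For Gaussian posteriors with a standard-normal prior, both terms have simple closed forms. Below is a minimal NumPy sketch, assuming binary images modeled by a Bernoulli distribution on the final canvas (as in the paper's MNIST setting):

```python
import numpy as np

def draw_loss(x, final_canvas, mus, log_sigmas, eps=1e-8):
    """Two-term DRAW loss: Bernoulli reconstruction cross-entropy on
    the final canvas, plus the closed-form KL between each Gaussian
    posterior Q(z_t) and a standard-normal prior, summed over steps."""
    p = 1.0 / (1.0 + np.exp(-final_canvas))   # pixel means D(x | c_T)
    recon = -np.sum(x * np.log(p + eps) + (1 - x) * np.log(1 - p + eps))  # L^x
    kl = 0.0
    for mu, log_sigma in zip(mus, log_sigmas):   # one (mu_t, sigma_t) per step
        sigma2 = np.exp(2 * log_sigma)
        kl += 0.5 * np.sum(mu**2 + sigma2 - 2 * log_sigma - 1.0)  # L^z
    return recon + kl
```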

8.png

  • Finally, here is how to sample from the trained model. We just need to take samples from the latent prior; the iterative construction then produces the image, as sketched below.
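A minimal sketch of this sampling loop, again with `rnn_dec` and `write` as stand-ins for the learned modules:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(rnn_dec, write, T, z_dim, canvas_shape, h_dim):
    """Test-time sampling: draw each z_t from the prior N(0, I) and
    run only the decoder/write path; the encoder is not needed."""
    canvas = np.zeros(canvas_shape)
    h_dec = np.zeros(h_dim)
    for _ in range(T):
        z = rng.standard_normal(z_dim)        # sample latent from the prior
        h_dec = rnn_dec(h_dec, z)
        canvas = canvas + write(h_dec)
    return 1.0 / (1.0 + np.exp(-canvas))      # final generated image
```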

Algorithm

Without the read and write operations, however, the architecture above cannot be implemented. Let's now define these operations in both the non-attentive and the attentive settings, starting with the simpler non-attentive case below.
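As a warm-up, here is a minimal sketch of the non-attentive case, where (following the paper) read simply concatenates the input and error images and write is a single linear map; the weight matrix `W` is shown as a plain argument rather than a learned parameter:

```python
import numpy as np

def read_no_attention(x, x_hat, h_dec):
    """Without attention, read concatenates the input image with the
    error image; the whole image is visible at every step."""
    return np.concatenate([x.ravel(), x_hat.ravel()])

def write_no_attention(h_dec, W):
    """Without attention, write is one linear map from the decoder
    state to a full-size canvas update."""
    return W @ h_dec
```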

9.png 10.png 11.png 12.png 13.png

  • As the figure shows, the parameters of the Gaussian filters are determined by the hidden state at time t-1 or t. The stride controls the receptive field of the filters, and the variance controls the smoothing effect. Most importantly, the location of the filter grid determines where attention is focused; a sketch follows below.
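Here is a minimal NumPy sketch of the attentive read and write, following the filterbank equations in the figures above. In the real model the parameters `g_x`, `g_y`, `delta`, `sigma2`, and `gamma` are produced by a linear layer on the decoder's hidden state; here they are plain arguments:

```python
import numpy as np

def filterbank(g_x, g_y, delta, sigma2, N, A, B):
    """N Gaussian filters per axis, centered on a grid of stride
    `delta` around (g_x, g_y) with shared variance `sigma2`, over
    an A x B image (A columns, B rows)."""
    i = np.arange(N)
    mu_x = g_x + (i - N / 2 - 0.5) * delta    # filter centers, x axis
    mu_y = g_y + (i - N / 2 - 0.5) * delta    # filter centers, y axis
    a, b = np.arange(A), np.arange(B)
    F_x = np.exp(-(a[None, :] - mu_x[:, None]) ** 2 / (2 * sigma2))  # N x A
    F_y = np.exp(-(b[None, :] - mu_y[:, None]) ** 2 / (2 * sigma2))  # N x B
    F_x /= np.maximum(F_x.sum(axis=1, keepdims=True), 1e-8)  # normalize rows
    F_y /= np.maximum(F_y.sum(axis=1, keepdims=True), 1e-8)
    return F_x, F_y

def read_attention(x, gamma, F_x, F_y):
    """Extract an N x N glimpse from the B x A image x: a small
    stride gives fine detail in a small region, a large stride a
    smooth, zoomed-out view."""
    return gamma * (F_y @ x @ F_x.T)

def write_attention(w, gamma, F_x, F_y):
    """Project an N x N patch w back onto the full B x A canvas."""
    return (1.0 / gamma) * (F_y.T @ w @ F_x)
```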

Experiments

14.png 15.png 16.png 17.png 18.png 19.png


Discussion

20.png

  • Considering that Gaussian filters are typically used to smooth images, they are likely to blur the final image. A better way to mimic attention might therefore require a better filter, one that can dynamically imitate attention while still preserving edges.

References

  1. Gregor et al., DRAW: A Recurrent Neural Network For Image Generation, 2015.
  2. Kingma and Welling, Auto-Encoding Variational Bayes, 2013.