Introduction
- Present a general learning framework that combines a variational auto-encoder (VAE) with a generative adversarial network (GAN).
- Propose a new objective for the generator. Instead of using the same cross-entropy loss as the discriminator network, the new objective requires the generator to generate data that minimize the l2 distance between the average feature of the synthesized data and that of the real data.
The results of this work are presented in the Experiment section below.
Related Work
Variational Auto-encoder (VAE)
Here E represents the encoder and G represents the generator (decoder). z is a latent vector inferred from the input x, and x' is the image generated from z.
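A compact sketch of the standard VAE objective implied by this setup (the squared-error reconstruction term is one common instantiation; the paper's notation may differ):

$$
\mathcal{L}_{VAE} = \mathbb{E}_{z \sim q(z \mid x)}\big[\lVert x - G(z) \rVert_2^2\big] + KL\big(q(z \mid x)\,\Vert\,p(z)\big)
$$

where q(z|x) is the Gaussian posterior produced by E and p(z) is a standard normal prior. The first term rewards faithful reconstruction, and the KL term keeps the encoder's posterior close to the prior.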
Generative Adversarial Network (GAN)
A GAN adds a discriminative network, represented by D. The discriminator D tries to distinguish real training data from synthesized data, while the generator G tries to fool the discriminator.
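As a reminder, the standard (unconditional) GAN objective is the minimax game:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$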
CVAE and CGAN
VAEs and GANs can also be trained to conduct conditional generation, e.g., CVAE and CGAN. By introducing additional conditionality, they can handle probabilistic one-to-many mapping problems.
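In the standard conditional formulations (not specific to this paper), the condition c is simply fed to the networks, e.g. for CGAN:

$$
\min_G \max_D \; \mathbb{E}_{x, c}\big[\log D(x, c)\big] + \mathbb{E}_{z, c}\big[\log\big(1 - D(G(z, c), c)\big)\big]
$$

and for CVAE the encoder and decoder become q(z | x, c) and G(z, c).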
Performance
In the comparison experiment, the results generated by CVAE are relatively blurry but the overall structure is maintained, while the results generated by CGAN lose the structure of the faces.
How can these two models be improved to generate better images? One idea is to combine them, so that each model's strengths offset the other's weaknesses.
CVAE-GAN
The formulation of CVAE-GAN
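A sketch of the data flow through the four networks, reconstructed from the definitions that follow (the exact formulation in the paper may differ):

$$
z = E(x, c), \qquad x' = G(z, c), \qquad y = D(x)\ \text{or}\ D(x') \in [0, 1], \qquad C(x) \approx p(c \mid x)
$$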
Here, x and x′ are the input and the generated image; E, G, C, and D are the encoder, generative, classification, and discriminative networks, respectively; z is the latent vector; y is a binary output representing real vs. synthesized; and c is the condition, such as an attribute or class label.
The naive combination of VAE and GAN is insufficient. Recent work shows that if the original KL-divergence loss is adopted, the training of the GAN suffers from a vanishing-gradient problem in the network G.
So this work keeps the training process of the networks E, D, and C the same as in the original VAE and GAN, and proposes a new mean feature matching objective for the generative network G to improve the stability of the original GAN.
Loss function
E, D, and C are trained the same as in the original VAE and GAN, so their loss functions follow the standard VAE and GAN forms.
Mean feature matching based GAN
To improve the stability of the original GAN, this work proposes a new mean feature matching objective for the generative network G.
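A sketch of this objective, written in terms of an intermediate feature layer f_D of the discriminator (the specific layer is an assumption here):

$$
\mathcal{L}_{G_D} = \tfrac{1}{2}\,\Big\lVert\, \mathbb{E}_{x \sim p_{data}}\big[f_D(x)\big] - \mathbb{E}_{z \sim p_z}\big[f_D(G(z))\big] \Big\rVert_2^2
$$

In practice, both expectations are estimated with minibatch averages.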
The objective requires the center of the features of the synthesized samples to match the center of the features of the real samples.
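A minimal PyTorch-style sketch of this loss; the feature extractor f_D (the discriminator truncated at an intermediate layer) and the helper name are assumptions, not the authors' code:

```python
import torch
import torch.nn as nn

def mean_feature_matching_loss(f_D: nn.Module,
                               real: torch.Tensor,
                               fake: torch.Tensor) -> torch.Tensor:
    """L2 distance between the average discriminator features of real and
    synthesized minibatches (hypothetical helper used to train G)."""
    feat_real = f_D(real).mean(dim=0)   # center of real-sample features
    feat_fake = f_D(fake).mean(dim=0)   # center of synthesized-sample features
    # Only the generator should receive gradients from this loss,
    # so the real-feature center is detached.
    return 0.5 * (feat_real.detach() - feat_fake).pow(2).sum()
```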
Mean Feature Matching for Conditional Image Generation
For conditional image generation, this work proposes using the mean feature matching objective for the generative network G as well.
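A plausible form of this conditional variant matches per-class feature means on an intermediate layer f_C of the classification network (the choice of layer and the per-class averaging are assumptions based on the description above):

$$
\mathcal{L}_{G_C} = \tfrac{1}{2} \sum_{c} \Big\lVert\, \mathbb{E}_{x \sim p_{data}}\big[f_C(x) \mid c\big] - \mathbb{E}_{z \sim p_z}\big[f_C(G(z, c))\big] \Big\rVert_2^2
$$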
Pairwise Feature Matching
The VAE part of this work can force the GAN to generate diverse samples, since the encoder network E provides a mapping from a real image x to the latent space z. Therefore, the model explicitly sets up the relationship between the latent space and the real image space. The loss of G in this part is as follows.
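A sketch consistent with this description, matching each reconstruction x′ = G(E(x, c), c) to its input x in pixel space and in the feature spaces of D and C (the exact combination is an assumption):

$$
\mathcal{L}_{G} = \tfrac{1}{2}\Big( \lVert x - x' \rVert_2^2 + \lVert f_D(x) - f_D(x') \rVert_2^2 + \lVert f_C(x) - f_C(x') \rVert_2^2 \Big)
$$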
Objective of CVAE-GAN
The goal of this approach is to minimize an overall loss function that is a weighted combination of the terms defined above. In the experiments, λ1 = 3, λ2 = 1, λ3 = 1e-3, and λ4 = 1e-3.
Algorithm
The whole training procedure, which alternately updates the networks with their respective losses, is clear and easy to follow.
Experiment
Visualization comparison with other models
The results generated by CVAE-GAN are difficult to distinguish from the real samples.
Quantitative comparison
The higher the realism score, the better.
Attributes morphing
Given two latent vectors z1 and z2, new images can be generated from an interpolated vector z between them (e.g., z = α·z1 + (1 − α)·z2), so the attributes morph gradually from one image to the other.
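A tiny sketch of this interpolation; the linear scheme and the helper are assumptions consistent with the description above:

```python
import torch

def morph(G, z1: torch.Tensor, z2: torch.Tensor, c, steps: int = 8):
    """Generate images whose attributes morph from z1 to z2 by feeding the
    conditional generator G(z, c) linearly interpolated latent vectors."""
    alphas = torch.linspace(0.0, 1.0, steps)
    return [G((1 - a) * z1 + a * z2, c) for a in alphas]
```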
Image inpainting
CVAE-GAN for data augmentation
Two data augmentation strategies: generating more images for existing identities in the training datasets, and generating new identities by mixing different identities.
Conclusion and Discussion
- Present a general learning framework that combines a variational auto-encoder with a generative adversarial network.
- Propose a mean discrepancy objective for the generative network to make the training of the GAN more stable.
- Currently, the model can only generate images of known categories.