Introduction
- Super-resolution (SR) is the task of estimating a high-resolution (HR) image from its low-resolution (LR) counterpart, which is an ill-posed problem. Many recently proposed supervised SR algorithms minimize the mean squared error (MSE) between the recovered HR image and the ground truth. Although this maximizes the peak signal-to-noise ratio (PSNR), the results often lack high-frequency texture detail, so a high PSNR does not necessarily mean a perceptually better SR result.
- This paper sets a new state of the art for image SR with high upscaling factors (4×). It proposes SRGAN, which is optimized for a new perceptual loss.
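The link between MSE and PSNR mentioned above can be made concrete with a small sketch. It assumes 8-bit images stored as nested lists of pixel values and uses the standard definition PSNR = 10 · log10(MAX² / MSE); the function names `mse` and `psnr` are illustrative, not taken from the paper.

```python
import math


def mse(image_a, image_b):
    """Mean squared error between two equally sized images (nested lists)."""
    flat_a = [pixel for row in image_a for pixel in row]
    flat_b = [pixel for row in image_b for pixel in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(image_a, image_b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# Toy 2x2 "images" (made-up values, for illustration only).
hr = [[0.0, 255.0], [255.0, 0.0]]
sr_close = [[10.0, 245.0], [245.0, 10.0]]
sr_far = [[100.0, 155.0], [155.0, 100.0]]
```

Because PSNR is a decreasing function of MSE, `psnr(hr, sr_close)` is larger than `psnr(hr, sr_far)`. This is why minimizing MSE maximizes PSNR, even though neither number measures texture detail directly.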
The result of this work is as follows.
Method
Ultimate goal
The ultimate goal is to train a generating function $G$ that estimates, for a given LR input image, its corresponding HR counterpart. To achieve this, a generator network is trained as a feed-forward CNN $G_{\theta_G}$ parametrized by $\theta_G$.
Here $\theta_G = \{W_{1:L}; b_{1:L}\}$ denotes the weights and biases of an $L$-layer deep network and is obtained by optimizing an SR-specific loss function $l^{SR}$. $I^{LR}$ is the LR image and $I^{HR}$ is the HR image.
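As a reminder of what "optimizing an SR-specific loss" means here, the objective can be written as follows. This is the form given in the SRGAN paper as I recall it; $N$ (the number of training pairs) and the index $n$ are notation not used elsewhere in this summary.

$$
\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N} l^{SR}\left(G_{\theta_G}(I_n^{LR}),\ I_n^{HR}\right)
$$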
Adversarial network architecture
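The network architecture is straightforward: a generator network produces the super-resolved image, and a discriminator network tries to tell super-resolved images apart from real HR images.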
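The interplay between the two networks is usually written as a min-max problem. The formulation below follows the standard GAN objective as adapted in the SRGAN paper; $D_{\theta_D}$ denotes the discriminator with parameters $\theta_D$, and the expectations are over the training distributions of HR and LR images.

$$
\min_{\theta_G} \max_{\theta_D}\ \mathbb{E}_{I^{HR}}\left[\log D_{\theta_D}(I^{HR})\right] + \mathbb{E}_{I^{LR}}\left[\log\left(1 - D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)\right)\right]
$$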
Perceptual loss
The definition of the perceptual loss function $l^{SR}$ is critical for the performance of the generator network. It has two parts: a content loss and an adversarial loss.
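In the SRGAN paper the two parts are combined as a weighted sum, with the adversarial term scaled down by a factor of $10^{-3}$. The subscripts $X$ (content loss) and $Gen$ (adversarial loss) follow the paper's notation:

$$
l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen}
$$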
Content loss
MSE is the most widely used optimization target for image SR, and many state-of-the-art approaches rely on it.
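For reference, the pixel-wise MSE loss can be written as below. The symbols are assumptions carried over from the SRGAN paper rather than from this summary: the LR image has size $W \times H$ and $r$ is the upscaling factor, so the HR image has size $rW \times rH$.

$$
l^{SR}_{MSE} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^2
$$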
However, solutions of MSE optimization problems often lack high-frequency content. The VGG loss is instead defined as the Euclidean distance between the feature representations of a reconstructed image $G_{\theta_G}(I^{LR})$ and the reference image $I^{HR}$.
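Written out, the VGG loss from the SRGAN paper takes the following form, where the feature extractor and the feature-map dimensions are the quantities defined in the next paragraph:

$$
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left(\phi_{i,j}(I^{HR})_{x,y} - \phi_{i,j}\left(G_{\theta_G}(I^{LR})\right)_{x,y}\right)^2
$$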
$\phi_{i,j}$ indicates the feature map obtained by the $j$-th convolution (after activation) before the $i$-th max-pooling layer within the VGG19 network. $W_{i,j}$ and $H_{i,j}$ describe the dimensions of the respective feature maps within the VGG network.
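The idea of comparing images in feature space rather than pixel space can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `toy_phi` is a hypothetical stand-in for a VGG19 feature map (a horizontal difference filter followed by ReLU), and the feature map is single-channel for simplicity.

```python
def toy_phi(image):
    """Hypothetical stand-in for a VGG feature map: horizontal differences + ReLU."""
    return [
        [max(0.0, row[x + 1] - row[x]) for x in range(len(row) - 1)]
        for row in image
    ]


def feature_space_loss(phi, hr_image, sr_image):
    """Mean squared distance between phi(hr_image) and phi(sr_image)."""
    feat_hr = phi(hr_image)
    feat_sr = phi(sr_image)
    height = len(feat_hr)
    width = len(feat_hr[0])
    total = sum(
        (feat_hr[y][x] - feat_sr[y][x]) ** 2
        for y in range(height)
        for x in range(width)
    )
    return total / (width * height)


# A sharp toy image and an over-smoothed reconstruction (made-up values).
sharp = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
blurry = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
loss_blurry = feature_space_loss(toy_phi, sharp, blurry)
loss_same = feature_space_loss(toy_phi, sharp, sharp)
```

The blurry image has no edges, so its feature map is all zeros and the loss is large, while an identical reconstruction gives zero loss. In practice `phi` would be the activations of a pretrained VGG19 layer with many channels.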
Adversarial loss
The adversarial loss encourages the network to favor solutions that reside on the manifold of natural images by trying to fool the discriminator network. The generative loss is defined from the discriminator's probabilities $D_{\theta_D}(G_{\theta_G}(I^{LR}))$ that the reconstructed images are natural HR images, taken over all training samples.
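In the SRGAN paper this loss is the sum of $-\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$ over the training samples. A minimal sketch of that sum is shown below, assuming the discriminator outputs have already been computed as probabilities in (0, 1]; the function name and the `eps` clamp (added to avoid taking the log of zero) are illustrative choices, not part of the paper.

```python
import math


def generator_adversarial_loss(discriminator_probs, eps=1e-12):
    """Sum of -log D(G(I_LR)) over a batch of discriminator probabilities.

    discriminator_probs: the discriminator's estimated probability, for each
    super-resolved image, that it is a natural HR image.
    """
    return sum(-math.log(max(p, eps)) for p in discriminator_probs)


# Made-up discriminator outputs for two generators.
fooled = generator_adversarial_loss([0.9, 0.9])      # discriminator is fooled
not_fooled = generator_adversarial_loss([0.1, 0.1])  # discriminator spots fakes
```

The loss is small when the discriminator assigns a high probability to the generated images, so minimizing it pushes the generator toward outputs that the discriminator accepts as natural.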
Experiment
Mean opinion score (MOS) testing
Mean opinion score (MOS) testing asks human raters to judge how perceptually convincing the images are. The raters were asked to assign an integer score from 1 (bad quality) to 5 (excellent quality) to the super-resolved images. The results for the different algorithms are as follows.
SRGAN received the best scores of all the methods compared.
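Computing a MOS is just averaging the ratings for each method. The sketch below assumes the ratings are collected as lists of integers from 1 to 5; the method names and scores are made up for illustration and are not results from the paper.

```python
def mean_opinion_score(ratings):
    """Average of integer ratings, each between 1 (bad) and 5 (excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError(f"invalid rating: {rating!r}")
    return sum(ratings) / len(ratings)


# Hypothetical ratings, for illustration only.
ratings_by_method = {
    "method_a": [2, 3, 2, 3],
    "method_b": [4, 5, 3, 4],
}
mos = {name: mean_opinion_score(r) for name, r in ratings_by_method.items()}
best_method = max(mos, key=mos.get)
```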
Different loss functions
Different methods
Visual Results
Conclusion and Discussion
Contribution
- A new state of the art for image SR with high upscaling factors (4×).
- SRGAN, a network optimized for a new perceptual loss.
Discussion
- computational efficiency
- deeper networks (B > 16) increase the performance of SRResNet
- SRGAN variants of deeper networks are increasingly difficult to train due to the appearance of high-frequency artifacts.