Introduction

Super-resolution (SR) is the task of estimating a high-resolution (HR) image from its low-resolution (LR) counterpart, which is an ill-posed problem. Many recently proposed supervised SR algorithms minimize the mean squared error (MSE) between the recovered HR image and the ground truth. Although this maximizes the peak signal-to-noise ratio (PSNR), the results often lack high-frequency texture detail, so a high PSNR does not necessarily correspond to a perceptually better SR result.
This paper sets a new state of the art for image SR with high upscaling factors (4×). It proposes SRGAN, a generative adversarial network (GAN) for SR that is optimized for a new perceptual loss.

The result of this work is as follows.

Method

Ultimate goal

The ultimate goal is to train a generating function G that estimates, for a given LR input image, its corresponding HR counterpart. To achieve this, a generator network is trained as a feed-forward CNN G_{θ_G} parametrized by θ_G.

Here θ_G = {W_{1:L}; b_{1:L}} denotes the weights and biases of an L-layer deep network and is obtained by optimizing an SR-specific loss function l^{SR}. I^{LR} is the LR image and I^{HR} is the HR image.
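Written out, the training objective described above can be sketched as follows. The number of training pairs N and the sample index n are assumptions introduced here for illustration; they do not appear in the text above.

```latex
\[
\hat{\theta}_G = \arg\min_{\theta_G} \frac{1}{N} \sum_{n=1}^{N}
  l^{SR}\!\left( G_{\theta_G}\!\left(I_n^{LR}\right),\, I_n^{HR} \right)
\]
```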

Adversarial network architecture

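The network architecture is straightforward: a generator G produces the super-resolved image, and a discriminator D is trained to distinguish super-resolved images from real HR images.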

Perceptual loss

The definition of the perceptual loss function l^{SR} is critical for the performance of the generator network. It consists of two parts: a content loss and an adversarial loss.
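As a weighted sum, the two parts can be written as below. The symbols l_X^{SR} (content loss) and l_{Gen}^{SR} (adversarial loss) and the weight 10^{-3} follow the SRGAN paper by Ledig et al.; they are not stated in the text above.

```latex
\[
l^{SR} = \underbrace{l_X^{SR}}_{\text{content loss}}
       + \underbrace{10^{-3}\, l_{Gen}^{SR}}_{\text{adversarial loss}}
\]
```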

Content loss

The pixel-wise MSE is the most widely used optimization target for image SR, and many state-of-the-art approaches rely on it.
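To make the link between MSE and PSNR concrete, here is a minimal sketch in plain Python. It assumes single-channel 8-bit images stored as nested lists; the function names `mse` and `psnr` and the toy pixel values are illustrative and not taken from the paper.

```python
import math


def mse(hr, sr):
    """Pixel-wise mean squared error between two equally sized grayscale images."""
    if len(hr) != len(sr) or any(len(a) != len(b) for a, b in zip(hr, sr)):
        raise ValueError("images must have the same shape")
    total = 0.0
    count = 0
    for row_hr, row_sr in zip(hr, sr):
        for p_hr, p_sr in zip(row_hr, row_sr):
            total += (p_hr - p_sr) ** 2
            count += 1
    return total / count


def psnr(hr, sr, max_value=255.0):
    """Peak signal-to-noise ratio in dB; a lower MSE gives a higher PSNR."""
    err = mse(hr, sr)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 "images" (hypothetical values, for illustration only).
hr_image = [[100, 150], [200, 250]]
sr_image = [[102, 148], [198, 252]]

mse_value = mse(hr_image, sr_image)    # every pixel is off by 2, so MSE is 4.0
psnr_value = psnr(hr_image, sr_image)  # roughly 42.1 dB
print(mse_value, psnr_value)
```

Because PSNR is a monotonically decreasing function of MSE, minimizing MSE directly maximizes PSNR, which is why MSE-trained networks score well on this metric.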

However, solutions of MSE optimization problems often lack high-frequency content. The paper therefore defines the VGG loss as the Euclidean distance between the feature representations of a reconstructed image G_{θ_G}(I^{LR}) and the reference image I^{HR}.
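In formula form, the VGG loss can be sketched as follows, reconstructed from the description in this section. The pixel indices x and y are assumptions introduced here for illustration.

```latex
\[
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}}
  \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
  \left( \phi_{i,j}\!\left(I^{HR}\right)_{x,y}
       - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)_{x,y} \right)^{2}
\]
```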

φ_{i,j} indicates the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network. W_{i,j} and H_{i,j} describe the dimensions of the respective feature maps within the VGG network.
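The computation itself is simple once the feature maps are available. The sketch below assumes the VGG19 features have already been extracted elsewhere and are passed in as single-channel 2D nested lists. The function name `vgg_loss` and the toy values are hypothetical, and no pretrained network is involved.

```python
def vgg_loss(phi_hr, phi_sr):
    """Squared Euclidean distance between two feature maps, normalized by W * H.

    phi_hr: feature map of the reference HR image (list of rows).
    phi_sr: feature map of the reconstructed image (same shape).
    """
    height = len(phi_hr)
    width = len(phi_hr[0])
    same_shape = (
        len(phi_sr) == height
        and all(len(row) == width for row in phi_hr)
        and all(len(row) == width for row in phi_sr)
    )
    if not same_shape:
        raise ValueError("feature maps must have the same shape")
    squared_sum = sum(
        (a - b) ** 2
        for row_hr, row_sr in zip(phi_hr, phi_sr)
        for a, b in zip(row_hr, row_sr)
    )
    return squared_sum / (width * height)


# Toy 2x3 feature maps (hypothetical values, not real VGG19 activations).
phi_hr = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
phi_sr = [[0.0, 1.5, 2.0], [2.0, 4.0, 5.5]]

loss = vgg_loss(phi_hr, phi_sr)  # (0.25 + 1.0 + 0.25) / 6 = 0.25
print(loss)
```

The only difference from pixel-wise MSE is the space in which the distance is measured: feature space instead of pixel space.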

Adversarial loss

The adversarial loss encourages the network to favor solutions that reside on the manifold of natural images by trying to fool the discriminator network. The generative loss is defined over all training samples, based on the discriminator's probabilities D_{θ_D}(G_{θ_G}(I^{LR})) that a reconstructed image is a natural HR image.
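Following the SRGAN paper, the generative loss can be written as below. The sum runs over N training samples, where N is introduced here for illustration. The paper minimizes -log D rather than log(1 - D) for better gradient behavior.

```latex
\[
l^{SR}_{Gen} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left( G_{\theta_G}\!\left(I^{LR}\right) \right)
\]
```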

Experiment

Mean opinion score (MOS) testing

Mean opinion score (MOS) testing asks human raters to judge the perceptual quality of images. The raters were asked to assign an integer score from 1 (bad quality) to 5 (excellent quality) to the super-resolved images. The results for the different algorithms are as follows.

The result from SRGAN is the best of all compared methods.
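The MOS of a method is the arithmetic mean of the integer ratings it receives. The sketch below shows that computation. The method names and ratings are made up for illustration and are not data from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average a list of integer ratings, each from 1 (bad) to 5 (excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            raise ValueError(f"invalid rating: {rating!r}")
    return mean(ratings)


# Hypothetical ratings per method (NOT results from the paper).
hypothetical_ratings = {
    "method_a": [2, 3, 2, 3],
    "method_b": [4, 5, 4, 4],
}

scores = {name: mean_opinion_score(r) for name, r in hypothetical_ratings.items()}
best_method = max(scores, key=scores.get)
print(scores, best_method)
```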

Different loss functions

Different methods

Visual Results

Conclusion and Discussion

Contribution

A new state of the art for image SR with high upscaling factors (4×).
SRGAN, optimized for a new perceptual loss.

Discussion

Computational efficiency was not the focus of this work.
Deeper networks (B > 16 residual blocks) increase the performance of SRResNet.
SRGAN variants of deeper networks are increasingly difficult to train due to the appearance of high-frequency artifacts.

Reference

Ledig C, Theis L, Huszar F, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network[J]. 2016.
