DCSR: Dilated Convolutions for Single Image Super-Resolution

Abstract: Dilated convolutions support expanding the receptive field without parameter explosion or resolution loss, which makes them well suited to pixel-level prediction problems. In this paper, we propose a multiscale single image super-resolution method based on dilated convolutions (DCSR). We adopt dilated convolutions to expand the receptive field without incurring additional computational complexity. We mix standard convolutions and dilated convolutions in each layer, called mixed convolutions; i.e., in a mixed convolutional layer, the features extracted by dilated convolutions and by standard convolutions are concatenated. We theoretically analyze the receptive field and intensity of mixed convolutions to discover their role in SR. Mixed convolutions remove blind spots and successfully capture the correlation between low-resolution (LR) and high-resolution (HR) image pairs, thus achieving good generalization ability. We verify these properties of mixed convolutions by training 5-layer and 10-layer networks. We also train a 20-layer deep network to compare the performance of the proposed method with that of the state-of-the-art ones. Moreover, we jointly learn maps at different scales from a low-resolution image to its high-resolution counterpart in a single network. Experimental results demonstrate that the proposed method outperforms the state-of-the-art ones in terms of PSNR and SSIM, especially for large scale factors.


Existing system: We provide a network, named DCSR, whose convolutional layers are made up of standard and dilated convolutions to capture larger-scale contextual information in an image. We theoretically analyze the receptive field and the receptive intensity of mixed convolutional layers. We analyze the correlation between LR-HR pairs, and verify that mixed convolutional layers outperform standard convolutions or dilated convolutions alone. This is because mixed convolutions remove blind spots and capture the correlation of LR-HR image pairs more accurately. The rest of this paper is organized as follows. Section II briefly reviews related work on deep learning-based SR and introduces dilated convolutions. The proposed method is described in Section III, while the analysis of mixed convolutions for SR is given in Section IV. We present experimental results and their corresponding analysis in Section V. Finally, we draw conclusions in Section VI.

Proposed system: To accelerate SRCNN, they proposed fast SRCNN (FSRCNN) [6], which had more convolutional layers, smaller convolutional filters, and a deconvolutional layer at the top of the network to perform up-sampling. FSRCNN used an LR image without interpolation or up-sampling as the input of the SR network, thus remarkably reducing the number of network parameters. Shi et al. proposed a novel network with convolutional and sub-pixel convolutional layers that achieved real-time SR on 1080p videos.

Advantages: They used many more convolutional layers than VDSR, i.e., more than 32 layers in SRResNet and 64 layers in EDSR. EDSR utilized additional strategies for further improvement, such as removal of batch normalization layers, residual scaling, and geometric self-ensemble. In this work, we focus on the convolution itself and introduce dilated convolutions into the SR task. Dilated convolutions, proposed by Yu and Koltun, are a convolution operator that applies the same filter at different ranges using different dilation factors. Thus, dilated convolutions expand the receptive field more effectively. Yu and Koltun showed that dilated convolutions are helpful for pixel-level segmentation tasks.
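To make the receptive-field claim concrete: with stride 1, a single convolution with kernel size k and dilation d covers (k - 1)·d + 1 pixels, and stacked layers add their individual spans. The helper below is a small illustrative sketch of that calculation, not code from the paper.

```python
def receptive_field(layers):
    """Effective receptive field of a stack of (kernel_size, dilation) conv layers,
    assuming stride 1. Each layer adds (kernel_size - 1) * dilation pixels."""
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three 3x3 standard convolutions: 7x7 receptive field.
print(receptive_field([(3, 1)] * 3))   # 7
# Three 3x3 dilated convolutions with dilation 2: 13x13 receptive field,
# with the same number of parameters per layer.
print(receptive_field([(3, 2)] * 3))   # 13
```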


Although dilated convolutions support expanding the receptive field without parameter explosion or resolution reduction, blind spots would appear in the receptive field if we directly used dilated convolutions for SR. To address this problem, we propose a mixed convolutional layer which combines both dilated convolutions and standard convolutions in the same layer.

Disadvantages: In this section, we formulate the SR problem and introduce a mixed convolutional layer to solve it. We also describe the overall network architecture and learning strategy of DCSR in detail. To address this problem, we propose a mixed convolutional layer which combines both dilated convolutions and standard convolutions in the same layer. Mixed convolutions capture the correlation between LR-HR pairs more accurately, and we prove this in the next section. Regarding the mixed convolutional layer: deep neural networks have enough capacity to handle the multiscale SR problem, and compared with multiple single-scale models, the total number of parameters in multiscale training is small. All layers except the last one have the same number of convolutional filters; all layers except the last one have the same proportion of dilated convolutions (the last layer has one filter, which we simply set to a standard convolution in all experiments); and the dilation factor of each dilated convolution is set so that no blind spots appear within the receptive field.
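As an illustration of the mixed convolutional layer described above, the following is a hypothetical PyTorch sketch in which a standard 3x3 branch and a dilated 3x3 branch run in parallel and their feature maps are concatenated along the channel dimension; the split ratio, the dilation factor of 2, and the ReLU activation are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

class MixedConv(nn.Module):
    """Sketch of a mixed convolutional layer: a standard 3x3 branch and a
    dilated 3x3 branch run in parallel and their features are concatenated."""
    def __init__(self, in_channels, out_channels, dilated_ratio=0.5, dilation=2):
        super().__init__()
        dilated_ch = int(out_channels * dilated_ratio)   # proportion of dilated filters (assumed)
        standard_ch = out_channels - dilated_ch
        self.standard = nn.Conv2d(in_channels, standard_ch, kernel_size=3, padding=1)
        # padding = dilation keeps the spatial size unchanged for a 3x3 dilated kernel
        self.dilated = nn.Conv2d(in_channels, dilated_ch, kernel_size=3,
                                 padding=dilation, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Concatenate standard and dilated features along the channel axis
        return self.relu(torch.cat([self.standard(x), self.dilated(x)], dim=1))

# Example: 64-channel feature maps in, 64 mixed-channel feature maps out.
layer = MixedConv(64, 64)
print(layer(torch.randn(1, 64, 48, 48)).shape)   # torch.Size([1, 64, 48, 48])
```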


Modules: Correlation Analysis between LR and HR pairs: Locality is a strong property of natural images. One of the reasons why CNNs can successfully model natural images is that locality is coded into the intrinsic structure of a CNN. Locality means that a pixel has high correlation with its neighbors while having low correlation with, or being nearly independent of, pixels far from it. Li et al. analyzed the correlation between neighbors in natural images and drew the conclusion that if a CNN reconstructs this correlation more accurately, it has better generalization ability because of its Gaussian complexity. For image SR, it can be expected that a pixel in an HR image has high correlation with its corresponding neighbors in the LR image, but low correlation with LR pixels far from it. Instead of analyzing the correlation within natural images, we analyze the correlation between LR and HR pairs in a similar way to Li et al. Denote Li and Hi as the i-th LR patch and the i-th HR patch, respectively, and let k ∈ ℤ² be a two-dimensional shift vector; that is, Hi(k) means Hi shifted by k.
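A minimal sketch of this shift-based correlation check is given below: for each shift k, the correlation between the shifted HR patch Hi(k) and the corresponding upsampled LR patch Li is measured, and it is expected to peak at k = (0, 0) and decay as the shift grows. The Pearson correlation and the circular shift via np.roll are assumptions chosen for brevity, not the paper's exact formulation.

```python
import numpy as np

def shifted_correlation(hr_patch, lr_patch_up, k):
    """Pearson correlation between the HR patch shifted by k = (dy, dx)
    and the bicubic-upsampled LR patch (both assumed to have equal size).
    np.roll is used for the shift for simplicity (circular boundary)."""
    shifted = np.roll(hr_patch, shift=k, axis=(0, 1))
    a = shifted.ravel() - shifted.mean()
    b = lr_patch_up.ravel() - lr_patch_up.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Expectation: correlation is highest at k = (0, 0) and drops as the shift grows.
hr = np.random.rand(32, 32)
lr_up = hr + 0.05 * np.random.rand(32, 32)   # stand-in for an upsampled LR patch
for k in [(0, 0), (1, 0), (4, 0), (8, 0)]:
    print(k, round(shifted_correlation(hr, lr_up, k), 3))
```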

Multiscale Training: It is hard for many SR methods, such as prior information-based methods, to handle different scales in a single model. For CNN-based SR methods, however, this can be achieved in a straightforward way: training a single network using image patches with different downsampling scales simultaneously. Formally, we obtain LR samples x̂ from the HR images with scales s ∈ {2, 3, 4}. Our motivations for multiscale training are threefold: parameters learned at different scales can be shared to some degree, because parameters from one scale are helpful for reconstructing SR images at similar scales; deep neural networks have enough capacity to handle the multiscale SR problem; and compared with multiple single-scale models, the total number of parameters in multiscale training is small. Kim et al. trained their networks with all possible combinations of three different downsampling scales: 2, 3, and 4. They tested the PSNR performance on 'Set5' and found that training with multiple scales boosted the overall performance. However, their final models were trained with a single scale. In this work, we simultaneously train DCSR using image patches with scales 2, 3, and 4, and then compare its performance with the benchmark based on the learned single model.
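A minimal sketch of how such multiscale training pairs could be prepared is shown below, assuming VDSR-style preprocessing in which the LR patch is bicubically upsampled back to the HR patch size before being fed to the single network; the patch size, the bicubic resampling, and the make_lr_hr_pair helper are assumptions for illustration.

```python
import random
import torch
import torch.nn.functional as F

SCALES = (2, 3, 4)   # the three downsampling scales trained jointly
PATCH = 48           # HR patch size (assumed; divisible by all scales)

def make_lr_hr_pair(hr_image: torch.Tensor):
    """hr_image: (C, H, W) tensor in [0, 1] with H, W >= PATCH. Crop a random
    HR patch, downscale it by a random scale in {2, 3, 4}, and upscale it back
    with bicubic interpolation so the single network sees one input size for
    every scale (VDSR-style preprocessing, assumed)."""
    s = random.choice(SCALES)
    _, h, w = hr_image.shape
    top = random.randint(0, h - PATCH)
    left = random.randint(0, w - PATCH)
    hr = hr_image[:, top:top + PATCH, left:left + PATCH].unsqueeze(0)
    lr = F.interpolate(hr, size=(PATCH // s, PATCH // s),
                       mode='bicubic', align_corners=False)
    lr_up = F.interpolate(lr, size=(PATCH, PATCH),
                          mode='bicubic', align_corners=False).clamp(0, 1)
    return lr_up.squeeze(0), hr.squeeze(0)   # network input, target
```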


Network Architecture: The proposed network architecture is based on a fully convolutional network in which the standard convolutional layers are replaced by mixed convolutional layers. In the network, the number of channels of each layer except the first one is set to 64, while the kernel size is set to be . Moreover, we introduce residual blocks into the architecture to achieve fast convergence. The residual block in deep neural networks was proposed by He et al. for image recognition, and it has been reported that residual blocks are helpful for training very deep neural networks. Each residual block in this paper is constructed from 3 mixed convolutional layers and is called a mixed residual block (MR-block). Specifically, the input of the MR-block is passed through the 3 mixed convolutional layers; the MR-block then outputs the sum of its input and the features obtained by the last layer, as illustrated. Denoting Mj as the function of the j-th mixed layer in the block, the MR-block computes MR(x) = x + M3(M2(M1(x))).
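Under the description above, the MR-block can be sketched as follows in PyTorch; the identity skip connection over three mixed layers follows the text, while the fallback plain 3x3 convolutions and ReLU activations merely stand in for the mixed layers and are assumptions.

```python
import torch
import torch.nn as nn

class MRBlock(nn.Module):
    """Mixed residual block: three mixed convolutional layers plus an
    identity skip connection, i.e. MR(x) = x + M3(M2(M1(x)))."""
    def __init__(self, channels=64, mixed_layer_factory=None):
        super().__init__()
        # mixed_layer_factory should build one mixed convolutional layer
        # (e.g. the MixedConv sketch above); a plain 3x3 conv is the fallback here.
        make = mixed_layer_factory or (lambda: nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)))
        self.body = nn.Sequential(make(), make(), make())

    def forward(self, x):
        return x + self.body(x)   # sum of the block input and the last layer's features

# Example: a 48x48 feature map passes through the block with its shape unchanged.
block = MRBlock(64)
print(block(torch.randn(1, 64, 48, 48)).shape)   # torch.Size([1, 64, 48, 48])
```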

