Proc. of Int. Conf. on Recent Trends in Information, Telecommunication and Computing, ITC
Motion Estimation Algorithms in Video Super-Resolution

Anand Deshpande(1), Prashant P. Patavardhan(2), and D. H. Rao(3)

(1) Research Scholar, Dept. of E&C Engg., Gogte Institute of Technology, Belgaum / Associate Professor, Dept. of E&C, Angadi Institute of Technology and Management, Belgaum, India. E-mail: deshpande.anandb@gmail.com
(2) Professor, Dept. of E&C Engg., Gogte Institute of Technology, Belgaum, India. E-mail: prashantgemini73@gmail.com
(3) Dean, Faculty of Engineering / Professor, Dept. of PG Studies, Visvesvaraya Technological University, Belgaum, India. E-mail: dr.raodh@gmail.com
Abstract— Super-resolution (SR) is the process of obtaining a high resolution (HR) image or a sequence of HR images from a set of low resolution (LR) observations. Block matching algorithms are used for motion estimation in super-resolution to obtain motion vectors between frames. The implementation and comparison of two block matching algorithms, Exhaustive Search (ES) and Spiral Search (SS), are discussed. The advantages of each algorithm are given in terms of motion estimation computational complexity and Peak Signal to Noise Ratio (PSNR). The Spiral Search algorithm achieves a PSNR close to that of Exhaustive Search at lower computation time. The algorithms evaluated in this paper are widely used in video super-resolution and have also been used in implementing video standards such as H.263, MPEG-4, and H.264.

Index Terms— Super-resolution, motion estimation, block matching, exhaustive search, spiral search, PSNR.
I. INTRODUCTION

Optical sensors created a new era of imaging wherein optical images could be efficiently captured by sensors and stored as digital information. The resolution of the captured image depends on the size and number of these sensors, but improving the sensor itself is not always a feasible approach to increasing resolution. For example, spatial resolution can be increased by reducing the pixel size through sensor manufacturing techniques [1]. As the pixel size decreases, however, the amount of light available also decreases, generating shot noise that seriously degrades the image quality. Another approach for enhancing the spatial resolution is to increase the chip size, which leads to an increase in capacitance. This approach is not considered very effective because the large capacitance [2] makes it difficult to speed up the charge-transfer rate. To address this issue, the image processing community has developed a collection of algorithms known as super-resolution for generating high-resolution (HR) imagery from systems having lower-resolution (LR) imaging sensors. These algorithms combine a collection of low-resolution images containing aliasing artifacts and restore a high-resolution image. It is possible to reconstruct the original image by choosing a magnification factor, L, for the desired HR image, where L = HR image resolution / LR image resolution. The value of the magnification factor depends on the number of non-redundant LR images that are available. The observation model that relates the original HR image to the observed LR images is shown in Fig. 1.

DOI: 02.ITC.2014.5.557 © Association of Computer Electronics and Electrical Engineers, 2014
Figure 1. Super-resolution observation model
Here, X denotes the continuous scene, and Xs is the desired HR image, sampled above the Nyquist rate from the band-limited continuous scene. The output Yk is the k-th observed LR image from the image sensor. The observation model is:

Yk = D Bk Mk Xs + Nk,   k = 1, 2, 3, …   (1)

where D is a down-sampling operator, Bk contains the blur for the k-th LR image, Mk contains the motion information that transforms the k-th LR image onto the HR image grid, and Nk is the noise in the k-th LR image.

Motion estimation plays a major role in super-resolution. It estimates the relative shift of each LR image with respect to the reference LR image. Motion estimation algorithms commonly use the block matching method, as it provides a flexible trade-off between complexity and motion vector quality [3]. The subjective quality of the HR image suffers from artifacts generated during the fusion process by erroneous motion vectors (MVs). Accurate motion estimation is therefore central to the SR problem, and with erroneous MVs, SR may give worse results [4][5][6]. Consequently, although it is necessary to provide accurate motion vectors in order to increase the spatial resolution, it is even more critical to detect invalid motion vectors in order to prevent artifacts in the HR image. Most motion estimation algorithms [4] are too complex to be used in practical applications: many applications require a real-time approach, whereas most algorithms take from several minutes to several hours to estimate the motion between two images. A low-complexity approach addresses this problem, using a block-based motion estimation algorithm and low-complexity priors; such an algorithm must converge in a small number of iterations. While a block-based motion estimation algorithm reduces the computational complexity, it also presents its own set of challenges. This paper discusses the exhaustive search and spiral search motion estimation algorithms along with simulation results. Section II explains block matching in general. Section III explains and compares ES and SS and presents simulation results and discussion. Section IV gives concluding remarks, followed by references.

II.
BLOCK MATCHING MOTION ESTIMATION

Block-matching algorithms represent a very popular approach for estimating the motion between frames in an image sequence. Block matching relies on the translational-motion model and the brightness constancy assumption to estimate the motion of blocks between image pairs. The actual motion can only be approximated as a translation for small displacements, and the brightness constancy assumption does not hold under illumination changes due to non-uniform lighting, shadows, etc. Block matching is also sensitive to block size: large blocks are needed to avoid local minima, yet large blocks produce poorer matches than small blocks. Even with these limitations, block-matching algorithms perform well in terms of matching blocks. The brightness constancy assumption states that image pixels retain their luminance values over a spatio-temporal displacement path, i.e.

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)   (2)

where I(x, y, t) is a continuous representation of the pixel luminance, Δx and Δy represent the spatial shift, and Δt represents the temporal shift. The brightness constancy assumption is violated when the illumination of the scene changes between successive images; however, it is generally valid for small spatio-temporal displacements [7]. To make use of the brightness constancy assumption, block-matching algorithms divide the image into square regions generally referred to as blocks. To reduce complexity, the image is usually divided into blocks of fixed size or variable size [8][9][10].
With the image divided into macro-blocks (MBs) of predetermined size, the task of the block matching algorithm is to locate the block in the adjacent image that best matches the block in the reference image, producing a vector that represents the movement of the block from one location to another. The adjacent image may fall before (backward block matching) or after (forward block matching) the reference image. The search area for a good macro-block match is constrained to p pixels on all four sides of the corresponding macro-block in the previous frame; p is called the search parameter. Larger motions require a larger p, and the larger the search parameter, the more computationally expensive motion estimation becomes. Usually the macro-block is taken as a square of side 16 pixels, and the search parameter p is 7 pixels. Correlation-based matching criteria, called "cost functions", are used to find the best match [11]. Of the various cost functions, the most popular and least computationally expensive is the Sum of Absolute Differences (SAD), also called Sum of Absolute Errors (SAE), given by equation (3). Another cost function is the Mean Squared Error (MSE), given by equation (4). Sum of Absolute Error:
SAE = ∑(i=1 to M) ∑(j=1 to N) |Cij − Rij|   (3)

Mean Squared Error:

MSE = (1 / (M·N)) ∑(i=1 to M) ∑(j=1 to N) (Cij − Rij)²   (4)
where M × N is the size of the macro-block, and Cij and Rij are the pixels being compared in the current and reference macro-blocks, respectively. The candidate block that minimizes the SAE determines the motion vector for the block at its position. To maximize the probability of choosing the correct MV with the SAE metric, the following must be considered: 1. the choice of the search parameter range p; 2. the block size; 3. the initialization of the search. To evaluate and compare systems, the quality of the video images displayed to the viewer must be measured. Subjective visual quality measurement is difficult and imprecise, since many factors influence the result, and its complexity and cost make it attractive to measure quality automatically using an algorithm. Objective (algorithmic) quality measures give quantitative values; the most widely used is the Peak Signal to Noise Ratio (PSNR):

PSNR = 10 log10( (2ⁿ − 1)² / MSE )   (5)
where (2ⁿ − 1)² is the square of the highest possible signal value in the image, and n is the number of bits per image sample. The block matching framework discussed above motivates the development of motion estimation algorithms that provide good PSNR. Two such algorithms, Exhaustive Search and Spiral Search, have been implemented for super-resolution and are discussed in the next section.

III. PROPOSED SYSTEM

The proposed system contains the following blocks. A) Sampling: the continuous input frame is sampled above the Nyquist rate. B) Motion estimation: this block contains two search algorithms, Exhaustive Search and Spiral Search.

A. Exhaustive Search

This algorithm, also known as Full Search [12], is the most computationally expensive block-matching algorithm of all. It calculates the cost function at each possible location in the search window. As a result, it finds the best possible match and gives the highest PSNR among block-matching algorithms; fast block matching algorithms try to achieve the same PSNR with as little computation as possible. Full search [12] motion estimation involves evaluating equation (3) (SAD) at each point in the search window. The first search location is at the top-left of the window and the search proceeds
in raster order until all positions have been evaluated. Full search is guaranteed to find the minimum SAD in the search window, but it is computationally intensive, since the error measure must be calculated at every one of the (2S + 1)² locations, where S is the search range.

B. Spiral Search

Most image sequences have smooth motion and high spatial correlation, i.e. pixels that are near each other tend to have similar values. It is therefore quite likely that the motion vector of a block is close to the motion vectors of its neighbors, and the search window center can be predicted using the motion vectors of the predictor blocks. The motion vectors of three neighboring macro-blocks (one to the left, one above, and one above right) are used as predictors for the motion vector of the current macro-block. The prediction [11] is formed by taking the median of the three motion vectors. The prediction error between the actual motion vector and the predicted value, in the horizontal and the vertical direction, is coded as shown in Fig. 2.
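As an illustration, a full search over the (2S + 1)² candidate positions with the SAD cost of equation (3) can be sketched in C. This is a minimal, hypothetical implementation (not the authors' code): block size 16, search parameter S = 7, and frames represented as 8-bit luminance arrays in row-major order.

```c
#include <stdlib.h>
#include <limits.h>

#define B 16   /* macro-block side, in pixels */
#define S 7    /* search parameter p */

/* Sum of Absolute Differences between the B x B block of `cur` at (cx, cy)
   and the B x B block of `ref` at (rx, ry); `w` is the frame width. */
static long sad(const unsigned char *cur, const unsigned char *ref,
                int w, int cx, int cy, int rx, int ry)
{
    long s = 0;
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++)
            s += abs(cur[(cy + i) * w + cx + j] - ref[(ry + i) * w + rx + j]);
    return s;
}

/* Exhaustive (full) search: evaluate the SAD at all (2S+1)^2 candidate
   positions in the search window and return the motion vector (dx, dy)
   of the best match. */
static void full_search(const unsigned char *cur, const unsigned char *ref,
                        int w, int h, int bx, int by, int *mvx, int *mvy)
{
    long best = LONG_MAX;
    *mvx = *mvy = 0;
    for (int dy = -S; dy <= S; dy++) {
        for (int dx = -S; dx <= S; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + B > w || ry + B > h)
                continue;   /* candidate block falls outside the frame */
            long cost = sad(cur, ref, w, bx, by, rx, ry);
            if (cost < best) { best = cost; *mvx = dx; *mvy = dy; }
        }
    }
}
```

The raster-order scan of the window is visible in the nested dy/dx loops; a fast algorithm would prune or reorder these candidates rather than visit all of them.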
Figure 2. Prediction of motion vectors (MV: current motion vector; MV1, MV2, MV3: predictor motion vectors; prediction = median(MV1, MV2, MV3))
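The median prediction of Fig. 2 is applied component-wise to the horizontal and vertical MV components. A minimal C sketch (the `MV` type and function names are illustrative, not from the paper):

```c
/* Median of three integers, used component-wise on the predictors. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* ensure a <= b */
    if (b > c) { b = c; }                     /* b = min(b, c) */
    return (a > b) ? a : b;                   /* median = max(a, b) */
}

typedef struct { int x, y; } MV;

/* Predicted motion vector from the left, above, and above-right
   neighbors: component-wise median of the three predictors. */
static MV predict_mv(MV mv1, MV mv2, MV mv3)
{
    MV p;
    p.x = median3(mv1.x, mv2.x, mv3.x);
    p.y = median3(mv1.y, mv2.y, mv3.y);
    return p;
}
```

Only the prediction error between the actual MV and this predicted value then needs to be coded.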
Special cases are needed for MBs whose predictors lie outside the picture boundary or group-of-blocks (GOB) boundary. These special cases are shown in Fig. 3.
Figure 3. Special cases of motion vector prediction at the picture or GOB boundary
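These special cases can be sketched in C as follows. This is a hypothetical helper (not from the paper): a `valid` flag marks predictors that lie inside the picture/GOB boundary, and out-of-bounds predictors are substituted before the median is taken.

```c
#include <stdbool.h>

typedef struct { int x, y; bool valid; } MV;

static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (b > c) { b = c; }
    return (a > b) ? a : b;
}

/* Boundary special cases, applied before the median:
   - one predictor outside the picture/GOB boundary -> replace it with (0, 0);
   - two predictors outside -> replace both with the remaining predictor,
     so the median does not collapse to (0, 0). */
static void resolve_predictors(MV p[3])
{
    int missing = 0;
    for (int i = 0; i < 3; i++)
        if (!p[i].valid) missing++;

    if (missing == 1) {
        for (int i = 0; i < 3; i++)
            if (!p[i].valid) { p[i].x = 0; p[i].y = 0; }
    } else if (missing == 2) {
        MV present = {0, 0, true};
        for (int i = 0; i < 3; i++)
            if (p[i].valid) present = p[i];
        for (int i = 0; i < 3; i++)
            if (!p[i].valid) { p[i].x = present.x; p[i].y = present.y; }
    }
}
```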
Whenever one of the predictor MBs lies outside the picture boundary, it is replaced by (0, 0); when two of them lie outside, they are replaced by the motion vector of the third MB. This avoids having two of the motion vectors replaced by zeros, in which case the result of the median operation would always be (0, 0). The spiral search algorithm [13] uses the motion vectors of the predictor blocks to obtain a predicted search window center, and uses the SAD values of these predictor blocks to achieve a variable window size. The SAD is computed starting at the center of the search window and moving outward spirally, and the process stops once the SAD falls under a threshold value. This threshold is the parameter that controls the size of the window; it is set to the median of the SAD values of the predictor blocks. The median operation helps to suppress the effect of the SAD value of any uncorrelated block in the neighborhood.

C. Blur

Blur is a natural property of all image acquisition devices, caused by imperfections of their optical systems; it can also be caused by factors such as motion or atmospheric turbulence. Lens blur can be modeled by
convolving the image with a mask corresponding to the optical system's point spread function. A Gaussian blur model is used in this work: the image is convolved with a two-dimensional Gaussian of size G × G and a given standard deviation. Since blurring operates on the image vector, the convolution is expressed as a matrix multiplication.

IV. RESULT

The Exhaustive Search and Spiral Search algorithms have been implemented and evaluated on different test videos. The performance is evaluated on two counts: PSNR and computational time. The ES and SS algorithms are implemented on an Intel Core i3 machine, in the C language using Visual Studio. The motion estimation algorithms have been tested on three Quarter Common Intermediate Format (QCIF) resolution test video sequences, as shown in Table I.

TABLE I. VIDEO SEQUENCES USED FOR PERFORMANCE ANALYSIS

S.No.  Filename   No. of Frames  Details
(a)    Carphone   380            Moderate motion in background; no camera motion
(b)    Foreman    400            Motion in both background and camera
(c)    Claire     490            No motion in background or camera
The motion vectors and the reference frame are sent to the motion compensation block, where the new frame is generated from them. This frame is compared with the current frame to compute the PSNR. The objective performance of the ES and SS algorithms is shown in Table II.

TABLE II. PERFORMANCE ANALYSIS OF EXHAUSTIVE SEARCH AND SPIRAL SEARCH ALGORITHMS

                         Exhaustive Search           Spiral Search
S.No.  Filename    PSNR (dB)  CPU Time (s)    PSNR (dB)  CPU Time (s)
(a)    Carphone    43.9599    55              43.9485    48
(b)    Foreman     43.1578    67              43.1523    60
(c)    Claire      45.8718    62              45.8656    43
From the table it can be seen that the full search algorithm takes more execution time than spiral search while achieving a slightly better PSNR (in dB). The PSNR of the Claire sequence is higher than that of the other sequences because it contains less motion. The PSNR of ES and SS versus the frame number is plotted in Fig. 4. Claire shows better PSNR than Carphone and Foreman for both search algorithms. The spiral search algorithm achieves nearly the same PSNR at a lower execution time, so the super-resolution model using spiral search for motion estimation gives better overall performance than one using exhaustive search.
Figure 4. PSNR comparison of the search algorithms for input videos (a) Carphone (b) Foreman (c) Claire
V. CONCLUSION

The motion estimation algorithms Exhaustive Search and Spiral Search for video super-resolution are implemented and tested on different video test sequences. The performance of the algorithms is evaluated in terms of PSNR and execution time. From the test results it can be seen that spiral search takes less execution time than full search while achieving an average PSNR very close to that of full search. Both algorithms give higher PSNR for the sequence with neither background nor camera motion (Claire) than for the sequences that contain motion. It can be concluded that the spiral search algorithm provides the preferable design for motion estimation in super-resolution, offering a favorable set of cost/performance points.
REFERENCES
[1] S. Chaudhuri, Super-Resolution Imaging. Kluwer Academic Publishers, pp. 1–44, 2002.
[2] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: a technical overview," IEEE Signal Processing Magazine, May 2003.
[3] M. Santoro, "Valid motion estimation for super-resolution image reconstruction," Ph.D. dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, USA, 2012.
[4] G. Callico, S. Lopez, O. Sosa, J. Lopez, and R. Sarmiento, "Analysis of fast block matching motion estimation algorithms for video super-resolution systems," IEEE Transactions on Consumer Electronics, vol. 54, pp. 1430–1438, Aug. 2008.
[5] S. C. Park, M. K. Park, and M. G. Kang, "Super-resolution image reconstruction: a technical overview," IEEE Signal Processing Magazine, vol. 20, pp. 21–36, May 2003.
[6] P. Hill, T. Chiew, D. Bull, and C. Canagarajah, "Interpolation free subpixel accuracy motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, pp. 1519–1526, Dec. 2006.
[7] M. Chan, Y. Yu, and A. Constantinides, "Variable size block matching motion compensation with applications to video coding," IEE Proceedings I: Communications, Speech and Vision, vol. 137, pp. 205–212, Aug. 1990.
[8] CCITT, "Codec for audiovisual services at n × 384 kbit/s," Fascicle III.5, Rec. H.261, 1988.
[9] Z. Ahmed, A. Hussain, and D. Al-Jumeily, "Fast computations of full search block matching motion estimation (FCFS)," Proceedings of the PGNet Conference, 2011.
[10] M. Ahmadi and M. Azadfar, "Implementation of fast motion estimation algorithms and comparison with full search method in H.264," IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 3, pp. 139–143, 2008.
[11] T. Chen, D. Turaga, and M. Alkanhal, "Correlation based search algorithms for motion estimation," Picture Coding Symposium, Portland, April 1999.
[12] A. Barjatya, "Block matching algorithms for motion estimation," Technical report, Dept. of ECE, Utah State University, April 2004.