DIGITAL IMAGE PROCESSING PROJECT
COMPLEXITY-RATE-DISTORTION TRADE-OFFS IN VIDEO COMMUNICATION Alessandro Baffa (682075), Paolo Oranges (705875) ({alessandro.baffa, paolo.oranges}@mail.polimi.it)
I. INTRODUCTION
The H.264/AVC video coding standard outperforms previous standards, reducing the bit budget by about 50% for the same target quality. This comes at a price: the computational complexity of the H.264/AVC encoder is typically much higher than that of encoders for previous standards (the complexity of the decoder, though, is comparable). With the proliferation of hand-held portable devices, the conventional broadcast model (encode once, decode many times) is being challenged, since these devices also send video on the uplink. Such devices need to limit the computational complexity of the encoder in order to save battery life, which can be achieved at the cost of reduced rate-distortion performance. The goal of this paper is to study the complexity-rate-distortion trade-offs in the H.264/AVC codec.
In Section II we consider the search algorithms used in H.264/AVC to estimate the motion vectors; after a general overview, we explain the three we used in our tests: Full Search, Fast Full Search and UMHexagonS. In Section III we give an overview of the rate-distortion models in the literature, explaining the general model and analyzing two of them in detail: a linear model and a quadratic one. In Section IV we describe our test environment and test operations and present the results obtained. Section V draws conclusions from these results, and Section VI outlines possible future work.

II. SEARCH CRITERIA IN H.264
To evaluate the differences in performance among block-matching search criteria, we used three of them: Full Search, Fast Full Search and UMHexagonS.

A. FULL SEARCH
Full Search is an exhaustive search within the search range ±W, i.e. over the set of all candidate predictors whose motion vectors $(d_x, d_y)$ satisfy:

$$-W \le d_x \le W, \qquad -W \le d_y \le W$$
A total of $(2W+1)^2$ candidate motion vectors must therefore be tested. For each block, Full Search explores an error surface where each point corresponds to a candidate motion vector and the height of the surface to the matching error. By testing all the candidate motion vectors, Full Search finds the global minimum of the error surface; it therefore achieves the best rate-distortion performance, but it is computationally expensive.

B. FAST FULL SEARCH
Fast Full Search reduces the computational complexity with respect to Full Search while still finding the global minimum of the error surface: candidate blocks are visited in a smart way so that local minima are avoided. The Fast Full Search uses the Successive Elimination Algorithm (SEA), which computes the SAD of a candidate only when the following inequality is satisfied:

$$R - SAD(m,n) \le M(x,y) \le R + SAD(m,n)$$
where

$$R = \sum_{i=1}^{N}\sum_{j=1}^{N} \lvert f(i,j) \rvert, \qquad M(x,y) = \sum_{i=1}^{N}\sum_{j=1}^{N} \lvert g(i-x,\, j-y) \rvert$$

R is the sum norm of the current N×N macroblock f, and M(x, y) is the sum norm of the candidate macroblock in the reference frame g with motion vector (x, y). Since |R − M(x, y)| ≤ SAD(x, y), a candidate that violates the inequality cannot improve on the current best match, so its SAD does not need to be computed. With R and all M(x, y) pre-calculated, SEA therefore reduces the number of required SAD computations and realizes a Fast Full Search. The method proceeds as follows:
1. Calculate R and all M(x, y) in the search window.
2. Select an initial motion vector (m, n) and calculate SAD(m, n).
3. Using the results of steps 1 and 2, find a motion vector (x, y) that satisfies the inequality.
4. Calculate SAD(x, y) and compare it with SAD(m, n).
5. When SAD(x, y) is equal to or less than SAD(m, n), replace SAD(m, n) with SAD(x, y) (and (m, n) with (x, y)).
6. Continue the above steps until all M(x, y) in the search window have been examined.

Finally, a small diamond search is used to make the search more precise. Note that for small prediction modes, which depend on the block size used, steps 2 and 3 are skipped and the algorithm goes directly to step 4.
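The difference between the two searches can be illustrated with a small sketch. The Python below is our own illustration, not the JM code: it assumes integer-pel motion, plain SAD as the matching error (no other cost terms), a reference frame large enough that every displaced block stays inside it, and (0, 0) as the initial vector of SEA step 2. All function names are hypothetical. It shows that SEA returns the same minimum SAD as Full Search while evaluating fewer SADs.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in cur and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def block_sum(img, x, y, n):
    """Sum norm of the n x n block whose top-left corner is (x, y); pixels are non-negative."""
    return sum(img[y + i][x + j] for i in range(n) for j in range(n))


def candidates(w):
    """All motion vectors with |dx| <= w and |dy| <= w, (2w + 1)^2 of them, in raster order."""
    return [(dx, dy) for dy in range(-w, w + 1) for dx in range(-w, w + 1)]


def full_search(cur, ref, bx, by, n, w):
    """Exhaustive search: returns (best_mv, best_sad, number_of_sad_evaluations)."""
    best_mv, best, evals = None, None, 0
    for dx, dy in candidates(w):
        s = sad(cur, ref, bx, by, dx, dy, n)
        evals += 1
        if best is None or s < best:
            best_mv, best = (dx, dy), s
    return best_mv, best, evals


def sea_search(cur, ref, bx, by, n, w):
    """Successive Elimination: same minimum SAD as full_search, fewer SAD evaluations."""
    r = block_sum(cur, bx, by, n)                        # step 1: R
    m = {mv: block_sum(ref, bx + mv[0], by + mv[1], n)    # step 1: all M(x, y)
         for mv in candidates(w)}
    best_mv = (0, 0)                                      # step 2: initial vector (assumed (0, 0))
    best = sad(cur, ref, bx, by, 0, 0, n)
    evals = 1
    for mv in candidates(w):
        if mv == (0, 0):
            continue
        if abs(r - m[mv]) > best:                         # step 3: inequality violated, skip SAD
            continue
        s = sad(cur, ref, bx, by, mv[0], mv[1], n)        # step 4
        evals += 1
        if s <= best:                                     # step 5: keep the equal or better match
            best_mv, best = mv, s
    return best_mv, best, evals                           # step 6: whole window examined


if __name__ == "__main__":
    rng = random.Random(0)
    size, n, w = 32, 4, 4
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    # Current frame built from the reference so that the true motion vector is (2, -1).
    cur = [[ref[(y - 1) % size][(x + 2) % size] for x in range(size)] for y in range(size)]
    print("full search:", full_search(cur, ref, 12, 12, n, w))
    print("SEA:        ", sea_search(cur, ref, 12, 12, n, w))
```

Full Search always evaluates (2W+1)² = 81 SADs for W = 4; SEA skips every candidate whose sum norm is too far from R once a good match has been found, which is where its saving comes from.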
C. UMHEXAGONS
UMHexagonS stands for Unsymmetrical-cross Multi-Hexagon-grid Search. It aims to reduce the computational load by reducing the number of candidate blocks tested within the search window. The search is carried out in five steps:
1. initial search point prediction;
2. unsymmetrical cross search;
3. uneven multi-hexagon grid search;
4. extended hexagon based search;
5. small diamond search.
Figure 1: Reference block location for initial motion vector
Figure 2: UMHexagonS steps
UMHexagonS begins with the initial search point prediction. As depicted in Figure 1, if E is the current block, the predicted motion vector is the median of the motion vectors of the left, top and top-right (or top-left) blocks. If one of these blocks lies outside the GOB (Group of Blocks), it is replaced by a specific block or value according to specific rules. The second step is the unsymmetrical cross search: with a search range W, the motion vector is searched on a cross with a horizontal search range of W and a vertical search range of W/2. The third step consists of an uneven multi-hexagon grid search, divided into two sub-steps. First, a Full Search with a fixed search range of 2 is performed (depicted in Figure 2 by the small red circles); then a multi-hexagon grid search strategy is applied, motivated by the fact that the unsymmetrical cross search cannot find irregular motion vectors. A Sixteen-Point Hexagon Pattern (16-HP) is used as the base and is scaled by factors from 1 to W/4 (see Figure 2). These steps take into account that horizontal movements are more probable than vertical ones. The fourth step is the Extended Hexagon Based Search (EHS), in which the motion vector is searched with a fixed small hexagon pattern. Finally, the small diamond search refines the result.
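The candidate sets visited by UMHexagonS can be sketched as lists of offsets. This is a hypothetical Python illustration, not the JM implementation: offsets are relative to the current search centre, the GOB replacement rules of step 1 are not modeled, the cross spacing `step=2` is an assumption (the text above does not give it), and the 16-HP, small-hexagon and small-diamond coordinates are our own reading of the patterns and should be checked against [1].

```python
def median_mv(left, top, top_right):
    """Step 1: component-wise median of the left, top and top-right motion vectors."""
    def med(a, b, c):
        return sorted((a, b, c))[1]
    return (med(left[0], top[0], top_right[0]), med(left[1], top[1], top_right[1]))


def cross_points(w, step=2):
    """Step 2: unsymmetrical cross, horizontal range w and vertical range w // 2.
    The spacing `step` is an assumption, not taken from the text."""
    pts = [(dx, 0) for dx in range(-w, w + 1, step) if dx != 0]
    pts += [(0, dy) for dy in range(-(w // 2), w // 2 + 1, step) if dy != 0]
    return pts


def small_full_search_points(r=2):
    """Step 3, first sub-step: full search with a fixed range of 2."""
    return [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


# Assumed 16-point hexagon pattern (16-HP); verify the coordinates against [1].
HEX16 = [(-4, 0), (-4, 1), (-4, 2), (-2, 3), (0, 4), (2, 3), (4, 2), (4, 1),
         (4, 0), (4, -1), (4, -2), (2, -3), (0, -4), (-2, -3), (-4, -2), (-4, -1)]


def multi_hexagon_points(w):
    """Step 3, second sub-step: 16-HP scaled by k = 1 .. w // 4."""
    return [(k * dx, k * dy) for k in range(1, w // 4 + 1) for dx, dy in HEX16]


# Steps 4 and 5: fixed small patterns around the best point found so far (assumed shapes).
SMALL_HEXAGON = [(-2, 0), (-1, 2), (1, 2), (2, 0), (1, -2), (-1, -2)]
SMALL_DIAMOND = [(-1, 0), (1, 0), (0, -1), (0, 1)]


if __name__ == "__main__":
    w = 16
    print("predicted MV:", median_mv((1, 2), (3, -1), (2, 5)))
    for name, pts in [("cross", cross_points(w)),
                      ("full search +-2", small_full_search_points()),
                      ("multi-hexagon grid", multi_hexagon_points(w)),
                      ("small hexagon", SMALL_HEXAGON),
                      ("small diamond", SMALL_DIAMOND)]:
        print(f"{name:20s} {len(pts)} points")
```

With W = 16, the cross has 24 points and the multi-hexagon grid 64, far fewer than the 1089 candidates of a Full Search over the same window, which is the source of the complexity reduction.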
III. RATE-DISTORTION OVERVIEW
Rate-distortion theory covers a wide range of applications, such as optimal bit allocation and quality control. There are two main categories of R-D theory: Shannon's model and the one derived from high-rate quantization theory. These two theories are complementary, and both converge to the lower bound when the input block length goes to infinity and the bitrate is high. These assumptions hold only in theory: in real encoding the input block size cannot be infinite and, in addition, transform-based encoders achieve high compression ratios and often operate at low bitrates. For these reasons, a set of parameters is incorporated into the R-D model to cope with the complexity of coding systems and the diversity of video sources. R-D theory does not provide a specific model, but only upper and lower bounds for general sources:

$$Q\,2^{-2R} \le D(R) \le \sigma_G^2\,2^{-2R} \qquad (1)$$

where Q is the entropy power and $\sigma_G^2$ is the variance of a Gaussian-distributed source. We can express the distortion function as:

$$D(R) = K\,2^{-\alpha R} \qquad (2)$$

where K and α > 0 are unspecified constants. In video applications, given the finite input length and the low bitrates, the following form is preferred:

$$D(R) = \gamma\,\varepsilon^2\,\sigma_x^2\,2^{-2R} \qquad (3)$$
where γ is the correlation coefficient of the source, $\sigma_x^2$ is the source variance and $\varepsilon^2$ is a source-dependent scaling parameter (1.4 for Gaussian, 1.2 for Laplacian and 1 for uniform sources). This is the starting point for building an R-D model.

A. LINEAR MODEL
Starting from (2), Mallat and Falzon [9] present a model for low-bitrate transform-based image coding:

$$D(R) = C\,R^{1-2\gamma} \qquad (4)$$

where C > 0 and the exponent γ (a model parameter, not the correlation coefficient of (3)), like the other parameters, are adjusted to practical coding settings. To measure the quality of a video sequence, we use the PSNR (Peak Signal-to-Noise Ratio). Since $PSNR = 10\log_{10}(255^2/D)$, substituting (2) turns the PSNR into a linear function of the coding rate R:

$$PSNR(R) = c\,R + d \qquad (5)$$

where $c = 10\,\alpha\log_{10}2$ and $d = 10\log_{10}(255^2/K)$ are constants. In [7] it is shown that this linear relation holds only when the bitrate is sufficiently high, so a different model is needed to describe the low-bitrate range.

B. QUADRATIC MODEL
In traditional rate-distortion theory, the rate-distortion function is defined through the mutual information between the source and its reproduction at the user. The classical model is (2); for an i.i.d. memoryless source it can be simplified to (3). In reality few sources are memoryless, so (3) has to be extended with further parameters that capture the dependence on the source content. Another approach starts from the assumption that the source has a Laplacian distribution:

$$p(x) = \frac{\alpha}{2}\,e^{-\alpha\lvert x\rvert}, \qquad -\infty < x < +\infty \qquad (6)$$

If we consider the distortion measure $d(x,\hat{x}) = \lvert x - \hat{x}\rvert$, we can derive the rate-distortion function:

$$R(D) = \begin{cases} \ln\dfrac{1}{\alpha D}, & 0 < D < \dfrac{1}{\alpha} \\[4pt] 0, & D \ge \dfrac{1}{\alpha} \end{cases} \qquad (7)$$

This rate-distortion function can be expanded into a Taylor series:
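$$R(D) = \left(\frac{1}{\alpha D} - 1\right) - \frac{1}{2}\left(\frac{1}{\alpha D} - 1\right)^{2} + R_3(D) \qquad (8)$$

where $R_3(D)$ collects the higher-order terms. From the equation above, apart from a constant term and the remainder, the rate-distortion function is a linear combination of $D^{-1}$ and $D^{-2}$. We can therefore express R with arbitrary constants $a_1$ and $a_2$:

$$R(D) = a_1 D^{-1} + a_2 D^{-2} \qquad (9)$$

This is a simple quadratic model for the rate-distortion function.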
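How good the second-order truncation is can be checked numerically. The Python sketch below (our own, with hypothetical function names) compares the exact Laplacian rate-distortion function R(D) = ln(1/(αD)) with its second-order Taylor expansion, and also writes that expansion as a constant plus $D^{-1}$ and $D^{-2}$ terms, which is what justifies the quadratic model. Rates are in nats because of the natural logarithm.

```python
import math


def rd_laplacian(d, alpha):
    """Exact R(D) for a Laplacian source with absolute-error distortion (nats)."""
    if d >= 1.0 / alpha:
        return 0.0
    return math.log(1.0 / (alpha * d))


def rd_taylor2(d, alpha):
    """Second-order Taylor approximation around D = 1/alpha, remainder R_3(D) dropped."""
    z = 1.0 / (alpha * d) - 1.0
    return z - 0.5 * z * z


def rd_taylor2_expanded(d, alpha):
    """The same approximation written as a constant plus D^-1 and D^-2 terms."""
    return -1.5 + 2.0 / (alpha * d) - 1.0 / (2.0 * alpha ** 2 * d ** 2)


if __name__ == "__main__":
    alpha = 1.0
    for d in (0.95, 0.9, 0.75, 0.6, 0.5):
        exact, approx = rd_laplacian(d, alpha), rd_taylor2(d, alpha)
        print(f"D={d:.2f}  exact={exact:.4f}  taylor={approx:.4f}  error={exact - approx:+.4f}")
```

The series of ln(1 + z) converges only for |z| < 1, i.e. for 1/(2α) < D ≤ 1/α, and the truncation error grows as D moves away from 1/α (about 4·10⁻⁴ nats at D = 0.9/α, about 0.19 nats at D = 0.5/α). This is one reason the coefficients $a_1$, $a_2$ are fitted to data rather than taken from α.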
IV. TEST RESULTS
A. TEST ENVIRONMENT
To study the relationship between rate and distortion in H.264/AVC, we ran experimental simulations on real video sequences: Foreman, Soccer and Coastguard. Our test environment consisted of:
1. the H.264/MPEG-4 AVC JM 12.2 encoder, compiled with MS Visual Studio under Windows;
2. Matlab scripts that start the encoder with the correct configuration parameters, parse the output logs and generate the charts.
To obtain meaningful results, we ran the tests under different conditions. Using the H.264/AVC Baseline profile, we focused on inter prediction with a 4x4 search block size. The sequences were then coded varying the following encoder parameters:
1. number of frames of the sequence to code (depending on the size of the sequence);
2. Search Range (1 to 32);
3. Search Mode (Full, Fast, UMHexagonS);
4. Quantization Parameter (1 to 51).
To measure complexity, we inserted a counter into the encoder source code that reports the total number of steps required to code the sequences as the encoder parameters change.

B. NUMERICAL RESULTS
The results are grouped by the search algorithm used for motion estimation. The following charts show the relationship between PSNRY, complexity (the number of steps required to code the sequence) and bitrate for the three sequences. Our purpose is to observe how the rate-distortion behaviour changes as the complexity varies. Varying the search range from 1 to 32 and the quantization parameter from 1 to 51, we obtained a 3D surface made of 32 curves of 51 points each, every point having its own PSNRY and bitrate. In the charts we also plot the quadratic model fitted to the test data (blue curves), so that the difference between our data and the theoretical trend can be seen.
The values obtained in our tests are plotted as green points around the theoretical curves.
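The text does not state how the blue curves were fitted. The Python sketch below shows one plausible way, under assumptions we state explicitly: the distortion D is taken to be the luma MSE recovered from PSNRY (8-bit video, peak 255), and the quadratic model R = a1/D + a2/D² of Section III.B is fitted by ordinary least squares without an intercept. The function names are hypothetical, and the data in the example are synthetic, not our measurements.

```python
import math


def psnr_to_mse(psnr_db, peak=255.0):
    """Invert PSNR = 10 log10(peak^2 / MSE) to obtain the distortion D as an MSE."""
    return peak * peak / (10.0 ** (psnr_db / 10.0))


def fit_quadratic_rd(dist, rate):
    """Least-squares fit of R = a1 / D + a2 / D^2 (no intercept).

    Builds the 2x2 normal equations and solves them with Cramer's rule."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for d, r in zip(dist, rate):
        x1, x2 = 1.0 / d, 1.0 / (d * d)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * r
        t2 += x2 * r
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:  # all D (nearly) equal: the system is singular
        raise ValueError("need at least two distinct distortion values")
    a1 = (t1 * s22 - s12 * t2) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2


def quadratic_rate(d, a1, a2):
    """Rate predicted by the quadratic model for distortion d."""
    return a1 / d + a2 / (d * d)


if __name__ == "__main__":
    # Synthetic check: generate points from known coefficients and recover them.
    true_a1, true_a2 = 3000.0, -20000.0
    dists = [10.0, 15.0, 20.0, 30.0, 40.0]
    rates = [quadratic_rate(d, true_a1, true_a2) for d in dists]
    print("fitted (a1, a2):", fit_quadratic_rd(dists, rates))
    # Measured PSNRY values would first be converted to MSE, e.g.:
    mse = psnr_to_mse(35.21)
    print(round(mse, 2), "MSE ->", round(10 * math.log10(255.0 ** 2 / mse), 2), "dB")
```

Because the model is linear in a1 and a2, a closed-form least-squares solution is enough; no iterative optimizer is required.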
The results differ for each sequence and search mode. We first used the Full Search mode. With the quantization parameter fixed to 1, Table 1 shows a generally decreasing trend of PSNRY as the search range increases, but so slight that PSNRY can be considered globally constant: for Coastguard it varies from 35.58 dB to 35.38 dB. The bitrate also decreases as the search range increases.

Sequence            Coastguard  Coastguard  Coastguard  Coastguard
Search Range        1           10          20          32
Bitrate (kbps)      192.01      162.02      162.23      162.94
PSNRY (dB)          35.58       35.38       35.38       35.38
Complexity (steps)  10          442         1682        4226
Table 1: Coastguard PSNRY-Search Range

Table 1 also shows that, with the quantization parameter fixed, the bitrate decreases considerably while PSNRY remains globally the same: the bitrate drops by about 30 kbps. If we consider the variation of the quantization parameter from 1 to 51, PSNRY varies very little compared with the bitrate (Table 3).

Sequence            Coastguard
Max bitrate (kbps)  192.01
Min bitrate (kbps)  144.92
Max PSNRY (dB)      35.58
Min PSNRY (dB)      34.59
Table 3: high variation of bitrate with lower changes of PSNRY

The bitrate spans about 47 kbps, while PSNRY decreases by only about 1 dB. Table 4 shows the results for the other sequences coded in this test.

Sequence            Foreman  Soccer
Max bitrate (kbps)  130.29   178.05
Min bitrate (kbps)  99.07    130.27
Max PSNRY (dB)      37       37.26
Min PSNRY (dB)      35.98    36.15
Table 4: high variation of bitrate with lower changes of PSNRY

These results (and Figure 4) show that the same PSNRY can be reached at a lower bitrate, i.e. with a smaller coded sequence, but at the cost of a strong increase in encoder complexity.

The results also show that the more motion a sequence contains, the higher the PSNRY reached. In ascending order, the maximum quality is 35.58 dB for Coastguard, 37 dB for Foreman and 37.26 dB for Soccer.

Sequence            Coastguard  Foreman  Soccer
Search Range        1           1        1
Bitrate (kbps)      192.01      130.29   178.05
PSNRY (dB)          35.58       37       37.26
Table 2: PSNRY-video complexity

Figure 3: 3D plot of Coastguard using Full Search
Figure 4: PSNRY-Bitrate-Complexity
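The complexity column of Table 1 can be cross-checked against the Full Search candidate count (2W+1)² from Section II.A. The short Python check below uses only the Table 1 values; the names are our own. In these numbers each measured count exceeds (2W+1)² by exactly one. We do not know what the extra step corresponds to in the instrumented encoder, so this is only an observation on the data, but it confirms that Full Search complexity grows quadratically with the search range.

```python
def full_search_candidates(w):
    """Number of integer motion vectors in a +-w window: (2w + 1)^2."""
    return (2 * w + 1) ** 2


# Search range -> measured complexity (steps), Coastguard, Full Search, QP = 1 (Table 1).
TABLE1_COMPLEXITY = {1: 10, 10: 442, 20: 1682, 32: 4226}


def compare_with_candidates(table):
    """Return (search range, measured steps, (2W+1)^2, difference) for each column."""
    rows = []
    for w, steps in sorted(table.items()):
        expected = full_search_candidates(w)
        rows.append((w, steps, expected, steps - expected))
    return rows


if __name__ == "__main__":
    for row in compare_with_candidates(TABLE1_COMPLEXITY):
        print("W=%2d  measured=%5d  (2W+1)^2=%5d  diff=%d" % row)
```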
Figure 4 is the 2D projection of Figure 3, and its curves correspond to different search range values; moving from right to left, the search range increases. The horizontal red line in the figure shows that a PSNRY of about 35.2 dB can be reached with different, smaller bitrates by increasing the search range. Table 5 reports these results for Coastguard.

Bitrate (kbps)      182.66  146.99  147.24  148.72
PSNRY (dB)          35.21   35.21   35.21   35.208
Complexity (steps)  10      170     842     4226
Table 5: equal values of PSNRY increasing complexity in Full Search mode

PSNRY is globally the same, while the bitrate decreases strongly.

Fast Full Search (Fast mode) is the second search mode used in the tests. The resulting trends are globally the same as with Full Search, but the complexity is strongly reduced: the total number of steps required by the encoder to code the sequence drops from a maximum of 4226 with Full Search to 434 in Fast mode.

Sequence            Coastguard  Coastguard  Coastguard  Coastguard
Search Range        1           10          20          32
Bitrate (kbps)      233.71      166.08      165.13      165.71
PSNRY (dB)          35.71       35.38       35.36       35.37
Complexity (steps)  6           79          221         434
Table 6: Coastguard PSNRY-Search Range in Fast Search mode

Figure 5: 3D plot of Coastguard using Fast Search

The results show that PSNRY values very similar to those of Table 1 can be reached, here with lower complexity (Table 6). The drawback of the Fast Search mode is a higher bitrate.

As in Full Search mode, in Fast mode the horizontal red line in Figure 6 shows that the same PSNRY level (Table 7) can be reached with different levels of complexity, which here are lower than in Full mode. As in Figure 4, the complexity increases moving from right to left in the chart.

Bitrate (kbps)      219.76  190.25  169.26  162.51  160.74
PSNRY (dB)          35.27   35.27   35.26   35.27   35.27
Complexity (steps)  6       11      16      104     397
Table 7: equal values of PSNRY increasing complexity in Fast Search mode

Figure 6: Coastguard, PSNRY-Bitrate-Complexity
The third search mode used in the tests is UMHexagonS. The trend is again globally the same as in the Full and Fast Search modes, but here the complexity is even lower.

Figure 7: 3D plot of Coastguard using UMHexagonS Search
Figure 8: Coastguard, PSNRY-Bitrate-Complexity
The results confirm the low complexity values: the encoder needs only a few steps (at most 23) to encode the whole sequence, while reaching the same PSNRY levels as the Full and Fast Search modes.

Sequence            Coastguard  Coastguard  Coastguard  Coastguard
Search Range        1           10          20          32
Bitrate (kbps)      203.30      170.45      170.89      170.58
PSNRY (dB)          35.59       35.43       35.43       35.42
Complexity (steps)  22          23          23          23
Table 8: Coastguard PSNRY-Search Range in UMHexagonS Search mode
As in the previous search modes, UMHexagonS can reach the same PSNRY level with different values of complexity and bitrate. As Figure 8 shows, a large bitrate gain can be achieved while keeping PSNRY globally unchanged.
Bitrate (kbps)      190.63  170.32  166.19  164.77  163.26
PSNRY (dB)          35.205  35.21   35.21   35.21   35.2
Complexity (steps)  22      22      22      22      23
Table 9: equal values of PSNRY increasing complexity in Hex Search mode
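The equal-PSNRY tables for Coastguard (Tables 5, 7 and 9) can be summarized by two numbers per search mode: the largest bitrate saving relative to the first operating point, and the increase in complexity needed to get it. The Python sketch below copies the table values in their column order; the names are our own, and the summary ignores the small PSNRY differences between columns.

```python
# Equal-PSNRY operating points for Coastguard, copied from Tables 5, 7 and 9
# as (bitrate in kbps, complexity in steps), in the column order of each table.
EQUAL_PSNR_POINTS = {
    "Full": [(182.66, 10), (146.99, 170), (147.24, 842), (148.72, 4226)],
    "Fast": [(219.76, 6), (190.25, 11), (169.26, 16), (162.51, 104), (160.74, 397)],
    "Hex": [(190.63, 22), (170.32, 22), (166.19, 22), (164.77, 22), (163.26, 23)],
}


def best_saving(points):
    """Largest relative bitrate saving w.r.t. the first point, and its complexity ratio."""
    base_rate, base_cost = points[0]
    rate, cost = min(points, key=lambda p: p[0])  # operating point with the lowest bitrate
    return (base_rate - rate) / base_rate, cost / base_cost


if __name__ == "__main__":
    for mode, pts in EQUAL_PSNR_POINTS.items():
        saving, cost_ratio = best_saving(pts)
        print(f"{mode:4s}  bitrate saving {100 * saving:5.1f}%  complexity x{cost_ratio:.1f}")
```

On these values, Full Search saves about 19.5% of the bitrate at 17 times the steps (its lowest bitrate, 146.99 kbps, is reached at 170 steps, not at 4226), Fast mode about 26.9% at roughly 66 times the steps, and UMHexagonS about 14.4% with practically no extra steps.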
V. CONCLUSIONS
H.264 allows sequences to be encoded with different motion search algorithms. By increasing the computational complexity of the encoder, the same quality level (low distortion) can be reached at a much lower bitrate. Coding the sequences in Full, Fast and UMHexagonS search modes, we reach the same PSNRY level with different computational complexity and bitrate (Table 10).

Search Mode         Full    Fast    Hex
QP                  1       1       1
Search Range        32      32      32
Bitrate (kbps)      148.72  160.74  163.26
PSNRY (dB)          35.208  35.27   35.2
Complexity (steps)  4226    397     23
Table 10: PSNRY with different search algorithms
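The PSNRY/Bitrate figure of merit is simply PSNRY in dB divided by bitrate in kbps. As a small worked example, the Python below (names are our own) applies it to the Table 10 values and, since the ratio ignores complexity, also reports each mode's complexity as a fraction of Full Search.

```python
# Table 10: (PSNRY in dB, bitrate in kbps, complexity in steps) at QP 1, search range 32.
TABLE10 = {
    "Full": (35.208, 148.72, 4226),
    "Fast": (35.27, 160.74, 397),
    "Hex": (35.2, 163.26, 23),
}


def psnr_per_kbps(table):
    """PSNRY/Bitrate figure of merit for each search mode."""
    return {mode: psnr / rate for mode, (psnr, rate, _) in table.items()}


def relative_complexity(table, reference="Full"):
    """Complexity of each mode as a fraction of the reference mode."""
    ref_steps = table[reference][2]
    return {mode: steps / ref_steps for mode, (_, _, steps) in table.items()}


if __name__ == "__main__":
    ratio = psnr_per_kbps(TABLE10)
    share = relative_complexity(TABLE10)
    for mode in sorted(ratio, key=ratio.get, reverse=True):
        print(f"{mode:4s}  PSNRY/Bitrate={ratio[mode]:.4f}  "
              f"complexity={100 * share[mode]:.1f}% of Full")
```

For Table 10 the ratio ranks Full (about 0.2367) above Fast (about 0.2194) and Hex (about 0.2156), while Fast uses about 9.4% and Hex about 0.5% of the Full Search steps.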
Considering all the data obtained in our tests, we computed the PSNRY/Bitrate ratio for all sequences and search algorithms. The results are shown in the tables below.

Search Mode  PSNRY/Bitrate  Complexity (steps)  PSNRY (dB)  Bitrate (kbps)
Full         0.2388         226                 34.61       144.91
Fast         0.2377         144                 34.6        145.52
Hex          0.2294         23                  34.65       150.99
Table 11: Coastguard

Search Mode  PSNRY/Bitrate  Complexity (steps)  PSNRY (dB)  Bitrate (kbps)
Full         0.3608         2402                36.073      100.06
Fast         0.3607         301                 36.12       100.14
Hex          0.3398         23                  36.13       106.32
Table 12: Foreman

Search Mode  PSNRY/Bitrate  Complexity (steps)  PSNRY (dB)  Bitrate (kbps)
Full         0.2789         3970                36.15       129.59
Fast         0.2757         956                 36.19       131.24
Hex          0.2456         23                  36.37       148.04
Table 13: Soccer

Each sequence reaches a different PSNRY/Bitrate ratio. The highest ratio is always obtained with the Full Search algorithm, because of the lower bitrates it can reach; the drawback of this algorithm is its higher computational complexity. The best overall trade-off, however, seems to be offered by the Fast Search mode, which obtains globally the same PSNRY as Full Search with a similar bitrate and a lower complexity. The Hex search mode gives the lowest computational complexity, but the highest bitrate.

VI. POSSIBLE FUTURE WORKS
This work did not consider some features of H.264. Other search algorithms could be used, such as Simplified Hexagon Search and Enhanced Predictive Zonal Search (EPZS). In addition, the encoder's bitrate control mode could be used to constrain the coding, so that the results can be compared more fairly. We did not consider InterSearch block sizes other than 4x4, even though H.264 allows different block sizes for coding the sequences (4x4, 4x8, 8x4, 8x8, 8x16, 16x8, 16x16). Only inter prediction (P frames) was used in this work; in the future the tests could be extended with intra prediction (I frames) and with a different profile that uses B frames.

VII. REFERENCES
[1] Z. Chen, P. Zhou, Y. He, "Fast Integer Pel and Fractional Pel Motion Estimation for JVT", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 6th Meeting, Awaji Island, JP, 5-13 December 2002.
[2] A. M. Tourapis, A. Leontaris, K. Sühring, G. Sullivan, "Revision of the H.264/AVC Reference Software Manual", Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 23rd Meeting, San Jose, California, USA, 21-27 April 2007.
[3] C. A. Rahman, W. Badawy, "UMHexagonS algorithm based motion estimation architecture for H.264/AVC", Proceedings of the Fifth International Workshop on System-on-Chip for Real-Time Applications, 20-24 July 2005, pp. 207-210.
[4] J. Furukawa, H. Kiya, Y. Noguchi, "A fast full search block matching algorithm for MPEG-4 video", Proceedings of the 1999 International Conference on Image Processing (ICIP 99), vol. 1, 1999, pp. 61-65.
[5] T. Toivonen, J. Heikkila, "Fast full search block motion estimation for H.264/AVC with multilevel successive elimination algorithm", Proceedings of the 2004 International Conference on Image Processing (ICIP '04), vol. 3, 24-27 Oct. 2004, pp. 1485-1488.
[6] YUV Sequences Source, http://trace.eas.asu.edu/uyv/index.html
[7] M. Dai, D. Loguinov, H. Radha, "Rate-distortion modeling of scalable video coders", Proceedings of the 2004 International Conference on Image Processing (ICIP '04), IEEE, vol. 2, 24-27 Oct. 2004, pp. 1093-1096.
[8] M. Rezaei, M. Gabbouj, S. Wenger, "Analyzed rate distortion model in standard video codecs for rate control", IEEE Workshop on Signal Processing Systems Design and Implementation, 2-4 Nov. 2005, pp. 550-555.
[9] S. Mallat, F. Falzon, "Analysis of Low Bit Rate Image Transform Coding", IEEE Trans. on Signal Processing, vol. 46, April 1998.
[10] T. Chiang, Y.-Q. Zhang, "A New Rate Control Scheme Using Quadratic Rate Distortion Model", IEEE Trans. CSVT, vol. 7, Feb. 1997.
[11] M. Dai, D. Loguinov, H. Radha, "Statistical analysis and distortion modeling of MPEG-4 FGS", Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), vol. 3, 14-17 Sept. 2003, pp. III-301-304.
[12] M. Tagliasacchi, "Lecture Notes on Digital Audio-Video Signal Processing", 19 December 2006, pp. 161-175.
[13] T. Song, K. Ogata, K. Saito, T. Shimamoto, "Adaptive Search Range Motion Estimation Algorithm for H.264/AVC", IEEE International Symposium on Circuits and Systems (ISCAS 2007), 27-30 May 2007, pp. 3956-3959.