Spring’09
EE669-HW3 Multimedia data compression Neha Rathore-5994499980
[JPEG AND MPEG STANDARDS] Study of JPEG and MPEG compression schemes, their effects and performance.
EE669 Multimedia Data Compression
HW3: JPEG and MPEG Standards

PROBLEM 1: The JPEG Compression Standard
PART A: Written Question
Problem 1.1: Baseline encoding and decoding process (Module 1)
Baseline JPEG

The JPEG specification defines a minimal subset of the standard called baseline JPEG, which all JPEG-aware applications are required to support. This baseline uses an encoding scheme based on the Discrete Cosine Transform (DCT) to achieve compression. The DCT is one of a family of related transforms that has made its way into various compression methods. DCT-based encoding algorithms are always lossy by nature, but they are capable of achieving a high degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-tone images in which the differences between adjacent pixels are usually small. In practice, JPEG works well only on images with depths of at least four or five bits per color channel. The baseline standard actually specifies eight bits per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but the results will be poor for low-bit-depth source data because of the large jumps between adjacent pixel values. For similar reasons, color-mapped source data does not work very well, especially if the image has been dithered.

The JPEG compression scheme is divided into the following stages:
1. Transform the image into an optimal color space.
2. Downsample the chrominance components by averaging groups of pixels together.
3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove redundancies in the coefficients.

The figure below summarizes these steps, and the following subsections look at each of them in turn. Note that JPEG decoding performs the reverse of these steps.

Figure 9-11: JPEG compression and decompression
[Block diagram: the encoder takes raw image data through a color transform, downsampling, level shift (offset = 128), forward 2-D DCT, quantization (using a quantization table), zig-zag scan of the AC coefficients and DPCM of the DC coefficients, and entropy coding, producing JPEG compressed image data. The decoder applies the inverse steps: decode, dequantization, inverse 2-D DCT, upsampling, and inverse color transform. Source: http://www.fileformat.info/mirror/egff/ch09_06.htm]
Transform the image

The JPEG algorithm is capable of encoding images that use any type of color space. JPEG itself encodes each component in a color model separately, and it is completely independent of any color-space model, such as RGB, HSI, or CMY. The best compression ratios result if a luminance/chrominance color space, such as YUV or YCbCr, is used. Most of the visual information to which human eyes are most sensitive is found in the high-frequency, gray-scale, luminance component (Y) of the YCbCr color space. The other two chrominance components (Cb and Cr) contain high-frequency color information to which the human eye is less sensitive. Most of this information can therefore be discarded. In comparison, the RGB, HSI, and CMY color models spread their useful visual image information evenly across each of their three color components, making the selective discarding of information very difficult. All three color components would need to be encoded at the highest quality, resulting in a poorer compression ratio. Gray-scale images do not have a color space as such and therefore do not require transforming.

Downsample chrominance components

The simplest way of exploiting the eye's lesser sensitivity to chrominance information is simply to use fewer pixels for the chrominance channels. For example, in an image nominally 1000x1000 pixels, we might use a full 1000x1000 luminance pixels but only 500x500 pixels for each chrominance component. In this representation, each chrominance pixel covers the same area as a 2x2 block of luminance pixels. We store a total of six pixel values for each 2x2 block (four luminance values, one each for the two chrominance channels), rather than the twelve values needed if each component is represented at full resolution. Remarkably, this 50 percent reduction in data volume has almost no effect on the perceived quality of most images. Equivalent savings are not possible with conventional color models such as RGB, because in RGB each color channel carries some luminance information and so any loss of resolution is quite visible.

When the uncompressed data is supplied in a conventional format (equal resolution for all channels), a JPEG compressor must reduce the resolution of the chrominance channels by downsampling, or averaging together groups of pixels. The JPEG standard allows several different choices for the sampling ratios, or relative sizes, of the downsampled channels. The luminance channel is always left at full resolution (1:1 sampling). Typically both chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a chrominance pixel covers the same area as either a 2x1 or a 2x2 block of luminance pixels. JPEG refers to these downsampling processes as 2h1v and 2h2v sampling, respectively. Another notation commonly used is 4:2:2 sampling for 2h1v and 4:2:0 sampling for 2h2v; this notation derives from television customs (color transformation and downsampling have been in use since the beginning of color TV transmission). JPEG typically uses 4:2:0.

Offset by 128

Before computing the DCT of each sub-image, its grey values are shifted from a positive range to one centered around zero. For an 8-bit image each pixel has 256 possible values: [0, 255]. To center on zero it is necessary to subtract half the number of possible values, or 128. Subtracting 128 from each pixel value yields pixel values in [-128, 127].
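To make these first steps concrete, here is a minimal Python/NumPy sketch of the color transform, 4:2:0 downsampling and level shift described above. The function names are my own, and the BT.601 coefficients are the ones commonly used by JPEG/JFIF implementations; treat them as an illustration rather than the exact code used for this assignment.

import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr, the luminance/chrominance space used by JPEG/JFIF."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, cb, cr

def downsample_420(chroma):
    """2h2v (4:2:0) downsampling: average each 2x2 block of chroma samples into one."""
    h = chroma.shape[0] // 2 * 2
    w = chroma.shape[1] // 2 * 2
    return chroma[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def level_shift(channel):
    """Shift 8-bit samples from [0, 255] to [-128, 127] before the DCT (offset = 128)."""
    return channel.astype(np.float64) - 128.0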
Apply a Discrete Cosine Transform

The image data is divided up into 8x8 blocks of pixels. (From this point on, each color component is processed independently, so a "pixel" means a single value, even in a color image.) A DCT is applied to each 8x8 block. The DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the average value in the block, while successive higher-order ("AC") terms represent the strength of more and more rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine wave alternating from maximum to minimum at adjacent pixels. The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG compression. The point of doing it is that we have now separated out the high- and low-frequency information present in the image. We can discard high-frequency data easily without losing low-frequency information. The DCT step itself is lossless except for roundoff errors. The 2-D forward DCT is:
S(u, v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} s(x, y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16],   u, v = 0, ..., 7

where C(u) = 1/√2 for u = 0 and C(u) = 1 otherwise (and likewise for C(v)); s(x, y) is the level-shifted 8x8 block and S(u, v) its DCT coefficients.
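A direct (unoptimized) evaluation of this transform for one 8x8 block, written as a sketch rather than the fast DCT a real encoder would use:

import numpy as np

def c(u):
    """Normalization factor C(u) from the DCT definition above."""
    return 1.0 / np.sqrt(2.0) if u == 0 else 1.0

def fdct_8x8(block):
    """Forward 2-D DCT S(u, v) of one level-shifted 8x8 block s(x, y)."""
    s = np.asarray(block, dtype=np.float64)
    S = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            acc = 0.0
            for x in range(8):
                for y in range(8):
                    acc += (s[x, y]
                            * np.cos((2 * x + 1) * u * np.pi / 16.0)
                            * np.cos((2 * y + 1) * v * np.pi / 16.0))
            S[u, v] = 0.25 * c(u) * c(v) * acc
    return S   # S[0, 0] is the DC coefficient; the other 63 entries are the AC coefficients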
After applying the 2-D DCT transformation on the block, we get an 8x8 array S(u, v). There is a rather large value at the top-left corner of S(u, v); this is the DC coefficient. The remaining 63 coefficients are called the AC coefficients. The advantage of the DCT is its tendency to aggregate most of the signal in one corner of the result. The quantization step reinforces this effect while simultaneously reducing the magnitude of the DCT coefficients to 8 bits or fewer, resulting in a signal with a large trailing region of zeros that the entropy-coding stage can simply get rid of.

Quantization: Quantize each block

To discard an appropriate amount of information, the compressor divides each DCT output value by a "quantization coefficient" and rounds the result to an integer. The larger the quantization coefficient, the more data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily than the low-order terms (that is, the higher-order terms have larger quantization coefficients). Furthermore, separate quantization tables are employed for luminance and chrominance data, with the chrominance data being quantized more heavily than the luminance data. This allows JPEG to exploit further the eye's differing sensitivity to luminance and chrominance. It is this step that is controlled by the "quality" setting of most JPEG compressors. The compressor starts from a built-in table that is appropriate for a medium-quality setting and increases or decreases the value of each table entry in inverse proportion to the requested quality. The complete quantization tables actually used are recorded in the compressed file so that the decompressor will know how to (approximately) reconstruct the DCT coefficients. Selection of an appropriate quantization table is something of a black art. Most existing compressors start from a sample table developed by the ISO JPEG committee. It is likely that future research will yield better tables that provide more compression for the same perceived image quality. Implementation of improved tables should not cause any compatibility problems, because decompressors merely read the tables from the compressed file; they don't care how the table was picked. We used here: [quantization table figure not reproduced].
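A sketch of the quantization and dequantization steps, given whatever 8x8 quantization table qtable is in use (the table itself travels in the compressed file, as noted above); the function names are mine:

import numpy as np

def quantize(S, qtable):
    """Divide each DCT coefficient by its quantization step and round to the nearest integer."""
    return np.rint(S / qtable).astype(np.int32)

def dequantize(Sq, qtable):
    """Decoder side: multiply back by the step size; the rounding error is the information lost."""
    return Sq.astype(np.float64) * qtable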
Coding of DC coefficients (used in JPEG): first-order prediction, DPCM = DCi - DCi-1, where DCi and DCi-1 are the DC coefficients of the current 8x8 block and the previous 8x8 block.
• The dynamic range of DCT coefficients is 11 bits when the source image precision is 8 bits.
• Prediction based on the previous DC coefficient increases the dynamic range of the prediction error by 1 bit (DPCM).
• Huffman code tables for the luminance and chrominance DC differences are used; that is, VWL (variable word-length) coding is done.
This operation is applied to the (0,0) element of every 8x8 block, which is the DC component.

Coding of AC coefficients: a zig-zag scan of the DCT coefficients is done.
• Many of the AC coefficients that correspond to high frequencies become zero after quantization.
• Thus, the runs of zeros along the zig-zag scan are identified and compacted.
• Entropy coding is then done.
During JPEG encoding the frequency components of the data block are ordered so that the low-frequency components come first and the higher-frequency components follow. To do this the frequency matrix is traversed in a zig-zag fashion [zig-zag scan diagram not reproduced]; a small sketch of this ordering is given below.
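The following sketch illustrates the zig-zag ordering, the DC DPCM and the run-length grouping of AC coefficients described above. The real JPEG entropy coder then Huffman-codes the (run, size) composites, and runs longer than 15 zeros use the ZRL symbol; both refinements are omitted here for brevity.

# Zig-zag order: walk the anti-diagonals of the 8x8 block, alternating direction.
ZIGZAG = sorted(((i, j) for i in range(8) for j in range(8)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]))

def dc_dpcm(dc_values):
    """DPCM of the DC coefficients: DIFF_i = DC_i - DC_(i-1), with the first DC predicted from 0."""
    prev, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def ac_runlength(block):
    """Zig-zag scan the 63 AC coefficients and emit (zero-run, value) pairs plus an EOB marker."""
    ac = [block[i][j] for (i, j) in ZIGZAG[1:]]   # skip the DC term at (0, 0)
    pairs, run = [], 0
    for coeff in ac:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))            # number of zeros preceding this value
            run = 0
    pairs.append("EOB")                           # all remaining coefficients are zero
    return pairs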
Encode the resulting coefficients
The resulting coefficients contain a significant amount of redundant data. Huffman compression will losslessly remove the redundancies, resulting in smaller JPEG data. An optional extension to the JPEG specification allows arithmetic encoding to be used instead of Huffman for an even greater compression ratio. At this point, the JPEG data stream is ready to be transmitted across a communications channel or encapsulated inside an image file format.

VWL encoder:
• In order to encode the coefficient pattern generated above, JPEG uses Huffman encoding. JPEG has a special Huffman code word for ending the sequence prematurely when the remaining coefficients are zero.
• Each non-zero AC coefficient is described by a composite symbol R/S, where R is a 4-bit run of zeros since the previous non-zero value and S is one of the 10 size categories of the AC coefficient. Huffman encoding/decoding tables are generated for each composite category R/S.
• Huffman code tables for the luminance and chrominance DC differences are also used.
• A run-length mechanism is used when there is a long trail of zeros after a non-zero DCT coefficient: RRRR = 1111 with SSSS = 0000 (ZRL) represents a run of zeros, and RRRR = 0000 with SSSS = 0000 (EOB) represents the end of the block.

Problem 1.2

Find the quantization parameters q(j) for all j that give minimum distortion, with a maximum buffer size of 100, C = 15 and B(0) = 0, using the buffer update formula

B(j) = B(j-1) + R(j) - C

where R(j) is the number of bits produced by block j under quantizer q(j) and C is the number of bits drained by the channel per block.
SOLUTION: We know that a higher bit rate gives lower distortion. Hence, in order to find the optimal set of quantization parameters, we look for the set that gives minimum distortion and a high bit rate while never exceeding the buffer size, which is 100 in our case. The available (rate, distortion) pairs per block are:

            Block 1 (j=1)   Block 2 (j=2)   Block 3 (j=3)   Block 4 (j=4)
Q(j) = 1    (110, 4)        (85, 2)         (55, 3)         (75, 3)
Q(j) = 2    (70, 6)         (70, 5)         (45, 4)         (55, 5)
Q(j) = 3    (50, 11)        (35, 10)        (25, 9)         (30, 8)
Q(j) = 4    (40, 15)        (20, 13)        (15, 11)        (25, 9)

Since Q(1) = 1 for block 1 has a rate (110) greater than the maximum buffer size of 100, we eliminate all combinations with Q(1) = 1. We try to choose the optimal set such that we can send the maximum number of bits while avoiding overflow and underflow of the buffer; this means we need to end with exactly 100 bits in the buffer.

Candidate set 1: q(1) = 2, q(2) = 4, q(3) = 2, q(4) = 4

                        Block 1    Block 2    Block 3    Block 4
(rate, distortion)      (70, 6)    (20, 13)   (45, 4)    (25, 9)
Buffer B(j)             55         60         90         100
Cumulative distortion   6          19         23         32
Candidate set 2: q(1) = 3, q(2) = 3, q(3) = 2, q(4) = 3

                        Block 1     Block 2     Block 3    Block 4
(rate, distortion)      (50, 11)    (35, 10)    (45, 4)    (30, 8)
Buffer B(j)             35          55          85         100
Cumulative distortion   11          21          25         33
Candidate set 3: q(1) = 2, q(2) = 3, q(3) = 3, q(4) = 3

                        Block 1    Block 2     Block 3    Block 4
(rate, distortion)      (70, 6)    (35, 10)    (25, 9)    (30, 8)
Buffer B(j)             55         75          85         100
Cumulative distortion   6          16          25         33

By observation we see that the minimum distortion is given by the first set. Hence it is the optimal set, with minimum total distortion 32 (a brute-force check over all quantizer combinations is sketched below).
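The same answer can be checked by brute force. The sketch below enumerates all 4^4 quantizer combinations, applies the buffer recursion B(j) = B(j-1) + R(j) - C with C = 15, rejects any combination that overflows the 100-bit buffer, and recovers the set (2, 4, 2, 4) with total distortion 32 found above.

from itertools import product

# (rate, distortion) for q = 1..4, for each of the four blocks (from the table above)
OPTIONS = [
    [(110, 4), (70, 6), (50, 11), (40, 15)],   # block 1
    [(85, 2),  (70, 5), (35, 10), (20, 13)],   # block 2
    [(55, 3),  (45, 4), (25, 9),  (15, 11)],   # block 3
    [(75, 3),  (55, 5), (30, 8),  (25, 9)],    # block 4
]
C, B_MAX = 15, 100                              # channel drain per block, buffer capacity

best = None
for qs in product(range(4), repeat=4):          # every combination of q(1)..q(4)
    buf, dist, feasible = 0, 0, True
    for blk, q in enumerate(qs):
        rate, d = OPTIONS[blk][q]
        buf += rate - C                         # B(j) = B(j-1) + R(j) - C
        dist += d
        if buf > B_MAX:                         # buffer overflow -> infeasible
            feasible = False
            break
    if feasible and (best is None or dist < best[0]):
        best = (dist, tuple(q + 1 for q in qs), buf)

print(best)                                     # -> (32, (2, 4, 2, 4), 100)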
PART B: Programming Problem

Problem 1.3: JPEG compression with different quality factors
Quality factor   Original file size   File size after compression   Compression %   PSNR (dB)
1                66614 bytes          1847 bytes                    97.22%          18.14005
10               66614 bytes          3152 bytes                    95.26%          23.424
25               66614 bytes          4381 bytes                    93.42%          24.46416
50               66614 bytes          6081 bytes                    90.87%          24.76418
100              66614 bytes          37487 bytes                   43.72%          25.22354
We see that the image coded with quality factor 1 gives a compression ratio as high as 97%, which is tremendous compared to the other compression algorithms we studied earlier. However, the quality of the image is not very good at this quality factor. At an acceptable quality factor of 25 we see few or no artifacts, and the compression achieved is 93%, which is also very good compared to other compression algorithms. Even for a quality factor of 100, the compression ratio achieved is about 44%, which is still considered good.
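For reference, the two quantities reported in the table (compression percentage and PSNR) can be computed as in the following sketch; the helper names are mine and not part of any particular JPEG tool.

import numpy as np

def psnr(original, decoded, peak=255.0):
    """PSNR in dB between the original image and the JPEG-decoded image."""
    diff = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def compression_percent(original_bytes, compressed_bytes):
    """Size reduction relative to the raw file, e.g. 66614 -> 1847 bytes gives about 97.2%."""
    return 100.0 * (1.0 - compressed_bytes / original_bytes)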
Quality factor: typically the quality factor f ranges from 1 to 100. The base quantization matrix is scaled in inverse proportion to f (roughly Q = (1/f) x the base matrix), and the scaled matrix determines the quantization step sizes. A lower value of f therefore gives coarser quantization and higher distortion, and vice versa. Since distortion trades off against bits per pixel and hence quality, a lower value of f leads to poorer quality of the coded image. In our case the images are coded with quality factors 1, 10, 25, 50 and 100, and the results are shown below:
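As an illustration of how the quality factor scales the table, here is a sketch that starts from the sample luminance table in the JPEG standard and applies the widely used IJG-style mapping from quality to a scale factor. The exact mapping used by the encoder in this assignment is not documented here, so treat the scale formula as an assumption.

import numpy as np

# Sample luminance quantization table from the JPEG standard (Annex K)
BASE_LUMA_QTABLE = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def scaled_qtable(quality, base=BASE_LUMA_QTABLE):
    """Scale the base table for a quality factor in [1, 100]: low quality -> large step sizes."""
    quality = max(1, min(100, quality))
    scale = 5000.0 / quality if quality < 50 else 200.0 - 2.0 * quality
    table = np.floor((base * scale + 50.0) / 100.0)
    return np.clip(table, 1, 255)                # keep step sizes in [1, 255]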
ORIGINAL IMAGE
Quality factor = 100: we see that the decoded image is as good as the original image, with little or no artifacts.
Quality factor = 50: we see the image is slightly more blurred and of lower quality than the previous image.
Quality factor = 25: we see the blocking artifact coming into the picture and the image decreasing in quality.
Quality factor = 10: we see drastic blocking artifacts in the background and on the clock surface as well as the book.
Quality factor = 1: the image has drastic blocking effects and very poor quality; it is almost unrecognizable.
We see that the best result is obtained at a quality factor of 100 and the worst at a quality factor of 1, while 25 can be taken as a good compromise.

1.3.3: We have seen the blocking artifact grow with decreasing quality factor, and we have discussed its effect. The reason is that a decreasing quality factor means an increased quantization step size, which leaves a larger number of zeros in each DCT block. The DC value of the block then dominates, so the whole block takes on roughly its average value, producing the blocking artifact. We started observing the blocking artifact significantly at Q = 11 with PSNR = 23.1459 dB, as shown below.
Q = 22
PSNR of the 5 encoded images:
1.4 Post-processing on JPEG encoded images

ALGORITHM: Deblocking can be achieved by simple edge operations. There are several ways of doing this; I have simply used a smoothing filter followed by sharpening of the image, as sketched in the code after this list. The algorithm is as follows:
1. Calculate the second-order edge map of the image before encoding.
2. Save this information for future use.
3. Encode the image.
4. Decode the image to get the JPEG-decoded image.
5. Find the edge map of the new image.
6. Compare it with the old edge map and find the new edges generated as a result of the blocking artifact.
7. Apply a smoothing operation at these pixels.
8. Use a simple averaging filter as the smoothing filter over these pixel values; for simplicity a 3x3 neighborhood is used.
9. Finally, sharpen the image using a Laplacian sharpening mask (or any other simple sharpening mask).
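A compact sketch of steps 1-9 using SciPy filters; the edge threshold and the sharpening weight are illustrative values I chose, not parameters prescribed by the homework. Here edges_before is computed from the image before encoding and edges_after from the JPEG-decoded image.

import numpy as np
from scipy.ndimage import convolve, uniform_filter

LAPLACIAN = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float64)

def edge_map(img, thresh=30.0):
    """Second-order (Laplacian) edge map of an image (steps 1 and 5)."""
    return np.abs(convolve(img.astype(np.float64), LAPLACIAN)) > thresh

def deblock(decoded, edges_before, edges_after, sharpen=0.2):
    """Smooth only the new edges introduced by blocking, then lightly re-sharpen (steps 6-9)."""
    new_edges = edges_after & ~edges_before                # edges created by the coder
    img = decoded.astype(np.float64)
    smoothed = uniform_filter(img, size=3)                 # 3x3 averaging filter
    img = np.where(new_edges, smoothed, img)               # smooth only at the new edges
    img = img + sharpen * convolve(img, LAPLACIAN)         # Laplacian sharpening mask
    return np.clip(img, 0, 255).astype(np.uint8)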
PSNR (dB) of the decoded and deblocked images at the five rate points:

Image              1          2          3          4          5
pepper             28.888011  30.496524  31.696277  33.217187  34.513525
pepper deblocked   28.08759   29.209858  30.290453  31.503225  32.332968
clock              23.197536  23.705101  24.011477  24.448146  24.598022
clock deblocked    22.85886   23.456942  23.890356  24.238362  24.420765

[Image pairs Clock 1-5 / Deblocked clock 1-5 and Pepper 1-5 / Deblocked pepper 1-5 not reproduced.]
1.4.2 The R-D curves for the deblocked images vs. the decoded images are shown above.

OBJECTIVE QUALITY: we notice that in both graphs the PSNR of the deblocked image is lower than the PSNR of the decoded image. One possible reason is that, while deblocking, we try to make the image more acceptable to the human eye. We do this by interpolating the pixels that are distorted by the blocking artifact; when we smooth or sharpen the image we generate new pixel values to replace the distorted ones. These new values may have little correlation with the originals and can be very different from the original pixel values, so the PSNR values come out lower than expected.

SUBJECTIVE QUALITY: we see that the images are more or less smoothed. In clock 1, except for minor artifacts in the background, the rest of the image is quite smooth. The same goes for the pepper images: they are very smooth with only a small loss of sharpness. We can say the subjective quality is much better than that of the decoded images before deblocking. Clearly the objective and subjective quality tell opposite stories; they are not consistent, for the reasons described above.
1.4.4 PSNR (dB) and gain of the deblocked images over the decoded images:

Image              1          2          3          4          5
pepper             28.888011  30.496524  31.696277  33.217187  34.513525
pepper deblocked   28.08759   29.209858  30.290453  31.503225  32.332968
gain               -0.800421  -1.286666  -1.405824  -1.713962  -2.180557
clock              23.197536  23.705101  24.011477  24.448146  24.598022
clock deblocked    22.85886   23.456942  23.890356  24.238362  24.420765
gain               -0.338676  -0.248159  -0.121121  -0.209784  -0.177257
As we see, the R-D curve for the deblocked images gives poorer objective quality; hence we get a negative gain, and it would not be advisable to use these images as the final compressed output. For the deblocked images, clock 1 corresponds to Q = 8 (file size 2800 bytes), clock 2 to Q = 16 (3550), clock 3 to Q = 19 (3990), clock 4 to Q = 21 (4100) and clock 5 to Q = 24 (4250). Since the gain is negative, there may not be any saving of bits for similar image quality. However, the deblocked image clock 3 is similar in quality to the decoded image with Q = 50; hence, for some images this could actually save a lot of bits, because the lower PSNR implies a lower bit rate while we obtain higher perceived quality at that lower bit rate.
Problem 2: The MPEG Compression Standard

2.1 Program Setup
2.1.1: Flowg-1-ucb
Frame 1 of Flowg-1ucb
Frame 11 of Flowg-1ucb
Frame 21 of Flowg-1ucb
Flowg-3-ucb
Frame 1 of Flowg-3ucb
Frame 11 of Flowg-3ucb
Frame 21 of Flowg-3ucb
2.1.2
2.2 Analyzing MPEG Coded Video

2.2.1 Bits per frame

Frame  Type  flowg-1-mssg1  flowg-3-mssg1  flowg-6mssg1  flowg-1ucb  flowg-3ucb  flowg-6-ucb
0      I     202984         202984         359656        142064      310184      537944
1      P     87684          87684          142028        71004       122492      191116
2      B     29732          29732          40724         30508       48364       74748
3      B     60572          60572          221100        10620       28124       82460
4      I     228504         228504         399296        30012       92060       246564
5      B     67004          67004          158700        17036       63412       151084
6      B     62244          62244          134588        10820       29692       98692
7      P     127012         127012         280700        104288      250192      444168
8      B     64724          64724          132612        36460       174596      328124
9      B     68492          68492          129116        11460       48044       125804
10     I     218928         218928         376424        8892        20028       69780
11     B     68828          68828          140756        30812       131180      249092
12     B     62940          62940          138404        17684       58300       133148
13     P     139996         139996         274996        13868       24236       61428
14     B     52956          52956          130828        119400      259256      421128
15     B     55244          55244          141380        50132       185644      336732
16     I     222976         222976         367312        11556       39556       103908
17     B     61428          61428          135260        8396        16412       58388
18     B     59484          59484          134676        38012       126948      288548
19     P     144828         144828         295972        18972       46860       127244
20     B     52708          52708          131996        15428       19388       58980
21     B     59820          59820          136004        125144      272288      432040
22     I     213512         213512         356688        64356       189892      349100
23     B     56788          56788          131172        15756       36684       67876
24     B     59196          59196          129756        8572        15540       0
25     P     0              53052          127372        17812       44212       0
26     B     58180          58180          325372        34084       119996      0
27     B     53052          0              127492        10604       18948       0
2.2.2 PSNR per frame (dB)

Frame  flow1mssg1  flowg-1-ucb  flow3mssg1  flowg-3ucb  flow6mssg1  flowg-6-ucb
1      26.669523   26.062545    24.2778     28.174452   25.2142     28.685977
2      26.795415   25.120504    23.4347     26.577443   24.0729     27.196587
3      26.144233   24.256905    23.8967     26.035653   24.9916     27.231752
4      26.577443   25.482625    24.1875     27.600019   25.0471     28.399525
5      26.398941   24.256905    23.7239     26.700656   24.646      27.600019
6      26.369891   22.972065    23.9804     25.07729    24.6649     26.892287
7      26.171807   22.604121    24.4243     25.955964   25.3778     27.96047
8      21.981831   24.766206    23.923      27.716877   24.605      28.588379
9      21.318391   23.476975    24.1612     26.45763    24.8985     27.600019
10     20.937491   23.2032      24.5601     25.851937   25.6216     27.024907
11     19.105336   23.217187    23.8508     27.796566   24.8302     28.636904
12     17.638623   22.507875    23.2392     25.877711   24.2079     27.338991
13     17.270767   21.735939    24.3353     24.726362   25.3612     26.199558
14     17.127098   22.231308    23.8002     26.45763    24.7273     27.716877
15     17.456375   25.142273    23.9239     27.716877   24.9883     28.54039
16     16.789033   24.048404    24.4248     26.763598   25.4811     28.002431
17     19.648912   23.521825    23.7841     25.651071   24.8309     27.024907
18     20.227952   23.674762    23.214      27.231752   24.096      28.130804
19     23.44733    22.80326     24.3815     25.700423   25.3053     26.99137
20     20.695706   22.002965    23.591      24.686881   24.5248     26.341034
21     19.780243   22.287491    23.805      25.929723   24.6805     27.486224
22     18.521342   16.744619    24.3848     27.600019   25.3979     28.399525

For frames 23-28 some entries in the decoder output contained errors; the values that were available are listed below as given:
23: 22.264931 24.031472 22.9237 26.089604 24.6647
24: 23.374092 23.491874 25.675677 24.829
25: 25.955964 24.95017 27.756539 25.2556
26: 26.089604 23.849456 26.341034 25.0132
27: 26.827466 23.359591 25.252786 24.8911
28: 27.161703 23.690356 26.0123
Here "-" represents an error in the file; hence, for the graphs, arbitrary values of 22 and 28 respectively were assumed for the missing entries.

2.2.3 3-D plot of motion vectors
flowg-3-ucb
flowg-3-mssg1
2.3 Rate Control for MPEG-2 Encoder
Preselected target bit rates of 2 Mb/s and 4 Mb/s (chosen arbitrarily) determine the mquant of each macroblock.

2.3.1 Bits per frame for each setting

Frame  Type  Bits at 2 Mb/s  Bits at 4 Mb/s
0      I     223672          410888
1      P     89260           142020
2      B     35516           53132
3      B     44292           98516
4      P     115052          230940
5      B     44372           104516
6      B     39956           98700
7      P     120692          251532
8      B     40316           101708
9      B     45684           104524
10     I     191040          346448
11     B     51044           115780
12     B     51156           110172
13     P     124484          234772
14     B     43484           99164
15     B     44252           100772
16     P     139572          261044
17     B     48700           104252
18     B     46772           101076
19     P     130060          246484
20     B     40140           98804
21     B     48204           101988
22     I     174032          290592
23     B     44660           103100
24     B     45524           104468
25     P     128788          261180
26     B     44140           101556
27     B     39740           91324

2.3.2 PSNR per frame for each setting (dB)

Frame  Type  PSNR at 2 Mb/s  PSNR at 4 Mb/s
0      I     34.1            39.1
1      P     32.5            36
2      B     31.7            33.9
3      B     32              36.2
4      P     31.9            37.6
5      B     31.6            36.1
6      B     31.6            36.2
7      P     32.8            39
8      B     32.2            37.3
9      B     32.1            37.2
10     I     31.7            36.5
11     B     31.3            36.3
12     B     31              35.8
13     P     32.5            37.6
14     B     31.7            36.3
15     B     31.7            36.2
16     P     32.9            38.2
17     B     30.9            35.5
18     B     31              35.6
19     P     32.2            37.6
20     B     31.5            36
21     B     31.4            35.6
22     I     31              35.4
23     B     31.6            36.2
24     B     31.2            36
25     P     33.3            39.6
26     B     30.8            35.4
27     B     31.7            36.3
2.3.3 Error (target vs. actual bits) per frame

At 2 Mb/s:

Frame  Type  Target   Actual   Error
0      I     246154   223690   127638
1      P     96052    89283    32584
2      B     56699    35530    7997
3      B     43527    44315    68051
4      P     112366   115073   73828
5      B     41245    44391    2883
6      B     41508    39978    87646
7      P     127624   120713   77199
8      B     43514    40332    6363
9      B     46695    45703    129282
10     I     174985   191059   145660
11     B     45399    51066    1905
12     B     49161    51171    70496
13     P     121667   124503   75458
14     B     49045    43505    619
15     B     44124    44273    94741
16     P     139014   139594   96355
17     B     43239    48716    2945
18     B     45771    46793    83222
19     P     130015   130082   84967
20     B     45115    40160    9910
21     B     50070    48221    120668
22     I     168889   174046   130242
23     B     43804    44685    3159
24     B     41526    45540    87034
25     P     132574   128807   84421
26     B     44386    44164    443
27     B     44607    39761    4846

At 4 Mb/s:

Frame  Type  Target   Actual   Error
0      I     492308   410907   81401
1      P     198182   142042   56140
2      B     142191   53148    89043
3      B     99210    98536    674
4      P     233281   230964   2317
5      B     101477   104532   3055
6      B     103476   98722    4754
7      P     258038   251549   6489
8      B     104800   101733   3067
9      B     107867   104541   3326
10     I     322104   346462   24358
11     B     101765   115800   14035
12     B     109756   110198   442
13     P     226942   234786   7844
14     B     106642   99190    7452
15     B     99427    100790   1363
16     P     259823   261064   1241
17     B     96882    104268   7386
18     B     101319   101100   219
19     P     248643   246505   2138
20     B     101582   98823    2759
21     B     104340   102004   2336
22     I     285586   290608   5022
23     B     102784   103118   334
24     B     99466    104491   5025
25     P     262679   261199   1480
26     B     101460   101580   120
27     B     101340   91339    10001
2.3.4 TM5

The Test Model 5 (TM5) rate control algorithm is used in MPEG-2 compression. The MPEG-2 TM rate control method offers a dramatic improvement over the Simulation Model (SM) method used for MPEG-1; TM's improvements are due to more sophisticated pre-analysis and post-analysis routines. Rate control and adaptive quantization are divided into three steps:

Step 1: Target Bit Allocation. This step estimates the number of bits available to code the next picture and is performed before coding the picture. In complexity estimation, the global complexity measures assign relative weights to each picture type. These weights (Xi, Xp, Xb) are reflected by the typical coded frame size of I, P, and B pictures. I pictures are assigned the largest weight since they have the greatest stability factor in an image sequence; B pictures are assigned the smallest weight since B data does not propagate into other frames through the prediction process. Picture target setting then allocates target bits for a frame based on the frame type and the number of remaining frames of that same type in the Group of Pictures.

Step 2: Rate Control. By means of a "virtual buffer", this step sets the reference value of the quantization parameter for each macroblock. Rate control attempts to adjust the bit allocation if there is a significant difference between the target bits (anticipated bits) and the actual coded bits for a block of data.

Step 3: Adaptive Quantization. This step modulates the reference value of the quantization parameter according to the spatial activity in the macroblock to derive the quantization parameter, mquant, that is used to quantize the macroblock. Re-computing the macroblock quantization factor according to the activity of the block against the normalized activity of the frame improves the uniformity of the image.
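A sketch of the Step 1 target-bit computation as it is usually stated for TM5 (Kp = 1.0 and Kb = 1.4 are the TM5 constants; R is the bit budget remaining for the GOP, Np and Nb the remaining P and B pictures, and Xi, Xp, Xb the complexity measures). This illustrates the formulas rather than reproducing the full reference implementation.

KP, KB = 1.0, 1.4      # TM5 weights for P and B pictures relative to I pictures

def tm5_target_bits(ptype, R, Xi, Xp, Xb, Np, Nb, bit_rate, picture_rate):
    """TM5 Step 1: estimate the bit target for the next picture of type 'I', 'P' or 'B'."""
    if ptype == 'I':
        T = R / (1.0 + (Np * Xp) / (Xi * KP) + (Nb * Xb) / (Xi * KB))
    elif ptype == 'P':
        T = R / (Np + (Nb * KP * Xb) / (KB * Xp))
    else:  # 'B'
        T = R / (Nb + (Np * KB * Xp) / (KP * Xb))
    return max(T, bit_rate / (8.0 * picture_rate))   # never fall below the per-picture floor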
ADVANTAGES:
1. Improves the uniformity of the image by re-computing the macroblock quantization factor according to the activity of the block against the normalized activity of the frame.
2. Low-complexity method.
3. Gives reasonably good results for some video sequences over some range of channels.
4. The direct buffer-state feedback method takes into account the different types of frames, namely I, P and B.

DISADVANTAGES:
1. The target bit allocation step does not handle scene changes efficiently.
2. A wrong value of avg_act is used in Step 3 after a scene change.
3. The quantization parameter for a macroblock is fully dependent upon the channel buffer fullness. Therefore, macroblocks in a picture may not all be treated equally because of variations in buffer fullness, which may result in non-uniform picture quality.
4. VBV compliance is not guaranteed.
5. The TM5 rate control does not handle scene changes properly because the target bit rate for a picture is determined based only on information obtained from encoding of the previous pictures.
6. The algorithm does not use a mechanism to control buffer overflow.
7. The assumptions made by the TM5 algorithm on R-D characteristics do not usually hold.
8. Variable bit rate encoding is not present.
9. A special low-delay rate-control mode is not available.

Suggested changes:
• A single quantization parameter can be used for each picture, so that all the macroblocks in a picture are treated equally, instead of the current case where the varying quantization parameter can dominate the predicted and generated number of bits.
• We can analyze the rate-distortion characteristics of the input sequence to calculate the best quantization setting that optimizes quality, so pre-analysis based on each GOP or frame can be employed. R-D-optimized rate control algorithms using interpolation functions that take the inter-frame dependencies into account can be used as well. Properties of the HVS should also be used to enhance visual quality.