International Journal of Engineering Research (IJOER)
[Vol-1, Issue-2, April.- 2015]
A Literature Survey of Keyframe Extraction Techniques for Video Content Summarization
Pritam H. Patil1, Sudeep Thepade2
1 Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Savitribai Phule Pune University, Pune, India
2 Ph.D. Professor, Department of Computer Engineering, Pimpri Chinchwad College of Engineering, Savitribai Phule Pune University, Pune, India
Abstract— In the current era, most digital information takes the form of multimedia, with videos accounting for a very large share. Videos carry audio and visual content, and the visual content consists of a number of frames arranged in sequence. During video summarization, many frames containing similar information have to be processed. This redundancy slows processing and increases complexity and time consumption. Video summarization using key frames can speed up video processing. Key frame extraction plays an important role in content-based video stream analysis, retrieval and querying. Video summarization, which aims to reduce the amount of data that must be examined in order to retrieve the desired information from a video, is an essential task in video analysis and indexing applications. To support efficient key frame extraction from different types of videos, this paper studies the various techniques for key frame based video summarization available in the literature.
Keywords— key frame extraction; video summarization; video.
I. INTRODUCTION
Nowadays, the growth of video data poses new challenges in managing such an enormous amount of information. As a result, video summarization is attracting more researchers, since applications such as information browsing and video retrieval [1], [2] need summaries of videos. A video summary is a small part of the complete video sequence and aims to give a meaningful visual outline of the entire video using fewer frames. The goal of video summarization is to provide a concise visualization so that the user can understand the overall content of the video [3], with redundant information removed. Generally, video summarization includes the following steps: video segmentation, feature extraction, feature-based redundancy detection, and finally generation of the summary from the non-redundant frames (key frames) [4]. Videos are generally structured according to a descending hierarchy of video clips, scenes, shots, and frames. Video structure analysis aims at segmenting a video into a number of structural elements that have semantic content, and includes shot boundary detection, key frame extraction, and scene segmentation [5]. Understanding video structure is therefore important for improving the state of the art in video processing.
Fig. 1. Structural hierarchy of a video signal in terms of shots and scenes
As shown in Fig. 1, a whole video can be divided into scenes, each of which is a large continuous unit of still frames. The basic unit of video is the frame, so a video is essentially a sequence of still images. Following [11], the terminology used for video structure is as follows:
Shot: A shot is an image sequence that presents continuous action captured in a single operation of a single camera. Shots are joined together in the editing stage of video production to form the complete sequence. Shots can effectively be considered the smallest indexing unit.
Scene: A scene can contain any number of shots, and each shot can contain any number of frames. A scene is a more semantic notion, essentially a story unit.
Frame: A frame is the basic unit of video.
Since videos represent content in a very complex and unstructured format, techniques for abstracting them are essential. Video content summarization can help in this regard.
II. LITERATURE SURVEY
As the name implies, video summarization is a mechanism for generating a short summary of a video, which can be either a sequence of stationary images (key frames) or moving images (video skims) [6]. Video summarization can therefore be done using key frames or video skimming. The survey of video processing and video content summarization below is organized around these techniques.
A. Methods of Video Content Summarization
Key frame based video summarization produces a still-image abstract, or static storyboard. It consists of a collection of salient images extracted from the video structure. These key frames are also called representative frames. Video skim based summarization produces a moving-image abstract, or moving storyboard. The original video is segmented into parts, each of which is a video clip of shorter duration. A movie trailer is the best example of video skimming. The video shot is retrieved from among other similar frames. Key frame extraction is used not only for video summarization but also in other video processing tasks such as video annotation, video transmission, video shot detection, video segmentation, and video indexing and retrieval. The field of key frame extraction is wide and rich: many techniques for key frame detection have been reported, and much work has been done on video content summarization. From the related research, we have identified a few considerations as the basis for key frame extraction intended for video content summarization.
B. Video Content Summarization Using Key Frame Extraction
A video summary represents an abstract view of the original video sequence and can be used in video browsing and retrieval systems. It can be a highlight of the original sequence, formed by concatenating a user-defined number of selected video segments, or a collection of key frames. Different methods can be used to select key frames.
Three approaches to key frame extraction based video summarization were studied, and the following assumptions were considered for the proposed work.
Classification based on sampling: Key frames are chosen by uniform or random sub-sampling, without considering the video content. The summary produced by these methods does not represent all parts of the video and may contain redundant key frames with similar content.
Classification based on scene segmentation: Key frames are extracted using scene detection, where a scene includes all parts of the video that are semantically linked or that share the same space or the same time. The disadvantage of these techniques is that they produce a summary that does not take into account the temporal position of frames.
Classification based on shot segmentation: Key frames are adapted to the video content. These methods extract the first frame of the shot, or the first and last frames, as the shot key frames. They are effective for stationary shots with small content variation, but they do not provide an adequate representation of shots with strong movement.
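To make the sampling-based and shot-based approaches concrete, the following minimal Python sketch works purely on frame indices. The function names, the `step` parameter, and the representation of a shot as an inclusive `(start, end)` index pair are assumptions made for illustration; they are not taken from any of the surveyed papers.

```python
from typing import List, Tuple


def uniform_sample_key_frames(num_frames: int, step: int) -> List[int]:
    """Sampling-based selection: take every `step`-th frame index.

    The video content is ignored entirely, which is why this kind of
    summary can miss parts of the video or contain redundant key frames.
    """
    if num_frames < 0 or step <= 0:
        raise ValueError("num_frames must be >= 0 and step must be > 0")
    return list(range(0, num_frames, step))


def shot_boundary_key_frames(shots: List[Tuple[int, int]],
                             use_last: bool = True) -> List[int]:
    """Shot-based selection: first (and optionally last) frame of each shot.

    Each shot is a hypothetical inclusive (start, end) pair of frame
    indices, assumed to come from a separate shot boundary detector.
    """
    key_frames: List[int] = []
    for start, end in shots:
        if start > end:
            raise ValueError("shot start must not exceed shot end")
        key_frames.append(start)
        # Avoid emitting the same index twice for single-frame shots.
        if use_last and end != start:
            key_frames.append(end)
    return key_frames
```

For example, `uniform_sample_key_frames(10, 4)` selects frames 0, 4 and 8 regardless of what they show, while `shot_boundary_key_frames([(0, 9), (10, 10), (11, 30)])` selects the boundary frames of each shot. Neither function inspects the frames between the boundaries, which illustrates why shots with strong movement are poorly represented.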
III. RELATED WORK
A survey of video content summarization using key frame extraction was carried out, and some of the findings are discussed below. Unsupervised clustering based on a statistical model uses clusters to represent key frames [7]. The clustering technique divides the frames within a shot into clusters, and a key frame is then selected from each candidate cluster. To make the algorithm independent of the video sequence, a statistical model is used to calculate the clustering threshold. The method can capture important, salient content as key frames. A new frame is assigned to an existing cluster if it is similar enough to the centroid of that cluster. If the computed similarity is lower than the pre-specified threshold, a new cluster is formed around the current frame. As a result, the number of key frames is determined by the number of clusters, and the frame closest to the centroid of a key cluster is extracted as a key frame.
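The threshold-based clustering step described above can be sketched as follows. This is a minimal illustration under several assumptions: each frame is already represented by a feature vector (for example a color histogram), similarity is measured with cosine similarity, the threshold is passed in as a fixed parameter rather than derived from a statistical model, and every cluster is treated as a key cluster. None of these choices is confirmed by the surveyed work.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_key_frames(features: List[Sequence[float]],
                       threshold: float) -> List[int]:
    """Sequentially cluster frames of one shot and pick one key frame each."""
    clusters: List[List[int]] = []     # frame indices belonging to each cluster
    centroids: List[List[float]] = []  # running mean feature of each cluster

    for idx, feat in enumerate(features):
        # Find the existing centroid most similar to the current frame.
        best, best_sim = -1, -1.0
        for c, centroid in enumerate(centroids):
            sim = cosine_similarity(feat, centroid)
            if sim > best_sim:
                best, best_sim = c, sim

        if best >= 0 and best_sim >= threshold:
            # Similar enough: join the cluster and update its running mean.
            clusters[best].append(idx)
            n = len(clusters[best])
            centroids[best] = [(m * (n - 1) + x) / n
                               for m, x in zip(centroids[best], feat)]
        else:
            # Too dissimilar: start a new cluster around this frame.
            clusters.append([idx])
            centroids.append(list(feat))

    # The key frame of a cluster is its member closest to the centroid.
    key_frames = [
        max(members, key=lambda i: cosine_similarity(features[i], centroid))
        for members, centroid in zip(clusters, centroids)
    ]
    return sorted(key_frames)
```

With the toy features `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [1, 0.05]]` and a threshold of 0.9, two clusters form and one key frame is returned per cluster, so the number of key frames follows from the number of clusters rather than being fixed in advance.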
An optimized key frame extraction scheme for video sequences, based on Singular Value Decomposition (SVD) and correlation minimization, was proposed in [8]; it selects a meaningful set of video frames for each given shot. First, SVD is applied to each frame, producing a diagonal matrix containing the singular values of the frame. A feature vector is then created for each frame by gathering the respective singular values. Next, all feature vectors of the shot are collected to form the feature vector basin of the shot. Finally, a Genetic Algorithm (GA) is applied to the vector basin to locate frames with minimally correlated feature vectors, which are selected as key frames.
In the Perceived Motion Energy (PME) model [ ], motion is used as the salient feature. This work proposes a triangle model of perceived motion energy to model motion patterns in video, together with a scheme to extract key frames based on this model. With this model, a video shot is segmented into sub-segments of consecutive motion patterns in terms of acceleration and deceleration. The frames at the turning points between motion acceleration and motion deceleration are selected as key frames. The key frame extraction process is threshold-free and fast, since the motion information in MPEG can be used directly in motion analysis, and the resulting key frames are representative. The approach combines motion-based temporal segmentation and color-based shot detection. Key frames are used to abstract a shot and are an effective form of summarizing a long video sequence. It is assumed that motion is the more salient feature in presenting actions or events in video and thus should be the feature used to determine key frames.
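The turning-point idea behind the PME scheme can be illustrated with a small sketch. It assumes that a per-frame motion energy value is already available as a list of numbers; how that value is computed from MPEG motion vectors, and the triangle model itself, are not reproduced here. The function name and the handling of plateaus are choices made for this illustration only.

```python
from typing import List, Sequence


def motion_turning_points(energy: Sequence[float]) -> List[int]:
    """Return frame indices where motion energy stops rising and starts falling.

    `energy` is a hypothetical per-frame motion energy series for one shot.
    On a flat peak, the last frame before the drop is reported.
    """
    key_frames: List[int] = []
    rising = False  # True while the most recent change was an increase
    for i in range(1, len(energy)):
        if energy[i] > energy[i - 1]:
            rising = True            # motion is accelerating
        elif energy[i] < energy[i - 1]:
            if rising:
                key_frames.append(i - 1)  # acceleration turned into deceleration
            rising = False
        # Equal values leave the state unchanged (plateau).
    return key_frames
```

For the series `[0, 1, 3, 2, 1, 2, 5, 4]` the function reports frames 2 and 6, the two peaks. No threshold is involved, which mirrors the threshold-free property noted above; a shot that ends while motion is still rising yields no key frame in this simplified version.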
In [9], the Discrete Wavelet Transform (DWT) was introduced for key frame extraction. Two consecutive frames are first transformed using the DWT, and the difference between their detail components is then estimated. If the difference value of a consecutive pair is greater than a threshold, the last frame of the pair is considered a key frame. Another way of measuring the visual content complexity of a video shot is sequential comparison of color. In [10], key frames were selected sequentially for each shot by computing the color histogram difference between the current frame and the last extracted key frame. This idea of sequential comparison was also extended by using information about dominant or global motion. Many existing works are designed to handle particular kinds of key frame extraction, using clustering and classification to increase retrieval accuracy; this usually leads to a large computational overhead and a longer response time for a normal video query on a database comprising thousands of videos. There is thus no standard or efficient technique defined for key frame extraction in video summarization. Each group of researchers has made certain assumptions and carried out the work according to the requirements of the application. In some cases time matters but space is not an issue. The key frame extraction techniques proposed in different papers are compared in the following table.
TABLE I.
COMPARISON OF DIFFERENT PAPERS FOR KEY FRAME EXTRACTION
| Criterion | Paper 1 [11] | Paper 2 [12] | Paper 3 [13] | Paper 4 [14] | Paper 5 [15] |
| --- | --- | --- | --- | --- | --- |
| Feature | Representation, Uniqueness | Color feature | Transform | Frame Information | Inter and Intra Clip Mode |
| Algorithm | Clustering | TSTBTC | DCT | K-Means | Reinforcement |
| Summary Length | Specified | Not Specified | Not Specified | Not Specified | Not Specified |
| Space Needed to Store | High | Normal | Normal | High | Low |
| Similarity Measure Used | MSE | Seven types of similarity measure | MSE | - | - |
| Video Summarization | Static | Dynamic | Dynamic | Static | Static |
| Evaluation Criteria | Precision | Completeness | Completeness | - | - |
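The sequential color-histogram comparison attributed to [10] in Section III can be sketched as below. The sketch assumes that each frame of a shot has already been reduced to a normalized color histogram, that the difference is measured as the sum of absolute bin differences, and that the first frame of the shot seeds the comparison. These are illustrative assumptions; the distance measure and threshold actually used in [10] are not stated in this survey.

```python
from typing import List, Sequence


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences (an assumed L1 distance)."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(h1, h2))


def sequential_key_frames(histograms: List[Sequence[float]],
                          threshold: float) -> List[int]:
    """Select key frames of one shot by comparing against the last key frame."""
    if not histograms:
        return []
    key_frames = [0]  # assumption: the first frame is always a key frame
    for idx in range(1, len(histograms)):
        last_key_hist = histograms[key_frames[-1]]
        # A frame becomes a key frame only when it differs enough from the
        # most recently extracted key frame, not from its direct predecessor.
        if histogram_difference(histograms[idx], last_key_hist) > threshold:
            key_frames.append(idx)
    return key_frames
```

Because each frame is compared with the last extracted key frame rather than with the previous frame, gradual changes accumulate until they exceed the threshold, at which point a new key frame is emitted. The threshold is a hypothetical tuning parameter and directly controls the summary length.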
IV. CONCLUSION
Video summarization plays an important role in many video applications. A survey of various methods for key frame based video summarization has been carried out. However, there is no universally accepted method for video summarization that gives better output for all kinds of videos. The summarization viewpoint and perspective are often application-dependent. Semantic understanding and its representation are the biggest issues to be addressed in order to account for the diversity of videos and of human perception. Key frames are extracted depending on the changes in the contents of the video. Since the key frames are what is processed for summarization, the important content must not be missed.
REFERENCES
[1] Z. Xiong, X. S. Zhou, Q. Tian, Y. Rui, and T. S. Huang, “Semantic retrieval of video - review of research on video retrieval in meetings, movies and broadcast news, and sports,” IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 18–27, March 2006.
[2] V. Valdes and J. Martinez, “Efficient video summarization and retrieval tools,” in 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI), June 2011, pp. 43–48.
[3] M. Furini, F. Geraci, M. Montangero, and M. Pellegrini, “STIMO: Still and moving video storyboard for the web scenario,” Multimedia Tools Appl., vol. 46, no. 1, pp. 47–69, Jan. 2010.
[4] Edward J. Y. Cayllahua Cahuina and Guillermo Camara Chavez, “A New Method for Static Video Summarization Using Local Descriptors and Video Temporal Segmentation,” XXVI IEEE Conference on Graphics, Patterns and Images (SIBGRAPI), 2013, pp. 226–233.
[5] Guozhu Liu and Junming Zhao, “Key Frame Extraction from MPEG Video Stream,” Proceedings of the Second Symposium International Computer Science and Computational Technology (ISCSCT ’09), Huangshan, P. R. China, 26-28 Dec. 2009, pp. 007-011.
[6] A. V. Kumthekar and J. K. Patil, “Key frame extraction using color histogram method,” International Journal of Scientific Research Engineering & Technology (IJSRET), Volume 2, Issue 4, pp. 207-214, July 2013, www.ijsret.org, ISSN 2278 – 0882.
[7] Jasmeet Kaur and Rohini Sharma, “A Combined DWT-DCT approach to perform Video compression base of Frame Redundancy,” International Journal of Advanced Research in Computer Science and Software Engineering, Volume 2, Issue 9, September 2012, ISSN: 2277 128X.
[8] Jianping Fan, “Existing CBIR/CBVR Systems,” Department of Computer Science, University of North Carolina at Charlotte, NC 28223, http://www.cs.uncc.edu/~jfan.
[9] Khin Thandar Tint and Kyi Soe, “Key Frame Extraction for Video Summarization Using DWT Wavelet Statistics,” International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), Volume 2, No 5, May 2013.
[10] Kalpana Thakre, Archana Rajurkar, and Ramchandra Manthalkar, “An Effective CBVR System Based On Motion, Quantized Color,” Science & Information Technology (IJCSIT), Vol 3, No 2, April 2011.
[11] Shayok Chakraborty, Omesh Tickoo, and Ravi Iyer, “Adaptive Key frame Selection for Video Summarization,” 2015 IEEE Winter Conference on Applications of Computer Vision.
[12] Sudeep D. Thepade and Pritam H. Patil, “Novel visual content summarization in videos using key frame extraction with Thepade’s sorted ternary block truncation coding and assorted similarity measures,” 2015 International Conference on Communication, Information & Computing Technology (ICCICT), Jan. 16-17, Mumbai, India.
[13] Sudeep D. Thepade and Ashwini Tonge, “Extraction Of Key Frames from Video Using Discrete Cosine Transform,” 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT).
[14] Huayong Liu and Huifen Hao, “Key Frame Extraction Based on Improved Hierarchical Clustering Algorithm.”
[15] Carles Ventura, Xavier Giro-i-Nieto, Veronica Vilaplana, Daniel Giribet, and Eusebio Carasusan, “Automatic Keyframe Selection based on Mutual Reinforcement Algorithm,” 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), 17–19 June 2013, Veszprém, Hungary.