

International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

A Novel Approach for Detection of Human in Images Using IK-SVM

Naveen M Dandur1, Satish Naik2, Veerendra T M3, Pruthviraja C B4
M.Tech, Digital Electronics, Student1,2,3,4
Email: naveend.026@gmail.com1, naik.satish88@gmail.com2

Abstract- Human detection is a challenging classification problem with many potential applications, including monitoring pedestrian junctions, young children in schools and elderly people in hospitals, as well as several security, surveillance and civilian applications. Various approaches have been proposed to solve this problem. We have studied and implemented a scheme using Histograms of Oriented Gradients (HOG). The INRIA Person dataset was used for training and testing the classifier. The proposed method is implemented using an IK-SVM classifier; on the INRIA pedestrian dataset, an approximate IK-SVM classifier based on these features has the current best performance. A user-friendly Graphical User Interface is designed for the proposed method.

Index Terms- IK-SVM classifier, HOG feature extraction, GUI.

1. INTRODUCTION
Human detection is a key problem for a number of application domains, such as intelligent vehicles, surveillance, and robotics. Notwithstanding years of methodical and technical progress, it is still a difficult task from a machine-vision point of view. There is a wide range of human appearance arising from changing articulated pose, clothing and lighting and, in the case of a moving camera in a dynamic environment, ever-changing backgrounds. Explicit models to solve the problem are not readily available, so most research has focused on implicit learning-based representations. Many interesting human classification approaches have been proposed; an overview is given in Section 2. Most approaches follow a two-step scheme involving feature extraction and pattern classification. In recent years, a multitude of (more or less) different feature sets have been used to discriminate humans from non-human images. Most of those features operate on intensity contrasts in spatially restricted local parts of an image. As such, they resemble neural structures which exist in lower-level processing stages of the human visual cortex. In human perception, however, depth and motion are important additional cues to support object recognition. In particular, the motion flow field and surface depth maps seem to be tightly integrated with spatial cues such as shape, contrast, or color. With a few exceptions (see Section 2), most spatial features used in machine vision for object classification are based on intensity cues only. If used at all, depth and motion cues merely provide information about scene geometry or serve as a selection mechanism for regions of interest in segmentation rather than in a classification context. In this paper, a feature extraction technique called HOG (Histogram of Oriented Gradients) is used. Discriminative approaches to recognition problems often depend on comparing distributions of features,

e.g. a kernelized SVM, where the kernel measures the similarity between histograms describing the features. In order to evaluate the classification function, a test histogram is compared to the histograms representing each of the support vectors. This paper presents a classifier method to greatly speed up that process for histogram comparison functions of a certain form, basically where the comparison is a linear combination of functions of each coordinate of the histogram.

2. PREVIOUS WORK
Human classification has attracted a significant amount of interest from the research community over the past years. A human classifier is typically part of an integrated system involving a preprocessing step to select initial object hypotheses and a post-processing step to integrate classification results over time (tracking). The classifier itself is the most important module: its performance accounts for the better part of the overall system performance, and the majority of computational resources are spent here. Most approaches for human classification follow a discriminative scheme by learning discriminative functions (decision boundaries) to separate object classes within a feature space. Prominent features can be roughly categorized into texture-based and gradient-based. Non-adaptive texture-based Haar wavelet features have been popularized and used by many authors. Recently, local binary pattern (LBP) features have also been employed in pedestrian classification. The particular structure of local texture features has been optimized in terms of local receptive field (LRF) features, which adapt to the underlying data during training. Other texture-based features are codebook patches, extracted around interest points in the image and linked via geometric relations. Gradient-based features have focused on



discontinuities in image brightness. Normalized local histograms of oriented gradients have found wide use in both sparse (SIFT) and dense (histograms of oriented gradients, HOG) representations. Spatial variation and correlation of gradients have been encoded using covariance descriptors, enhancing robustness towards brightness variations. Others have proposed local shape filters exploiting characteristic patterns in the spatial configuration of salient edges. Some of the presented spatial filters have been extended to the spatio-temporal domain by means of intensity differences over time or optical flow. Regarding pattern classifiers, support vector machines (SVMs) have become very popular in the domain of human classification, in both linear and nonlinear variants. However, performance boosts resulting from the nonlinear model are paid for with a significant increase in computational cost and memory. Recent work presented efficient versions of nonlinear SVMs for a specific class of kernels. Other popular classifiers include neural networks and boosted classifiers. In the past years, many novel feature and classifier combinations were proposed to improve classification performance, along with corresponding experimental studies and benchmarks. Orthogonal to such lower-level performance boosts are improvements coming from higher-level methods based on the fusion of multiple classifiers. Several approaches have attempted to break down the complexity of the problem into subparts. One way is to represent each sample as an ensemble of components which are usually related to body parts. After detecting the individual body parts, detection results are fused using statistical models, learning or voting schemes, or heuristics. Besides component-based approaches, multi-orientation models are relevant to the current work. Here, local pose-specific clusters are established, followed by the training of specialized classifiers for each subspace. The final decision of the classifier ensemble involves maximum selection, trajectory-based data association, shape-based combination, or a fusion classifier. A recent trend in the community involves the combination of multiple features or modalities, e.g., intensity, depth and motion. While some approaches utilize combinations at the module level, others integrate multiple information sources directly into the pattern classification step. To the best of our knowledge, our earlier work presented the first use of appearance, motion, and stereo features for human classification; a similar approach was presented recently. Some approaches combine features in the intensity domain using a boosted cascade classifier or multiple kernel learning. One approach combines HOG, covariance, and edgelet features in the intensity domain into a boosted heterogeneous cascade classifier with an explicit optimization with regard to runtime. Others integrate intensity and flow features by boosting or by concatenating all features

into a single feature vector which is then passed to a single classifier. This work was recently extended to additionally include depth features. A joint feature space approach to combine HOG and LBP features has also been used. Another approach integrates HOG features, co-occurrence features and color frequency descriptors into a very high-dimensional (170,000-dimensional) joint feature space in which classical machine learning approaches are intractable; hence, partial least squares is applied to project the features into a subspace with lower dimensionality, which facilitates robust classifier learning. Boosting approaches require mapping the multidimensional features to a single dimension, either by applying projections or by treating each dimension as an individual feature. An alternative is the use of more complex weak learners that operate in a multidimensional space, e.g., support vector machines. In contrast, fusion can be utilized at the classifier level by training a specialized classifier for each cue. Some works use a single feature (HOG) in two (intensity and depth) or three (intensity, depth, and motion) different modalities, while others combine two features (HOG and LRF) within a single modality (intensity). Finally, classifier-level combinations of two features have been presented in which each feature operates in a different modality (HOG/intensity and LRF/depth). Classifier fusion is done using fuzzy integration, simple classifier combination rules, or a Mixture-of-Experts framework.

3. METHODOLOGY

This section gives an overview of our feature extraction process, which is summarized in Fig. 1. In this work an Intersection Kernel SVM (IK-SVM) classifier is proposed; the preparation of the model and the human detection procedure are summarized in Figs. 3 and 4, respectively.

3.1 Feature extraction
HOG features, proposed by Dalal and Triggs, are adopted for our application. As shown in Figs. 2(a)-(d), a sample of 64 × 128 pixels is divided into cells of size 8 × 8 pixels, each group of 2 × 2 cells is integrated into a block in a sliding fashion, and blocks overlap with each other. To extract HOG features, we first calculate the gradient orientations of the pixels in each cell. Then, in each cell, we compute a 9-dimensional histogram of gradient orientations as the features. Each block is thus represented by a 36-dimensional feature vector, which is normalized by dividing each feature bin by the vector norm. Each sample is represented by 105 blocks (420 cells), corresponding to a 3780-dimensional HOG feature vector.
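As a concrete illustration of the layout just described (8 × 8 cells, 2 × 2 overlapping blocks, 9 orientation bins, 105 blocks per 64 × 128 window), the following minimal sketch computes a 3780-dimensional HOG vector with scikit-image. This is not the authors' code: the file name person.png and the choice of plain L2 block normalization are illustrative assumptions.

# Minimal HOG sketch for a 64x128 detection window (rows x cols = 128 x 64),
# 8x8-pixel cells, 2x2-cell overlapping blocks, 9 orientation bins.
import numpy as np
from skimage.feature import hog
from skimage import io, color, transform

def extract_hog(window_gray):
    """window_gray: 128x64 grayscale array. Returns a 3780-d HOG feature vector."""
    features = hog(
        window_gray,
        orientations=9,            # 9-bin histogram per cell
        pixels_per_cell=(8, 8),    # 8x8-pixel cells
        cells_per_block=(2, 2),    # 2x2 cells per block; blocks overlap by one cell
        block_norm='L2',           # per-block normalization (paper: divide by vector norm)
        feature_vector=True,
    )
    assert features.size == 105 * 36  # 105 blocks x 36 dims = 3780
    return features

# Example usage: resize an arbitrary image to the 64x128 detection window.
img = color.rgb2gray(io.imread('person.png'))        # hypothetical file name
window = transform.resize(img, (128, 64), anti_aliasing=True)
print(extract_hog(window).shape)                     # (3780,)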




Fig. 1. HOG feature extraction pipeline: input image → normalize gamma and color → compute gradients → weighted vote into spatial and orientation cells → contrast normalize over overlapping spatial blocks.

Fig. 4. Human detection: a test image is scanned with a multi-scale sliding window; HOG features are collected over each detection window and classified with the IK-SVM classifier, and positives are integrated to give the detected humans.
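The IK-SVM classifier block above compares the HOG histogram of a test window with the histograms of the support vectors, as discussed in the introduction. The following is a minimal sketch, not the authors' implementation and without the fast approximate evaluation of Maji et al.: it defines the histogram intersection kernel and trains and evaluates a kernelized SVM with scikit-learn on toy data standing in for HOG vectors of positive and negative windows.

# Histogram intersection kernel SVM sketch (exact, unaccelerated evaluation).
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(X, Z):
    """K(x, z) = sum_i min(x_i, z_i). X: (n, d), Z: (m, d) non-negative
    feature matrices; returns the (n, m) Gram matrix."""
    # Broadcasting a full (n, m, d) tensor is fine for small illustrative sets.
    return np.minimum(X[:, None, :], Z[None, :, :]).sum(axis=2)

# Toy training data standing in for 3780-d HOG vectors.
rng = np.random.default_rng(0)
X_train = np.abs(rng.normal(size=(40, 3780)))
y_train = np.array([1] * 20 + [0] * 20)        # 1 = human, 0 = non-human

clf = SVC(kernel=intersection_kernel, C=1.0)   # callable kernel -> Gram matrix
clf.fit(X_train, y_train)

# Scoring a test window: a positive decision value is taken as "human".
x_test = np.abs(rng.normal(size=(1, 3780)))
score = clf.decision_function(x_test)[0]
print(score, clf.predict(x_test))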

3.4 Datasets
We tested our detector on the INRIA dataset, which contains 1805 images of humans of size 64 × 128 pixels, cropped from a varied set of personal photos. Fig. 5 shows some samples. The people are usually standing, but appear in any orientation and against a wide variety of backgrounds, including crowds. Many are bystanders taken from the image backgrounds, so there is no particular bias in their pose. The database is available from http://lear.inrialpes.fr/data for research purposes.


Fig. 2. HOG feature extraction. (a) Human example. (b) HOG cells. (c) HOG feature extraction in a block. (d) Visualization of HOG features multiplied by the IK-SVM norm vector.

3.2 IK-SVM model with detection
Given a set of training samples, an Intersection Kernel SVM model is trained on their HOG features. The block diagram in Fig. 3 summarizes the training of the model. As pre-processing, histogram equalization and median filtering with a radius of 3 pixels are applied to the test image. The test image is then repeatedly reduced in size by a factor of 1.1, resulting in an image pyramid. Sliding windows are extracted from each layer of the pyramid. In each window, the HOG features are extracted and tested with the IK-SVM to decide whether it contains a human or not. The human detection block diagram is represented in Fig. 4.

Fig. 3. IK-SVM training: training images → HOG feature extraction → initial sample division → sample division into subsets (with iteration) → IK-SVM training → IK-SVM model.
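The detection procedure just described can be sketched as below, reusing the illustrative extract_hog function and clf classifier from the earlier sketches. The pre-processing (histogram equalization, radius-3 median filter), the 1.1 pyramid factor and the 64 × 128 window follow the text; the 8-pixel stride and the score > 0 acceptance rule are assumptions, not values from the paper, and overlapping positives would still have to be merged ("integration of positives" in Fig. 4).

# Multi-scale sliding-window detection sketch (assumptions noted above).
import numpy as np
from skimage import exposure, transform
from skimage.morphology import disk
from scipy.ndimage import median_filter

def detect_humans(gray, clf, extract_hog, stride=8, scale_step=1.1, win=(128, 64)):
    # Pre-processing: histogram equalization and median filtering (radius 3 px).
    img = exposure.equalize_hist(gray)
    img = median_filter(img, footprint=disk(3))

    detections = []  # (row, col, scale, score) in original-image coordinates
    scale = 1.0
    while img.shape[0] >= win[0] and img.shape[1] >= win[1]:
        for r in range(0, img.shape[0] - win[0] + 1, stride):
            for c in range(0, img.shape[1] - win[1] + 1, stride):
                window = img[r:r + win[0], c:c + win[1]]
                score = clf.decision_function(extract_hog(window)[None, :])[0]
                if score > 0:  # positive IK-SVM response -> candidate human
                    # Window spans roughly 128*scale x 64*scale pixels in the original image.
                    detections.append((int(r * scale), int(c * scale), scale, float(score)))
        # Next pyramid level: reduce the image by a factor of 1.1.
        scale *= scale_step
        img = transform.rescale(img, 1.0 / scale_step, anti_aliasing=True)
    return detections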

Fig. 5. Sample images from the INRIA dataset.

4. RESULTS AND ANALYSIS
The proposed human detection algorithm was tested on the INRIA dataset to illustrate its performance on a benchmark dataset. There are very few data samples available at very small scales of the input image (corresponding to very large pedestrians in the input images) to model the enclosing hypersphere of the normalcy class. So the data samples from the smallest eight scales of each input image are grouped together before modeling the normalcy class, and then the



anomalies (pedestrians) are obtained for these eight scales together. A subset of the INRIA dataset consisting of 230 images with sizes 480 × 640 and 640 × 480 is used to test the proposed algorithm. Fig. 6 shows the final bounding boxes in the images representing the human detections. As shown in this figure, the proposed algorithm is capable of detecting humans in urban and rural scenes. However, the number of false alarms appears to be higher in urban scenes, as exemplified in Fig. 6(a), (d), and (f). This observation is due to the fact that some of the detection windows in urban scenes have local spatial structures that are quite different from the majority of the image, so these windows are deemed to be anomalies along with the humans. At present, the detection rate of the proposed algorithm is around 74% at 1 false alarm per 4 images. The false alarm rate drops sharply in rural scenes with less clutter.
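For clarity, the reported operating point can be unpacked as simple arithmetic. The counts below are made up solely to reproduce the quoted figures (roughly 74% detection rate at 1 false alarm per 4 images over 230 test images); they are not the actual experimental counts.

# Illustrative arithmetic for a detection-rate / false-alarms-per-image operating point.
num_images = 230          # test images used (from the text)
true_positives = 370      # hypothetical correctly detected humans
total_humans = 500        # hypothetical annotated humans in the test set
false_positives = 58      # hypothetical false alarms over all images

detection_rate = true_positives / total_humans           # 0.74
false_alarms_per_image = false_positives / num_images     # about 0.25, i.e. 1 per 4 images
print(f"detection rate = {detection_rate:.2f}, FPPI = {false_alarms_per_image:.2f}")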

Fig. 6. Human detection output obtained for a set of input images from the INRIA dataset.

5. CONCLUSION

In this paper, we have developed a human detection algorithm using the Intersection Kernel SVM (IK-SVM). The only prior information used is an average human HOG template. Using this template, a distance feature vector is extracted for each detection window and used in IK-SVM modeling. By setting an upper limit on the number of outliers, the windows containing humans are detected as anomalies during the modeling stage. The performance of the algorithm is demonstrated using a benchmark dataset from INRIA. Even though the algorithm generates more false alarms than some human detection techniques, it has shown great potential in detecting humans. Our future work includes reducing the number of false alarms, as well as dealing with a large number of humans in an image. Research on different distance metrics to calculate the feature vectors will also be performed in the near future.

Acknowledgement
It is a pleasure to recognize the many individuals who have helped us in completing this technical paper. We thank Mrs. Venkata Sumana C H and Mr. Shivakumar B R (Asst. Professors, GMIT Davangere) for all the technical guidance, encouragement and analysis of the data throughout this process.

REFERENCES
[1] Qixiang Ye, Zhenjun Han, Jianbin Jiao, and Jianzhuang Liu, "Human detection in images via piecewise linear support vector machines," IEEE Trans. Image Process., vol. 22, no. 2, Feb. 2013.
[2] Navneet Dalal and Bill Triggs, "Histograms of oriented gradients for human detection," 2005.
[3] Y. Xu, D. Xu, S. Lin, T. X. Han, X. Cao, and X. Li, "Detection of sudden pedestrian crossings for driving assistance systems," IEEE Trans. Circuits Syst. Video Technol., vol. 42, no. 3, pp. 729-739, Jun. 2008.
[4] Q. Zhu, S. Avidan, M. Yeh, and K. Cheng, "Fast human detection using a cascade of histograms of



oriented gradients," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jul. 2006, pp. 1491-1498.
[5] M. Enzweiler and D. M. Gavrila, "Monocular pedestrian detection: Survey and experiments," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 12, pp. 2179-2195, Dec. 2009.
[6] A. Bar-Hillel, D. Levi, E. Krupka, and C. Goldberg, "Part-based feature synthesis for human detection," in Proc. 11th Eur. Conf. Comput. Vis., Sep. 2010, pp. 127-142.
[7] H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, "Discriminative common vectors for face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 1, pp. 4-13, Jan. 2005.
[8] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 886-893.
[9] S. Maji, A. Berg, and J. Malik, "Classification using intersection kernel support vector machines is efficient," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1-8.
[10] S. Maji, A. C. Berg, and J. Malik, "Efficient classification for additive kernel SVMs," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 66-77, Jan. 2013.


