GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016
e-ISSN: 2455-5703
An Efficient Extreme Learning Machine based Intrusion Detection System
1W. Sylvia Lilly Jebarani 2K. Janaki 3R. Anupriya
1AP - Senior Grade 2,3UG Scholar
1,2,3Department of Electronics and Communication Engineering
1,2,3Mepco Schlenk Engineering College, Sivakasi, India

Abstract
This paper presents an intrusion detection technique based on the online sequential extreme learning machine (OS-ELM). For performance evaluation, the KDDCUP99 dataset is used. Three feature selection techniques (filtered subset evaluation, CFS subset evaluation and consistency subset evaluation) are used to eliminate redundant features. Two network traffic profiling techniques are also used: alpha profiling reduces time complexity, and beta profiling removes redundant connection records and hence reduces the size of the dataset.
Keyword- Network traffic profiling, OS-ELM
__________________________________________________________________________________________________
I. INTRODUCTION
With the recent advances in technology, networks face many threats. One of them is intrusion, which affects a network by consuming bandwidth and other resources. Detecting intrusions is therefore essential. Detection can be done by analysing a network traffic dataset, but a large dataset is difficult to process, so network traffic behaviour is used for intrusion detection instead. The proposed technique addresses three issues: the large size of the dataset, low accuracy and time complexity.

Previous intrusion detection techniques use support vector machines for classification, which cannot classify new types of connection records for which they were not trained. The proposed technique uses an extreme learning machine for classification. It is trained on the training dataset, continues to learn, and classifies new types of connection records. The standard KDDCUP99 dataset, which has about 5 million connection records, is used to evaluate the performance of the proposed technique.

Three feature selection techniques are used to remove redundant features, which reduce the accuracy of the classifier: filtered subset evaluation, CFS subset evaluation and consistency subset evaluation. Selecting appropriate features improves the accuracy of classification. 10-fold cross-validation is used to divide the dataset into training and testing sets. The dataset is divided into 10 sets and 10 iterations are performed; in each iteration, one set is used for testing and the other nine for training.

Alpha profiling is a network traffic profiling technique that reduces the time complexity of the classifier by grouping connection records that have the same protocol type and service under a single alpha feature. Its other advantages include increased scalability, load balancing and the handling of unknown profiles. Beta profiling is another network traffic profiling technique, which reduces the size of the dataset. Similar connection records are grouped together using a clustering algorithm, DBSCAN, and the centres of these clusters are combined and used as the dataset.

The online sequential extreme learning machine (OS-ELM) is used as the classifier that detects intrusions in the network. It is fast and accurate compared with previously used classifiers.
II. DATASET DESCRIPTION
25,000 connection records were chosen from the KDDCup99 dataset. The dataset consists of 41 features and one class label. The class label indicates whether the record is normal or anomalous. The features are shown in Table 1.
All rights reserved by www.grdjournals.com
An Efficient Extreme Learning Machine based Intrusion Detection System (GRDJE / CONFERENCE / ICIET - 2016 / 048)
Table 1: List of features in KDDCup99 Dataset
III. METHODOLOGY
Fig. 1 shows the methodology of the proposed system. The experiment is carried out using MATLAB (version R2014a) and the Weka data mining tool. The blocks involved are explained below.

A. Dataset Pre-processing
The KDDCup99 dataset contains both categorical and continuous features. Since the classifier cannot process categorical features directly, these must first be converted so that the dataset contains only numerical features.
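The pre-processing step can be illustrated with a minimal Python sketch (the paper itself uses MATLAB). Section IV states only that nominal values were converted to numerical values and then normalized; the integer coding scheme, the min-max scaling, the helper names and the four sample records below are all assumptions made for illustration.

```python
# Hypothetical mini-records: (duration, protocol_type, service, src_bytes).
# The field choice and values are illustrative, not taken from the paper's data.
records = [
    (0, "tcp", "http", 181),
    (2, "udp", "domain_u", 105),
    (0, "icmp", "ecr_i", 1032),
    (5, "tcp", "smtp", 520),
]

CATEGORICAL_COLUMNS = (1, 2)  # positions of the categorical fields above


def encode_categorical(rows, columns):
    """Replace each categorical value with an integer code (assumed scheme)."""
    codebooks = {c: {} for c in columns}
    encoded = []
    for row in rows:
        new_row = list(row)
        for c in columns:
            book = codebooks[c]
            # Codes are assigned in order of first appearance: 0, 1, 2, ...
            new_row[c] = book.setdefault(row[c], len(book))
        encoded.append([float(v) for v in new_row])
    return encoded, codebooks


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column maps to 0 (assumed)."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


encoded, codebooks = encode_categorical(records, CATEGORICAL_COLUMNS)
normalized = min_max_normalize(encoded)
```

After this step every record is a vector of numbers in [0, 1], which is the form a numerical classifier expects. Integer codes impose an arbitrary order on the categories; other encodings are possible and the paper does not say which one was used.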
Fig. 1: Proposed Intrusion Detection System
B. Feature Selection
Feature selection reduces space and time complexity. It is carried out using the Weka tool. After analysis with different evaluation techniques, it was found that filtered subset evaluation, consistency subset evaluation and CFS subset evaluation are the three feature selection techniques that provide an optimal subset of features. Reducing the number of features also increases accuracy.

C. Cross-Validation
10-fold cross-validation is used in this system. The whole dataset is divided randomly into 10 parts. Nine parts are used for training and the remaining part is used for testing; this is repeated 10 times so that each part is used once for testing. The classification error is estimated using this 10-fold cross-validation.

D. Alpha Profiling
Profiles are created based on the protocol and service features of the connection records. This process is called alpha profiling. A combination of a protocol and a service is termed an alpha feature. Connection records are grouped by alpha feature, and each group is called an alpha profile. The main advantages of this process are increased scalability and load balancing, a reduced number of comparisons, efficient handling of unknown profiles, and reduced protocol-service imbalance.
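The grouping performed by alpha profiling can be sketched in a few lines of Python (the paper's own implementation is in MATLAB). The field names `protocol_type` and `service` follow the usual KDDCup99 column names; the sample records and the function name `build_alpha_profiles` are hypothetical.

```python
from collections import defaultdict

# Hypothetical connection records; only the fields needed here are shown.
connections = [
    {"protocol_type": "tcp", "service": "http", "label": "normal"},
    {"protocol_type": "tcp", "service": "http", "label": "anomaly"},
    {"protocol_type": "udp", "service": "domain_u", "label": "normal"},
    {"protocol_type": "icmp", "service": "ecr_i", "label": "anomaly"},
    {"protocol_type": "tcp", "service": "http", "label": "normal"},
]


def build_alpha_profiles(rows):
    """Group records by their alpha feature, the (protocol, service) pair."""
    profiles = defaultdict(list)
    for row in rows:
        alpha_feature = (row["protocol_type"], row["service"])
        profiles[alpha_feature].append(row)
    return dict(profiles)


alpha_profiles = build_alpha_profiles(connections)

# Print each alpha feature with the number of records in its profile.
for alpha_feature, members in sorted(alpha_profiles.items()):
    print(alpha_feature, len(members))
```

A new record then only needs to be compared with the records of its own alpha profile, which is one way to read the "reduced number of comparisons" advantage listed above.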
E. Beta Profiling
An IDS needs a long detection time to process a large dataset. Beta profiling is introduced to address this problem. It is also called the sample reduction process, and it reduces both the memory requirement and the time complexity. The clustering algorithm Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to implement this process. It groups similar connections and removes redundant records, so that only quality samples remain.

F. OS-ELM
In the proposed methodology, the OS-ELM classifier is used for intrusion detection. It overcomes the slow learning of other classifiers. It can solve a variety of classification problems and can process a large dataset in very little time.

G. Result Aggregation
The results obtained from every process are analysed and compared with the performance of other existing classifiers.
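The sequential learning idea behind the OS-ELM classifier of subsection F can be sketched as follows. This is a minimal NumPy illustration of the standard OS-ELM recursion, not the authors' MATLAB code: the class name `OSELM`, the sigmoid activation, the number of hidden nodes, the small ridge term added for numerical stability and the synthetic two-class data are all assumptions.

```python
import numpy as np


class OSELM:
    """Minimal OS-ELM sketch: random hidden layer, recursive output weights."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights and biases are drawn once at random and never trained.
        self.W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.beta = np.zeros((n_hidden, n_outputs))
        self.P = None

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H (activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit_initial(self, X0, T0, ridge=1e-3):
        # Initial batch: regularised least-squares solution for beta.
        H = self._hidden(X0)
        self.P = np.linalg.inv(H.T @ H + ridge * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T0

    def partial_fit(self, X, T):
        # Sequential phase: update P and beta from the new chunk only.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Synthetic stand-in for normal (class 0) and anomalous (class 1) records.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)),
               rng.normal(2.0, 0.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)
order = rng.permutation(len(y))
X, y = X[order], y[order]
T = np.eye(2)[y]  # one-hot targets

model = OSELM(n_inputs=2, n_hidden=20, n_outputs=2)
model.fit_initial(X[:40], T[:40])          # initial chunk
for start in range(40, len(y), 20):        # remaining data arrives in chunks
    model.partial_fit(X[start:start + 20], T[start:start + 20])

accuracy = float(np.mean(model.predict(X) == y))
```

Only the output weights are learned, and each chunk is used once and then discarded. This is the property that lets the classifier process a large dataset quickly.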
IV. RESULTS
The results of each experiment are presented and analysed in this section.

A. Pre-processing
In this process, the nominal values were first converted into numerical values and then normalized. This was performed using MATLAB.

B. Feature Selection
The output of pre-processing is fed into the Weka data mining tool for feature selection. The three techniques mentioned above were used. The results of this process are shown in Table 2.
Table 2: List of Selected features
C. Alpha Profiling
Alpha profiling is done using MATLAB code. The output file consisted of 17 alpha profiles. The list of alpha profiles is as shown below:
D. Beta Profiling
Using MATLAB, the normal and anomalous connection records were separated. The DBSCAN parameters (the distance threshold and the minimum number of connections) were then set, and the beta profiles were created. After beta profiling, the size of the dataset was reduced by 10%.
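The sample reduction performed by beta profiling can be sketched in plain Python. The sketch below applies a textbook DBSCAN to the records of one class and replaces each cluster by its centre, as described in the Introduction. The function names, the parameter values `eps` and `min_pts`, the toy two-dimensional points and the decision to keep noise points unchanged are assumptions; the paper does not report its parameter settings or how unclustered records were treated.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def beta_profile(points, eps, min_pts):
    """Replace each cluster by its centre; noise points are kept (assumption)."""
    labels = dbscan(points, eps, min_pts)
    clusters, reduced = {}, []
    for p, lab in zip(points, labels):
        if lab == NOISE:
            reduced.append(tuple(p))
        else:
            clusters.setdefault(lab, []).append(p)
    for members in clusters.values():
        reduced.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return reduced


# Toy normalized records of a single class (hypothetical values).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.3, min_pts=3)
reduced = beta_profile(points, eps=0.3, min_pts=3)
```

In this toy case eight records shrink to three: two cluster centres and one unclustered record. Running the reduction separately on the normal and the anomalous records, as the paper does, keeps the class label of every centre unambiguous.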
V. PERFORMANCE ANALYSIS
The results obtained were fed into the Support Vector Machine (SVM) and Sequential Minimal Optimization (SMO) classifiers. The accuracy values obtained are shown in Table 3.
Table 3: Performance comparison using SVM and SMO classifiers
From the analysis of the tabulated results, the inferences are summarized below:
1) After the feature selection process, the number of dimensions is reduced by 52.38% relative to the original dataset. Thus, time consumption is reduced.
2) From Table 3 it is found that accuracy remains the same after alpha profiling, while the number of comparisons is reduced.
3) Accuracy increases after the beta profiling process. Beta profiling also reduces the size of the dataset, thus reducing time and space complexity.
REFERENCES
[1] Adetunmbi A. Olusola, Adeola S. Oladele and Daramola O. Abosede, "Analysis of KDD '99 Intrusion Detection Dataset for Selection of Relevance Features," World Congress on Engineering and Computer Science 2010, Vol. I.
[2] R. Singh, H. Kumar and R. K. Singla (2014), "TOPSIS based multi-criteria decision making of feature selection techniques for network traffic dataset," International Journal of Engineering and Technology, 5(6), 4598-4604.
[3] Chia-Ming Wang and Yin-Fu Huang, "Evolutionary-based feature selection approaches with new criteria for data mining: A case study of credit approval data," April 2009.
[4] M. Ester, H. P. Kriegel, J. Sander and X. Xu (1996), "A density-based algorithm for discovering clusters in large spatial databases with noise," Second International Conference on Knowledge Discovery and Data Mining (pp. 226-231).
[5] S. Revathi and A. Malathi, "A Detailed Analysis on NSL-KDD Dataset Using Various Machine Learning Techniques for Intrusion Detection," International Journal of Engineering & Technology, Vol. 2.
[6] KDDCup dataset (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[7] Matlab Language of Technical Computing (2014). http://in.mathworks.com/products/matlab/.
[8] Weka 3.6.9: Data Mining Software, http://www.cs.waikato.ac.nz/ml/weka/.