IJIRST –International Journal for Innovative Research in Science & Technology| Volume 3 | Issue 03 | August 2016 ISSN (online): 2349-6010
Improved Algorithm for Network Intrusion Detection System based on K-Nearest Neighbour: Survey Jyotika Gupta M. Tech. Student Department of Computer Science & Engineering JSS Academy of Technical Education, Noida, India
Krishna Nand Chaturvedi Assistant Professor Department of Computer Science & Engineering JSS Academy of Technical Education, Noida, India
Abstract From the onset of web arrangement, protection menaces normally recognized as intrusions has come to be extremely vital and critical subject in web arrangements, data and data system. In order to vanquish these menaces every single period a detection arrangement was demanded because of drastic development in networks. Because of the development of arrangement, attackers came to be stronger and every single period compromises the protection of system. Hence a demand of Intrusion Detection arrangement came to be extremely vital and vital instrument in web security. Detection and prevention of such aggressions shouted intrusions generally depends on the skill and efficiency of Intrusion Detection Arrangement (IDS).Therefore countless ensemble mechanism has been counseled by employing countless methodologies’, these methodologies have their own benefits and short comings. In this paper we will focus on different classification techniques. Keywords: Intrusion Detection, Anomaly Detection, Misuse Detection, KDD Cup 99, Ensemble Approaches _______________________________________________________________________________________________________ I.
INTRODUCTION
In the past two decades alongside the quick progress in the Internet established knowledge, new request spans for computer web have emerged. At the alike period, expansive range progress in the LAN and WAN request spans in company, commercial, industry, protection and healthcare sectors made us extra reliant on the computer networks. All of these request spans made the web an appealing target for the mistreatment and a large vulnerability for the community. A fun to do job or a trial to accomplish deed for a little people came to be a nightmare for the others. In countless cases malicious deeds made this nightmare to come to be a reality. In supplement to the hacking, new entities like worms, Trojans and viruses gave extra panic into the net- worked society. As the present situation is a moderately new phenomenon, web armaments are weak. Though, due to the popularity of the computer webs, their connectivity and our ever producing dependency on them, realization of the menace can have desecrating consequences. Safeguarding such a vital groundwork has come to be the priority one scrutiny span for countless researchers. Aim of this paper is to study the present trends in Intrusion Detection Arrangements (IDS) and to examine a little present setback that continue in this scrutiny area. In analogy to a little mature and well stayed scrutiny spans, IDS is a youthful earth of research. Though, due to its duty critical nature, it has enticed momentous attention towards itself. Density of scrutiny on this subject is constantly rising and everyday extra researchers are involved in this earth of work. The menace of a new wave of cyber or web aggressions is not just a probability that ought to be believed, but it is a consented fact that can transpire at each time. The present trend for the IDS is distant from a reliable protective arrangement, but instead the main believed is to make it probable to notice novel web attacks. One of the main concerns is to make sure that in case of an intrusion endeavor, the arrangement is able to notice and to report it. After the detection is reliable, subsequent pace should be to protect the web (response). In supplementary words, the IDS arrangement will be upgraded to an Intrusion Detection and Reply Arrangement (IDRS). Though, no portion of the IDS is presently at a fully reliable level. Even nevertheless researchers are concurrently involved in working on both detection and answer factions of the system. A main setback in the IDS is the promise for the intrusion detection. This is the reason why in countless cases IDSs are utilized jointly alongside a human expert. In this method, IDS is truly helping the web protection captain and it is not reliable plenty to be trusted on its own. The reason is the in- skill of IDS arrangements to notice the new or modified attack patterns. Even though the latest creation of the detection methods has considerably enhanced the detection rate, yet there is a long method to go. There are two main ways for noticing intrusions, signature-based and anomaly-based intrusion detection. In the early way, attack outlines or the deeds of the intruder is modeled (attack signature is modeled). Here the arrangement will gesture the intrusion after a match is detected. Though, in the subsequent way nor- mal deeds of the web is modeled. In this way, the arrangement will rise the alarm after the deeds of the web does not match alongside its normal behavior. There is one more Intrusion Detection (ID) way that is shouted specification-based intrusion detection. In this way, the normal deeds (expected
All rights reserved by www.ijirst.org
81
Improved Algorithm for Network Intrusion Detection System based on K-Nearest Neighbour: Survey (IJIRST/ Volume 3 / Issue 03/ 015)
behavior) of the host is enumerated and subsequently modeled. In this way, manage worth for the protection, freedom of procedure for the host is limited. In this paper, these ways will be briefly debated and compared. The believed of possessing an intruder accessing the arrangement lacking even being able to notice it is the worst nightmare for each web protection officer. As the present ID knowledge is not precise plenty to furnish a reliable detection, heuristic methodologies can be a method out. As for the last line of protection, and in order to cut the number of undetected intrusions, heuristic methods such as Honey Jars (HP) can be deployed. HPs can be installed on each arrangement and deed as mislead or decoy for a resource. Another main setback in this scrutiny span is the speed of detection. Computer webs have a vibrant nature in a sense that data and data inside them are unceasingly changing. Therefore, noticing an intrusion precisely and punctually, the arrangement has to work in real time. Working in real period is not just to per- form the detection in real period, but is to change to the new dynamics in the network. Real period working IDS is an alert scrutiny span pursued by countless researchers. Most of the scrutiny works are aimed to familiarize the most period effectual methodologies. The aim is to make the requested methods suitable for the real period implementation. From a disparate outlook, two ways can be envisaged in requesting an IDS. In this association, IDS can be whichever host established or web based. In the host established IDS, arrangement will merely protect its own innate ma- chine (its host). On the supplementary hand, in the web established IDS, the ID procedure is somehow distributed alongside the net- work. In this way whereas the agent established knowledge is extensively requested, a distributed arrangement will protect the web as a whole. In this design IDS could manipulation or monitor web firewalls, web routers or web switches as well as the client machines. The main emphasis of this paper is on the detection portion of the intrusion detection and reply problem. Re- searchers have pursued disparate ways or a combination of disparate ways to resolve this problem. Every single way has its own theory and presumptions. This is so because there is no precise behavioral ideal for the legitimate user, the intruder or the web itself. II. LITERATURE SURVEY Alexander J. Gibberd The paper proposes an intrusion detection model using chi-square feature selection and multi class support vector machine. A parameter tuning technique is adopted for optimization of RBF kernel parameter gamma and over fitting constant ‘C. The proposed model results in high detection rate and low false alarm rates in comparison to other traditional approaches. Antonis Papadogiannakis The paper proposed a method of intrusion detection using SVM which can reduce the time required to build model for classification and increase the intrusion detection accuracy when Gaussian RBF kernel is used. When data sets are properly processed and proper SVM kernel is selected i.e. Radial Basis Function (RBF), it can overcome the drawback of SVM i.e. extensive time required to build model. Asma Gul Combining more than one data mining algorithms may be used to remove disadvantages of one another. Thus a combining approach has to be made while selecting a mode to implement intrusion detection system. Combining a number of trained classifiers lead to a better performance than any single classifier. Kalyanmoy Deb Amrit Pratap The paper proposed a new methodology based on GFS and pair wise learning for the development of robust and interpretable IDS. The paper make use of a multi-objective fuzzy model (MOGFIDS), three different GFS schemes developed by Abadeh et al., and a genetic approach for boosting fuzzy association rules. The application of this ‘‘divide-and-conquer’’ strategy improves the individual accuracy for the different classes of the problem, which is reflected on the high value for the average accuracy metric. Nikos Tsikoudis The key idea of LEoNIDS is to process with higher priority the first few bytes of each flow, which have a higher probability to carry an attack, to achieve lower latency for them and faster attack detection. The paper proposed two alternative techniques: time sharing, which uses a typical priority queue scheduling, and space sharing, which uses dedicated cores with low utilization to process high-priority packets. Reema Patel The paper proposed a new algorithm (CSVAC) for generating classifiers with clustering, and applied it to the intrusion detection problem. This approach combines two existing machine learning methods (SVM and CSOACN) to achieve better performance in both detection accuracy rate and faster running time. Robert Mitchell The paper present selective packet discarding, a best approach that gracefully reduces the amount of traffic that reach the detection engine of the NIDS by selectively discarding packets that are less likely to affect its detection accuracy. This is achieved by setting a cutoff limit to the number of packets to be inspected for each network flow. Salma Elhag The paper presents a novel feature representation approach that combines cluster centers and nearest neighbors for effective and efficient intrusion detection, namely CANN. The CANN approach first transforms the original feature representation of a given dataset into a one-dimensional distance based feature. Then, this new dataset is used to train and test a k-NN classifier for classification. Sumaiya Thaseen The paper put forward a new algorithm called S-K, where the combination of SOM neural network and Kmeans algorithm is running to detect the abnormity of the nodes in the wireless sensor network, which will make the system more flexible, precise and easier to implement. Wang Huai-bin The paper presented a two-stage method for learning dynamic GGM alongside change points in the graphical structure. This method, based on two-stage regularization has comparable computational efficiency with existing dynamic
All rights reserved by www.ijirst.org
82
Improved Algorithm for Network Intrusion Detection System based on K-Nearest Neighbour: Survey (IJIRST/ Volume 3 / Issue 03/ 015)
programming based methods. From an application point of view they have demonstrated how relational structures can be uncovered when modeling network traffic data which may have potential uses to build improved attack filters for IDS. Wei-Chao Lin The paper proposed simple yet effective IDS, which can be easily implemented in the secondary users’ cognitive radio software .The proposed IDS uses non-parametric cusum algorithm, which offers anomaly detection. By learning the normal mode of operations and system parameters of a CRN, the proposed IDS is able to detect suspicious (i.e., anomalous or abnormal) behavior arising from an attack. This paper also presented an example of a jamming attack against a CRN secondary user, and demonstrated how proposed IDS able to detect the attack with low detection latency. Wenying Fenga The paper classifies modern CPS Intrusion Detection System (IDS) techniques based on two design dimensions: detection technique and audit material. This paper also summarizes advantages and drawbacks of each dimension’s options. III. SUMMARY OF DIFFERENT CLASSIFIERS Table – 1 Summary of different Classifiers Parameters Advantages The effectiveness of SVM lies in the selection of kernel and soft margin parameters. For kernels, 1. Highly Accurate 2. Able to different pairs of (C, γ) values model complex nonlinear are tried and the one with the decision boundaries 3. Less best cross-validation accuracy is prone to over fitting than picked. Trying exponentially other methods growing sequences of C is a practical method to identify good parameters.
Classifier
Method
Support Vector Machine
A support vector machine constructs a hyper plane or set of hyper planes in a high or infinite dimensional space, which can be used for classification, regression or other tasks.
K Nearest Neighbor
An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer). If k = 1, then the object is simply assigned to the class of its nearest neighbor.
Two parameters are considered to optimize the performance of the kNN, the number k of nearest neighbor and the feature space transformation.
1. Analytically tractable. 2. Simple in implementation 3. Uses local information, which can yield highly adaptive behavior 4. Lends itself very easily to parallel implementations
1. Large storage requirements. 2. Highly susceptible to the curse of dimensionality. 3. Slow in classifying test tuples.
Artificial Neural Network
An ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
ANN uses the cost function C is an important concept in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved.
1. Requires less formal statistical training. 2. Able to implicitly detect complex nonlinear relationships between dependent and independent variables. 3. High tolerance to noisy data. 4. Availability of multiple training algorithms.
1. "Black box" nature. 2. Greater computational burden. 3. Proneness to over fitting. 4. Requires long training time.
In Bayes, all model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set.
1. Naïve Bayesian classifier simplifies the computations. 2. Exhibit high accuracy and speed when applied to large databases.
1 The assumptions made in class conditional independence. 2. Lack of available probability data.
Decision Tree Induction uses parameters like a set of candidate attributes and an attribute selection method.
1. Construction does not require any domain knowledge. 2. Can handle high dimensional data. 3. Representation is easy to understand. 4. Able to process both numerical and categorical data.
1. Output attribute must be categorical. 2. Limited to one output attribute. 3. Decision tree algorithms are unstable. 4. Trees created from numeric datasets can be complex.
Bayesian Method
Decision Tree
Based on the rule, using the joint probabilities of sample observations and classes, the algorithm attempts to estimate the conditional probabilities of classes given an observation. Decision tree builds a binary classification tree. Each node corresponds to a binary predicate on one attribute; one branch corresponds to the positive instances of the predicate and the other to the negative instances.
Disadvantages 1. High algorithmic complexity and extensive memory requirements of the required quadratic programming in large-scale tasks. 2. The choice of the kernel is difficult 3. The speed both in training and testing is slow.
All rights reserved by www.ijirst.org
83
Improved Algorithm for Network Intrusion Detection System based on K-Nearest Neighbour: Survey (IJIRST/ Volume 3 / Issue 03/ 015)
IV. CONCLUSION Due to our increased dependence on Internet and growing number of intrusion incidents, building effective intrusion detection systems are essential for protecting Internet resources and yet it is a great challenge. In literature, many researchers utilized k-NN in supervised learning based intrusion detection successfully. Here, k-NN maps the network traffic into predefined classes i.e. normal or specific attack type based upon training from label dataset. However, for k-NN based IDS, detection rate (DR) and false positive rate (FPR) are still needed to be improved. In this study, we propose an ensemble approach, called MANNE, for kNN-based IDS that evolve k-NN by Multi-Objective Genetic Algorithm to solve the problem. It helps IDS to achieve high DR, less FPR, improve accuracy and in turn high intrusion detection capability. REFERENCES [1]
[2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21]
Amin Dastanpour Suhaimi Ibrahim, and Reza Mashinchi. "Using Genetic Algorithm to Supporting Artificial Neural Network for Intrusion Detection System." In The International Conference on Computer Security and Digital Investigation (ComSec2014), pp. 1-13. The Society of Digital Information and Wireless Communication, 2014. Alexander J. Gibberd and James D. B. Nelson,’ High Dimensional change point detection with a dynamic graphical Lasso’, (2014) IEEE Alina, Oprea, Zhou Li, Ting-Fang Yen, Sang H. Chin, and Sumayah Alrwais. "Detection of early-stage enterprise infection by mining large-scale log data." In Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on, pp. 45-56. IEEE, 2015 Antonis Papadogiannakis, Michalis Polychronakis, Evangelos P. Markatos,’ Improving the Accuracy of Network Intrusion Detection Systems Under Load Using Selective Packet Discarding’, (2010) ACM Asma Gul, Aris Perperoglou, Zardad Khan, Osama Mahmoud, Miftahuddin, Werner Adler, Berthold, ‘Ensemble of a subset of k-NN classifiers’, (2015) Springer Feng Gu Julie Greensmith, and U. Aickelin. "The dendritic cell algorithm for intrusion detection." Bio-Inspired Communications and Networking, IGI Global (2011): 84-102. Kalyanmoy Deb Amrit Pratap, Sameer Agarwal, and T. Meyarivan,’ A Fast and Elitist Multi objective Genetic Algorithm: NSGA-II’, (2002) IEEE Liyuan Xiao Yetian Chen, and Carl K. Chang. "Bayesian Model Averaging of Bayesian Network Classifiers for Intrusion Detection." In Computer Software and Applications Conference Workshops (COMPSACW), 2014 IEEE 38th International, pp. 128-133. IEEE, 2014. M. Govindarajan "Hybrid Intrusion Detection Using Ensemble of Classification Methods." IJ Computer Network and Information Security 2 (2014): 45-53. Mradul Dhakar and Akhilesh Tiwari. "A Novel Data Mining based Hybrid Intrusion Detection Framework." Journal of Information and Computing Science 9, no. 1 (2014): 037-048. Nikos Tsikoudis, Antonis Papadogiannakis, Evangelos P.Markatos,’ LEoNIDS: a Low-latency and Energy-efficient Network-level Intrusion Detection System’, (2013) IEEE. Reema Patel, Amit Thakkar, Amit Ganatra,’ A Survey and Comparative Analysis of Data Mining Techniques for Network Intrusion Detection Systems’, (March 2012) ISSN: 2231-2307, Volume-2, Issue-1 Robert Mitchell, Ing-ray Chen,’ A Survey of Intrusion Detection Techniques for Cyber-Physical Systems, (March 2014) ACM Comput. Surv. 46, 4, Article 55 Salma Elhag, Alberto Fernández , Abdullah Bawakid, Saleh Alshomrani, Francisco Herrera,’ On the combination of genetic fuzzy systems and pairwise learning for improving detection rates on Intrusion Detection Systems’,(11 August 2014) ,science direct Sumaiya Thaseen, Ch. Aswani Kumar,’ Intrusion detection model using fusion of chi-square 4 feature selection and multi class SVM’, (4 October 2015); accepted 3 December 2015 Wang Huai-bin, YANG Hong-liang, XU Zhi-jian, YUAN Zheng, ’ A clustering algorithm use SOM and K-Means in Intrusion Detection’, (2010) IEEE Wei-Chao Lin, Shih-Wen Ke, Chih-Fong Tsai,’ CANN: An intrusion detection system based on combining cluster centers and nearest neighbors’,(23 January 2015),Science direct Weiming Hu Jun Gao, Yanguo Wang, Ou Wu, and Stephen Maybank. "Online adaboost-based parameterized methods for dynamic distributed network intrusion detection." Cybernetics, IEEE Transactions on 44, no. 1 (2014): 66-82. Wenying Fenga, Qinglei Zhang, Gongzhu Hu, Jimmy Xiangji Huang , ‘Mining network data for intrusion detection through combining SVMs with ant colony networks’, (2014) Science direct Yogita B. Bhavsar1, Kalyani C.Waghmare,’Intrusion Detection System Using Data Mining Technique: Support Vector Machine’, (March 2013) Volume 3, Issue 3 International Journal of Emerging Technology and Advanced Engineering Zubair Md. Fadlullah, Hiroki Nishiyama, and Nei Kato, Tohoku University,’ An Intrusion Detection System (IDS) for Combating Attacks Against Cognitive Radio Networks ‘, (2013) IEEE.
All rights reserved by www.ijirst.org
84