ISSN (ONLINE): 2454-9762 ISSN (PRINT): 2454-9762 Available online at www.ijarmate.com
International Journal of Advanced Research in Management, Architecture, Technology and Engineering (IJARMATE) Vol. II, Issue VI, June 2016.
Enhanced Detection Guard System Against Malwares In Network
Blessy Rajra M B1, Dr. A J Deepa ME., Ph.D2
1 P.G Student, Department of Computer Science, Ponjesly College of Engineering, Nagercoil, India
2 Associate Professor, Department of Computer Science, Ponjesly College of Engineering, Nagercoil, India
Abstract: We live in the 21st century, where technology and development are at our fingertips. Computerization and systemization are part of the many gadgets we use, and the number of provoking, pestering, intruding and hacking attempts has grown steadily along with them: the two grow in direct proportion. In this computerized world neither the common man nor his superior has a place to hide; any place of concealment is soon detected and privacy is curbed (e.g., website trawling). An Intrusion Detection System (IDS) produces alerts when an intrusion happens in the network or on a host, but managing these alerts is difficult. In this paper, an IDS alert correlator called the Enhanced Detection Guard System (EDGS) is introduced to detect intrusions within the monitored network. EDGS uses a heuristic algorithm to identify infected packets and can identify the family of the malware. The heuristic algorithm uses the Entropy Measure and the J-Measure to find the infected packets. Finally, a performance evaluation is made to calculate the specificity, sensitivity, False Positive Rate, False Negative Rate, accuracy and precision.
Keywords: Network Security, Intrusion Detection, Alert Correlation, Malware, Performance Evaluation
I. INTRODUCTION
Intrusion detection protects a system against illegal access and data modification. An Intrusion Detection System (IDS) is a device or software application that monitors network activity for malicious behavior. In general, intrusion detection systems are classified into two categories: (i) Network-based IDS: it analyzes all packets on the network, whether they originate from inside or outside the firewall. (ii) Host-based IDS: software is installed and maintained on the host to be monitored, and it alerts the user or administrator in case of an attack.
Every intrusion detection system uses one of two detection techniques: (i) Statistical anomaly-based IDS: the normal behavior is first studied and patterns are created from it; any behavior that deviates from the established behavior is categorized as an attack. (ii) Signature-based IDS: a database stores attack signatures, and any activity that matches a stored signature is categorized as an attack. This technique is effective for known attacks.
In this work, an Enhanced Detection Guard System (EDGS) is used to detect intrusions within the monitored networks. EDGS uses a heuristic algorithm to identify intrusions and can identify the family of the malware. The detection heuristic is capable of detecting many previously unknown malwares and new variants of current malwares. The heuristic algorithm uses the Entropy Measure and the J-Measure: the Entropy Measure is used to identify tuples, and the J-Measure combines two metrics and compares the result with a threshold to identify intrusions within the monitored networks. Finally, a performance evaluation is made to calculate the specificity, sensitivity, False Positive Rate, False Negative Rate, accuracy and precision.
II. RELATED WORKS
There have been numerous works on IDS. Binkley and Singh proposed an anomaly-based algorithm to detect infected systems, but in practice it is too slow and there is no guarantee that all infected systems are detected. IRC nickname evaluation was proposed by Goebel and Holz; it eventually loses its control over the server once a bot is found. Another approach, Wide-scale Botnet Detection and Characterization, cannot detect botnets that use encrypted communications. Sharma's model, Analysis of security data from a large computing organization, recognizes a threat only after the attack has occurred. In the work of Chen, Extracting ambiguous sessions from real traffic with intrusion prevention systems, the reported threat is not necessarily true all the time, which is likely to create a large number of false alerts. False Positives (FPs) and False Negatives (FNs) happen to every Intrusion Detection/Prevention System (IDS/IPS), so IDS developers should focus on eliminating these FN/FP cases.
III. PROPOSED MODEL FOR INTRUSION DETECTION
The proposed methodology for intrusion detection uses the Enhanced Detection Guard System (EDGS) to detect infected packets. The following issues are addressed while developing an EDGS for intrusion detection:
1. Data collection
2. Data preprocessing and normalization
3. Detection heuristic
4. Validation
Figure 1: The proposed EDGS Model for IDS
A. DATA COLLECTION
There are two ways to build an IDS: one is to create our own simulation network and collect relevant data, and the other is to use previously collected datasets. The advantage of using previously collected datasets is that the results can be compared with others in the literature. Some of the popularly used IDS datasets [8, 9] are the DARPA 1998 data set, the DARPA 1999 data set and the KDD CUP’99 data set, which are available from MIT Lincoln Labs. In this work, we use the KDD CUP’99 data set for developing the IDS.
B. DATA PREPROCESSING AND NORMALIZATION
• Data Preprocessing: The objective of data preprocessing is to transform the raw input data into an appropriate format for subsequent analysis.
• Normalization: Normalization is a computerized method of removing redundancy. It makes the collected data free from insert, update and delete anomalies and saves space by removing duplicate data. The attributes are then scaled to the range [0, 1]. One way to normalize a data value x is by using the expression:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where x' is the normalized value, and x_min and x_max are the minimum and maximum values of the data. Thus, the data available for subsequent analysis are real numbers between 0 and 1.
C. DETECTION HEURISTIC
Here, the aim is to detect the infected networks chosen from IDS datasets. The detection heuristic is capable of detecting many previously unknown malwares and new variants of current malwares. The 41 features of the KDD CUP’99 dataset are given as input. EDGS uses information-theoretic measures called the Entropy Measure and the J-Measure. The Entropy Measure is used to capture the random variables X and Y among 'n' variables, where the 'n' variables are the features of the KDD CUP’99 dataset. EDGS detects infected networks with high confidence and uses the J-Measure to balance frequency effectively. The J-Measure is used to identify associations between randomly selected data from the KDD CUP’99 dataset. It combines the terms for the tuples X and Y and compares the result with Jthresh to generate M. Jthresh is a threshold, either a maximum or a minimum value, that serves as a benchmark: if the measured value is less than the benchmark, the event is categorized as an attack; if it is equal, the event is categorized as normal.
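The min-max normalization described under Data Preprocessing and Normalization, together with the duplicate removal it is paired with, can be sketched in Python as below. This is an illustration under assumptions, not the authors' MATLAB code. Records are assumed to be tuples of already-numeric features, so symbolic KDD features such as protocol_type would need encoding first. A constant column is mapped to 0.0, a choice the paper does not specify.

```python
"""Min-max normalization with duplicate removal (illustrative sketch)."""

from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def drop_duplicates(records: Sequence[Record]) -> List[Record]:
    """Remove exact duplicate records while keeping first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def min_max_normalize(records: Sequence[Record]) -> List[Record]:
    """Scale every column to [0, 1] with x' = (x - x_min) / (x_max - x_min).

    Assumption: a constant column (x_max == x_min) is mapped to 0.0 to avoid
    division by zero; the paper does not say how this case is handled.
    """
    if not records:
        return []
    columns = list(zip(*records))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for rec in records:
        row = tuple(
            0.0 if hi == lo else (x - lo) / (hi - lo)
            for x, lo, hi in zip(rec, mins, maxs)
        )
        normalized.append(row)
    return normalized


if __name__ == "__main__":
    # Four toy records with two numeric features (e.g. duration, src_bytes); one is an exact duplicate.
    raw = [(0.0, 181.0), (2.0, 239.0), (0.0, 181.0), (4.0, 235.0)]
    clean = drop_duplicates(raw)
    print(min_max_normalize(clean))
```

Removing duplicates before scaling keeps repeated records from being counted twice in later frequency estimates. The per-column minimum and maximum are taken from the de-duplicated data.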
$$J(Y;X) = P(X)\left( P(Y \mid X)\log\frac{P(Y \mid X)}{P(Y)} + P(\bar{Y} \mid X)\log\frac{P(\bar{Y} \mid X)}{P(\bar{Y})} \right)$$
where P(X) is the probability that X occurs; P(Y) is the probability of at least one Y; P(Y|X) is the probability that alert X is followed by at least one alert Y; and $\bar{Y}$ denotes the event that Y does not occur.

Algorithm 1 Pseudo-code of EDGS for detecting infections
begin
    define the normalized dataset D
    for each process in the dataset D
        calculate J-Metric(X, Y)
        if J-Metric(X, Y) equals Jthresh then
            D is normal
            S ← Add(D)
        else
            D is abnormal
            S ← Update(D)
        end if
        Output(S)
    end for
end

D. VALIDATION
Validation is a three-step process. First, the expected behavior of the infected machine is extracted. Second, the evidence from the monitored host is collected. Third, the two are compared. If the evidence matches the expected behavior, the event is categorized as an attack; otherwise it is termed normal.

IV. IDS DATASET
KDD CUP’99 provides a standard dataset for the survey and evaluation of research in intrusion detection. It is an extended version of DARPA 1998. The KDD dataset was used in the UCI KDD 1999 competition, whose objective was to develop intrusion detection system models that detect the attack categories DOS, PROBE, R2L and U2R. Attacks in KDD CUP’99 fall into the following four categories:
1. DOS: the attacker blocks legitimate users from accessing the server, using attacks such as back, land, neptune, pod, smurf and teardrop.
2. R2L: the attacker hacks the victim system by guessing or breaking the password.
3. U2R: the attacker, starting from local access, escalates to administrator privileges through buffer overflow.
4. Probing: information is gathered from the victim machine, e.g., port scanning.
Each KDD CUP’99 record contains 41 input features, listed in Table 1, and one output that is labeled either as normal or as an attack.

Label | Feature
1 | duration
2 | protocol_type
3 | service
4 | flag
5 | src_bytes
6 | dst_bytes
7 | land
8 | wrong_fragment
9 | urgent
10 | hot
11 | num_failed_logins
12 | logged_in
13 | num_compromised
14 | root_shell
15 | su_attempted
16 | num_root
17 | num_file_creations
18 | num_shells
19 | num_access_files
20 | num_outbound_cmds
21 | is_host_login
22 | is_guest_login
23 | count
24 | srv_count
25 | serror_rate
26 | srv_serror_rate
27 | rerror_rate
28 | srv_rerror_rate
29 | same_srv_rate
30 | diff_srv_rate
31 | srv_diff_host_rate
32 | dst_host_count
33 | dst_host_srv_count
34 | dst_host_same_srv_rate
35 | dst_host_diff_srv_rate
36 | dst_host_same_src_port_rate
37 | dst_host_srv_diff_host_rate
38 | dst_host_serror_rate
39 | dst_host_srv_serror_rate
40 | dst_host_rerror_rate
41 | dst_host_srv_rerror_rate
Table 1 Features of KDD CUP’99
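The J-Measure of the detection heuristic (Section III-C) and the threshold test in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation:

- P(X), P(Y) and P(Y|X) are estimated as relative frequencies over boolean observations.
- The logarithm is base 2, so J is in bits. The paper does not state the base.
- The helper `classify` is hypothetical. It treats values below Jthresh as attacks, following the rule stated in Section III-C, and treats all other values as normal. The paper does not address values above Jthresh.

```python
"""Sketch of the J-Measure and the Algorithm 1 threshold test (not the authors' code)."""

import math
from typing import Iterable, List, Tuple


def _term(p: float, q: float) -> float:
    """One summand p * log2(p / q), using the convention 0 * log(0 / q) = 0."""
    if p == 0.0:
        return 0.0
    if q == 0.0:
        raise ValueError("inconsistent probabilities: p > 0 but q == 0")
    return p * math.log2(p / q)


def j_measure(p_x: float, p_y: float, p_y_given_x: float) -> float:
    """J(Y;X) = P(X) * [P(Y|X) log(P(Y|X)/P(Y)) + P(~Y|X) log(P(~Y|X)/P(~Y))], in bits."""
    if p_x == 0.0:
        return 0.0  # X never occurs, so it carries no information about Y
    return p_x * (_term(p_y_given_x, p_y) + _term(1.0 - p_y_given_x, 1.0 - p_y))


def estimate(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Estimate P(X), P(Y) and P(Y|X) as relative frequencies.

    Assumption: each pair records whether event X occurred and whether at
    least one event Y occurred (following X, when X occurred).
    """
    obs: List[Tuple[bool, bool]] = list(pairs)
    if not obs:
        raise ValueError("need at least one observation")
    n = len(obs)
    n_x = sum(1 for x, _ in obs if x)
    n_y = sum(1 for _, y in obs if y)
    n_xy = sum(1 for x, y in obs if x and y)
    p_y_given_x = n_xy / n_x if n_x else 0.0
    return n_x / n, n_y / n, p_y_given_x


def classify(j_value: float, j_thresh: float) -> str:
    """Hypothetical decision rule: below Jthresh -> 'attack', otherwise 'normal'."""
    return "attack" if j_value < j_thresh else "normal"


if __name__ == "__main__":
    # Toy observations: X occurs 8 times out of 20 and is followed by Y in 6 of them.
    observations = [(True, True)] * 6 + [(True, False)] * 2 + [(False, True)] + [(False, False)] * 11
    p_x, p_y, p_y_given_x = estimate(observations)
    j = j_measure(p_x, p_y, p_y_given_x)
    print(f"P(X)={p_x:.2f} P(Y)={p_y:.2f} P(Y|X)={p_y_given_x:.2f} J={j:.4f} bits")
```

When X and Y are independent, P(Y|X) equals P(Y), both log terms vanish, and J is 0. J grows as X becomes a stronger predictor of Y.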
V. PERFORMANCE ANALYSIS
Performance analysis evaluates an IDS in terms of speed, cost, resource usage, effectiveness, etc. Accuracy and false alarm rate are currently the main issues and challenges in designing effective IDSs [5]. The false alarm rate and the true alarm rate are computed from the correct classification of events as attack or normal behavior [6]. The classification of an event and its prediction are shown in the following table [6, 7].
Predicted Class \ Actual Class | Normal | Attack
Normal | True negative (TN) | False negative (FN)
Attack | False positive (FP) | True positive (TP)
True negatives and true positives indicate correct operation of the IDS: true negatives (TN) are the number of normal events predicted as normal, and true positives (TP) are the number of attack events predicted as attacks. Conversely, false positives (FP) are the number of normal events predicted as attacks, and false negatives (FN) are the number of attack events incorrectly predicted as normal.
$$\text{False Positive Rate (FPR)} = \frac{FP}{FP + TN}$$
$$\text{False Negative Rate (FNR)} = \frac{FN}{FN + TP}$$
$$\text{True Positive Rate (TPR)} = \frac{TP}{TP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{True Negative Rate (TNR)} = \frac{TN}{TN + FP}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The false positive rate (FPR), also known as the false alarm rate (FAR), is the proportion of normal data falsely detected as attack behavior. A high FPR seriously degrades the performance of the IDS, and a high FNR leaves the system vulnerable to intrusions. TPR, also known as the detection rate or sensitivity, is the proportion of detected attacks among all attack events. TNR, also known as specificity, is the proportion of normal events correctly classified as normal. Accuracy is the proportion of events classified as their correct type among all events [7]. An effective IDS should therefore minimize both the FP and FN rates while maximizing accuracy and the TP and TN rates.
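The formulas above can be checked with a short Python sketch. It tallies TP, TN, FP and FN from (actual, predicted) label pairs and then applies each formula. The label strings 'normal' and 'attack' and the function names are illustrative assumptions, and every denominator is assumed to be non-zero.

```python
"""Confusion-matrix counts and the Section V metrics (illustrative sketch)."""

from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Confusion:
    tp: int  # attack events predicted as attacks
    tn: int  # normal events predicted as normal
    fp: int  # normal events predicted as attacks
    fn: int  # attack events predicted as normal


def tally(pairs: Iterable[Tuple[str, str]]) -> Confusion:
    """Count TP, TN, FP and FN from (actual, predicted) labels, each 'normal' or 'attack'."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    outcome = {
        ("attack", "attack"): "tp",
        ("normal", "normal"): "tn",
        ("normal", "attack"): "fp",
        ("attack", "normal"): "fn",
    }
    for actual, predicted in pairs:
        key = outcome.get((actual, predicted))
        if key is None:
            raise ValueError(f"unexpected labels: {actual!r}, {predicted!r}")
        counts[key] += 1
    return Confusion(**counts)


def metrics(c: Confusion) -> Dict[str, float]:
    """Apply the FPR, FNR, TPR, TNR, precision and accuracy formulas."""
    return {
        "FPR": c.fp / (c.fp + c.tn),
        "FNR": c.fn / (c.fn + c.tp),
        "TPR": c.tp / (c.tp + c.fn),  # sensitivity
        "TNR": c.tn / (c.tn + c.fp),  # specificity
        "precision": c.tp / (c.tp + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Jthresh >= 1.6 counts as read from Table 4: 22 of 23 normal and 70 of 77 attack events correct.
    for name, value in metrics(Confusion(tp=70, tn=22, fp=1, fn=7)).items():
        print(f"{name}: {value:.3f}")
```

The usage example reads the Jthresh≥1.6 counts from Table 4 (TP = 70, TN = 22, FP = 1, FN = 7). With these counts, the sketch's specificity, sensitivity, FPR, FNR and precision agree to within 0.01 with the values reported for that threshold. Note that FPR + TNR = 1 and FNR + TPR = 1 always hold.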
VI. EXPERIMENTAL SETUP
This section presents the details of the simulation study carried out on the KDD CUP’99 dataset [9] using the proposed method. The records selected for training and testing the EDGS are given in Table 3.
Total number of samples: 238
Data Distribution | Normal | Attacks | Total
Training | 34 | 104 | 138
Testing | 23 | 77 | 100
Table 3 Distribution of Data
The EDGS model is developed using MATLAB 2010 on an Intel Core i5 2.40 GHz processor with 2 GB of RAM. Initially, all 41 input features are given to the data preprocessing unit. The normalization procedure removes duplicate records and makes the data available for subsequent analysis. In the detection phase, a heuristic algorithm based on the J-Metric is used to identify the infected hosts. In the validation phase, a positive assessment is made to find the True Positives and False Positives, and the detection rate is computed as
$$\text{Detection Rate (DR)} = \frac{TP}{TP + FP} \times 100\%$$
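Table 3 gives only the class counts of each split. One way to draw disjoint training and testing subsets with those counts is a seeded random sample per class, sketched below. The paper only says the samples were randomly selected, so the per-class sampling, the seed and the function name draw_split are assumptions.

```python
"""Seeded per-class random split with the Table 3 counts (illustrative sketch)."""

import random
from typing import Any, List, Sequence, Tuple

Labeled = Tuple[Any, str]


def draw_split(
    normal: Sequence[Any],
    attacks: Sequence[Any],
    train_counts: Tuple[int, int] = (34, 104),
    test_counts: Tuple[int, int] = (23, 77),
    seed: int = 0,
) -> Tuple[List[Labeled], List[Labeled]]:
    """Return disjoint (training, testing) lists of (record, label) pairs.

    train_counts and test_counts are (normal, attack) sizes; the defaults are
    the Table 3 figures. random.sample raises ValueError if a pool is too small.
    """
    rng = random.Random(seed)
    # Sample once per class without replacement, then slice, so the splits never overlap.
    normal_pick = rng.sample(list(normal), train_counts[0] + test_counts[0])
    attack_pick = rng.sample(list(attacks), train_counts[1] + test_counts[1])
    train = [(r, "normal") for r in normal_pick[: train_counts[0]]]
    train += [(r, "attack") for r in attack_pick[: train_counts[1]]]
    test = [(r, "normal") for r in normal_pick[train_counts[0]:]]
    test += [(r, "attack") for r in attack_pick[train_counts[1]:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


if __name__ == "__main__":
    # Stand-in record IDs; in practice these would be preprocessed KDD CUP'99 rows.
    train, test = draw_split(list(range(500)), list(range(1000, 3000)))
    print(len(train), len(test))
```

A fixed seed makes the split reproducible, so different thresholds can be compared on exactly the same 100 test records.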
The overall detection rate of the EDGS model is found to be 92%. This shows that the heuristic algorithm is able to identify the different types of attack accurately with low false positive and false negative rates.

Attack Classes | No. of attacks | No. of correctly identified attacks (Jthresh≥2) | No. of correctly identified attacks (Jthresh≥1.6)
Normal | 23 | 17 | 22
Attacks | 77 | 70 | 70
Table 4 Testing Performance

Finally, a performance evaluation is made to calculate the specificity, sensitivity, False Positive Rate, False Negative Rate, accuracy and precision.

Performance Metrics | Jthresh≥2 | Jthresh≥1.6
Specificity | 0.74 | 0.95
Sensitivity | 0.90 | 0.90
False Positive Rate | 0.26 | 0.04
False Negative Rate | 0.09 | 0.09
Accuracy | 0.87 | 0.92
Precision | 0.92 | 0.98
Table 5 Performance Metrics

1. RESULT COMPARISON
Finally, a comparison is made: by changing the threshold value, the False Positive Rate is reduced by up to 14% and the Detection Rate is increased by 12 percentage points (from 80% to 92%).

Positive & Negative Assessment | Jthresh≥2 | Jthresh≥1.6
False Positive Rate | 0.26 | 0.04
False Negative Rate | 0.09 | 0.09
Detection Rate (%) | 80 | 92
Table 6 Result Comparison

2. EXPERIMENTAL RESULTS
In Figure 2, we plot the FPR and FNR. It shows that the FPR is reduced by up to 14% compared to the existing approach.
[Figure 2 Decrease in False Positive Rate (FPR): FPR and FNR, y-axis from 0 to 0.3]

VII. CONCLUSIONS
Because of the high flexibility and extensibility of the system's design, it will be easy to add more attacks to the system in future. This technique has reduced the false positive rate and increased the accuracy of the system on 100 randomly selected samples of the KDD CUP’99 dataset. In future, we hope to find different attacks and their classes, obtain better accuracy and increase the detection rate of the IDS.

REFERENCES
[1] J. P. Anderson, "Computer Security Threat Monitoring and Surveillance," Technical Report, James P. Anderson Company, Fort Washington, PA, 1980.
[2] W. Stallings, "Cryptography & Network Security: Principles & Practices," 3rd ed., 2003, Intrusion Detection, p. 571.
[3] S. Zanero, "Flaws and Frauds in the Evaluation of IDS/IPS Technologies," 2007, http://www.first.org/conference/2007/papers/zanero-stefano-paper.pdf, first accessed on 21.09.07.
[4] K. Das, "Protocol anomaly detection for network-based intrusion detection," GSEC Practical Assignment Version 1.2f, SANS Institute, 2001.
[5] F. N. Sabri, N. M. Norwawi, and K. Seman, "Identifying false alarm rates for intrusion detection system with data mining," IJCSNS International Journal of Computer Science and Network Security, vol. 11, 2011.
[6] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: A review," Applied Soft Computing Journal, vol. 10, 2010.
[7] S. Wu and E. Yen, "Data mining-based intrusion detectors," Expert Systems with Applications, vol. 36, 2009.
[8] DARPA Intrusion Detection Evaluation, MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval.
[9] KDD-cup dataset, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.htm.
[10] F. Cuppens and A. Miege, "Alert correlation in a cooperative intrusion detection framework," in Proceedings of the 2002 IEEE Symposium on Security and Privacy, Washington, DC, USA: IEEE Computer Society, 2002, pp. 202–.
[11] C. Elkan, "Results of the KDD'99 classifier learning," SIGKDD Explorations, ACM SIGKDD, vol. 1, no. 2, pp. 63–64, January 2000.
[12] "Emerging Threats Rules," http://www.emergingthreats.net, 2003.
[13] A. Valdes and K. Skinner, "Probabilistic alert correlation," in Proceedings of the 4th International RAID Symposium, 2001, pp. 54–68.
[14] "Network Security Archive," http://www.networksecurityarchive.org, 2006.
[15] P. Smyth and R. M. Goodman, "An information theoretic approach to rule induction from databases," IEEE Trans. on Knowl. and Data Eng., vol. 4, pp. 301–316, August 1992.
[16] G. Piatetsky-Shapiro, "Discovery, analysis and presentation of strong rules," in Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. J. Frawley, Eds. AAAI Press, 1991, pp. 229–248.
[17] E. Raftopoulos and X. Dimitropoulos, "Detecting, validating and characterizing computer infections in the wild," in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference.
[18] C.-Y. Ho, Y.-D. Lin, Y.-C. Lai, I.-W. Chen, F.-Y. Wang, and W.-H. Tai, "False positives and negatives from real traffic with intrusion detection/prevention systems."
[19] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 15:1–15:58, Jul. 2009.
[20] E. Raftopoulos and X. Dimitropoulos, "IDS Alert Correlation in the wild with EDGe," IEEE Journal on Selected Areas in Communications, vol. 32, no. 10, Oct. 2014.