IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 1, 2015 | ISSN (online): 2321-0613
Malware Detection Based on Behavior in Delay Tolerant Network Using Support Vector Machine K. M. Rajiha1 Dr. D. C. Joy Winnie Wise2 S. N. Ananthi3 1 P.G Scholar 2Professor and Head 3Assistant Professor 1,2,3 Department of Computer Science & Engineering 1,2,3 Francis Xavier Engineering College, India Abstract— Delay tolerant network (DTN) is widely being used for short range communication as it works well in intermittent connectivity areas. DTN propagates information in its network via opportunistic contacts of its nodes. Proximity malware exploits this propagation nature of DTN. The previous methods for behavioral characterization of proximity malware have used naive Bayes model. To improve detection rate, we propose a Support Vector Machine (SVM) that detects the malware in the delay tolerant networks in an efficient and quick way. SVM is a supervisory learning model that can analyze and classify patterns. It not only recognizes the already existing patterns of malware based on its behavior but also the predictable patterns. SVM is less prone to over fitting, which becomes an advantage and makes it desirable for bigger networks. Key words: Delay tolerant network (DTN), Support Vector Machine (SVM), Naïve Bayes I. INTRODUCTION Delay Tolerant Network (DTN) is ideal for areas with intermittent connectivity due to environmental conditions or malware activity. There is no end to end connection in DTN. The communication is through store and forward approach. The information is organized into bundles rather than packets. In cases of lost connection, the intermediate nodes will store the data and resume communication when connection is re-established. DTN works well in areas that have intermittent connectivity, long delays, high error rates, and different data rates. It was originally developed for deep space communication and then used in military, agricultural, vehicular and mobile applications. DTN makes the network to continue its work and be reliable in environments where communications are most challenging. The transmission in DTN takes place via opportunistic contacts. These contacts are not scheduled. They are mostly available and in the proximity. This nature of transmission makes DTN vulnerable to proximity malware. Malware are a class of malicious software. They can be in the form of viruses, Trojans, worms, rootkits etc. Proximity malware is a piece of malicious code that attacks and alters the functionality of the node it attaches itself to and duplicates itself on interaction with other nodes, this duplication to other codes is called malware infection. Proximity malware can infect nodes opportunistically. These proximity malware exploit DTN and use its opportunistic contacts transmission for the malware propagation. Proximity malware in DTN pose security challenges that are not found in infrastructure mode. Cellular carrier would centrally monitor and look for abnormalities in infrastructure model. In DTN there is no central monitoring. So the detection of proximity malware in DTN becomes necessary.
Malware detection based on behavior is better than pattern matching especially in cases of polymorphic malware. Naïve Bayes classifier has been used for behavioral detection of proximity malware in DTN settings. The chances of malware not getting detected if they turn malicious in a short time are possible with Naïve Bayes. We propose a Support Vector Machine (SVM) that is trained with data collected by Naïve Bayes classifier. II. METHODOLOGY A. Naïve Bayes: Naïve Bayes classifiers are based on the Bayes probability theorem. They are used to create simple and well performing models. The adjective Naïve is used because the features considered are mutually independent. Naïve Bayes classifiers perform well under this assumption. For small size classification problems, Naïve Bayes is the best option. Here we need to gather behavioral information of malware. For that purpose, we use a Naïve Bayes classifier. We classify the nodes in our network as malicious node or normal node. This is simple and easily done for small networks. Since we are dealing with proximity malware, we need to check the nodes that are close which would be the neighbours of each node. First we assume that all nodes know their neighbors. Then we check one hop neighbors for malware infection and then do the same for two hop neighbors. We will rely our decision on asessments. We compute packet delivery ratio for every data transmission. The nodes with transmissions that have low packet delivery ratio are considered as the malware infected nodes. After completing our assessments for both one hop and two hop neighbours, we compile the entire collected data and use it to train our SVM. The training data provided by Naïve Bayes will influence optimality of the SVM classifier. B. Support Vector Machine: SVM is one of state of the art machine learning techniques. SVM is primarily based on statistical learning theory. SVMs are supervised learning models that can analyze data and recognize patterns. SVMs give good performance in different fields and real world applications. SVMs are used in text classification, image processing, bio-informatics, neural networks, functional analysis, optimization and many other fields. SVM works with minimum parameters and generalizes a problem well. SVM is a quadratic programming problem. SVM would map input data to a a point in high dimensional space. The mapping helps in the classification. SVMs can perform both linear classification and non-linear classification. If our data is in 2 dimensions, we can separate them out by a line. SVM finds a margin between the two sets of data. Hyperplanes are used for higher dimension
All rights reserved by www.ijsrd.com
5
Malware Detection Based on Behavior in Delay Tolerant Network Using Support Vector Machine (IJSRD/Vol. 3/Issue 1/2015/002)
problems. The separating hyperplane can be found by linear programming, often called as perceptron. Support vectors are the subset of the trained data that are kept within the SVM classifier. SVM with linear kernals suits well for classification problems especially when we treat our problem as a binary classification problem.
After setting up the network, the nodes are initialized and each node finds its neighbors. The first hop and second hop neighbors are considered, since DTN is more vulnerable to proximity malware. After data transmission starts, Naïve Bayes uses the packet delivery ratio to find out malware affected nodes. And the collected information is used to train SVM. Then SVM will detect malware using behavior based on trained data and predictable data. This way malware can be detected accurately. IV. CONCLUSION
Fig. 1: Working of SVM. Here we use a linear SVM. We train SVM with a set of data and then test it with new data. SVM learns from the trained data. So SVM will find out if the new data fits which side of the separator. The data can fit into any one side of the separator, not both. For training, SVM is fed with data from Naïve Bayes classifier. SVM can work with minimal data. SVM finds a relationship between proximity malware and its associated behavior. This way SVM can detect already existing behavior of proximity malware. The added advantage of SVM is that it learns from the trained data. So it can predict new behavior and detect them too. In SVMs, the term prediction does not mean guessing a future trait from past traits, rather prediction means classification where we train SVM with some part of particular data set and use it to classify all the remaining data in that data set. This helps to improve the detection rate. Further SVM is less prone to overfitting which makes SVM ideal for large networks. III. SYSTEM DESIGN
In this paper, we have used Support Vector Machine to detect malware based on behavior in Delay Tolerant Network. This approach has better accuracy than just using Naïve Bayes, since SVM can predict and detect new behavior. We can add more parameters to our SVM in future. This will further improve detection rate and accuracy. V. ACKNOWLEDGMENT I thank my Guide, Dr. D.C.Winnie Wise, Professor and Head, Department of CSE, for her inspiration and guidance. I also thank Ms. S.N.Ananthi, Assistant Professor, Department of CSE, for her support. REFERENCES [1] K. Fall and S. Farrell, “DTN: An Architectural Retrospective”, IEEE Journal on Selected Areas in Communications, Vol.26, Issue-5, pp.828836,2008. [2] Androutsopoulos, J. Koutsias, K. V. Chandrinos, and C. D. Spyropoulos, “An Experimental Comparison of Naïve Bayesian and KeywordBased Anti-Spam Filtering with Personal E-mail messages”, Proceedings of the 12th Annual International ACM Conference Research and Development in Information Retrieval (SIGIR), pp.160-167,2000. [3] Bose, X. Hu, K. G. Shin, and T. Park, “Behavioral detection of Malware on Mobile Handsets,” Proceedings of thr 6th International Conference on Mobile Systems, Applications and services, pp.225-238,2008. [4] W. Peng, F. Li, X. Zou and J. Wu, “Behavioral Detection and Containment of Proximity Malware in Delay Tolerant Networks,” IEEE 8th International Conference on Mobile Adhoc and Sensor Systems, pp.411-420,2011. [5] W. Peng, F. Li, X. Zou and J. Wu, “Behavioral Malware Detection in Delay Tolerant Networks,” IEEE Transactions on Parallel and Distributed systems, Vol.25, pp.53-62, 2014. [6] G. Zyba, G. Voelker, M. Liljenstam, A. Me’hes and P. Johansson, “Defending Mobile Phones from Proximity Malware,” Proceedings of IEEE INFOCOM, pp. 1503-1511,2009. [7] Y. Tang, Y. Q. Zhang, N. V. Chawla and S. Krasser, “SVMs Modeling for Highly Imbalanced Classification”, IEEE Transactions on Systems, Man and Cybernetics, Vol.39, pp.281-288,2009.
Fig. 2: System Design
All rights reserved by www.ijsrd.com
6
Malware Detection Based on Behavior in Delay Tolerant Network Using Support Vector Machine (IJSRD/Vol. 3/Issue 1/2015/002)
[8] Cortes and V. N. Vapnik, “Support Vector Networks”, Machine Learning, Vol.20, pp.273297, 1995. [9] N. Drucker W. Dongui and V. N. Vapnik , “Support Vector Machines for spam categorization”, IEEE Transactions on Neural Networks, Vol.10, Issue-5, pp.1048-1054,1999. [10] F. Li, Y. Yang and J. Wu, “CPMC: An Efficient Proximity Malware Coping Scheme in Smartphone-based Mobile Networks”, Proceedings of IEEE INFOCOM, 201, pp.1-9,2010.
All rights reserved by www.ijsrd.com
7