International Journal of Computer Trends and Technology (IJCTT) – volume 8 number 4– Feb 2014
A Proposed Methodology for Virus Detection Using Data Mining and Reverse Engineering Tools with Client-Server Model
Uday Babu P1, Visakh R2
1, 2 (Department of Computer Science & Engineering, Rajagiri School of Engineering & Technology, Kochi, India)
ABSTRACT : Viruses are a class of malicious programs that cause unfavourable effects on a computer system and thereby become an obstacle to its standard operation. Their existence and execution within the system should be detected in time to prevent irrecoverable and devastating problems, such as loss of performance and loss of confidentiality of sensitive information. To detect the presence of a virus within a system, the effects of various viruses on computer systems are first analysed by executing them one by one in a virtual environment. These effects are captured using reverse engineering tools. Data mining is applied to the data recorded by the reverse engineering tools to extract the significant patterns that characterize the respective viruses. These patterns are converted into a unique binary code, which can be used to detect viruses using a client-server model.
Keywords - Client-server model, Data mining, FP growth algorithm, Reverse engineering, Virus detection.
I. INTRODUCTION
Viruses are malware designed to damage computer systems and thereby make them vulnerable to security threats and performance degradation. The evolution of the internet has resulted in the spawning of new malware, including viruses. Viruses have mutated into such sophisticated forms that detecting them with the major conventional methods, such as signature-based and heuristic detection, may become a laborious process. Cyber security is under threat and cyber wars are forecast in the near future [1]. Hence a methodology capable of detecting the presence of malware in a system should be formulated, so that the malware and its effects can be removed from the infected system. Data mining is the process of extracting frequent and relevant patterns from a very large data set [2]. In the proposed system, reverse engineering tools and a data mining algorithm are applied one after the other in each virtual system, before and after infecting the system with a given virus, to extract the relevant patterns that characterize the effects of the given virus. These patterns are exploited to discover the presence of an unidentified virus in a system on the basis of a
ISSN: 2231-2803
client-server model. The characteristic pattern of each known virus, after being transformed into a binary code, is saved in a database at the server. The binary code formulated for an unknown virus is compared with the binary codes of known viruses to identify the unknown virus.
In [3], Burji, Liszka and Cha stated that malware can be detected by integrating reverse engineering tools and data mining. Three virtual machines are created in each system and each of them is infected with a given malware. Reverse engineering tools such as a file monitor, a registry monitor and an API call tracer are executed in each virtual machine to record various aspects of its machine state after infection. Data mining is applied to the reverse engineered data to retrieve pertinent and frequent data patterns that characterize the virus. The output of the data mining step is supplied to a rough set theory based tool known as Blem2, which generates rules of the required confidence and strength that can be used to detect malware. However, in that approach the machine state is captured only after infection, so the observations may include effects that were not caused by the malware attack. The rules developed by the rough set based tool may therefore not be precise enough to catch the malware, and this may result in false positives.
Reverse engineering of a malware is its analysis, carried out by executing it in a controlled and isolated virtual environment, in order to comprehend and capture its design, components, behaviour and effects. Reverse engineering tools such as a file system monitor and a registry monitor are used to trace the machine state of the system. Each reverse engineering tool captures a single aspect of the machine state. When a virus infects a system, it changes one or more aspects of that system's machine state.
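The motivation for capturing machine state both before and after infection can be illustrated with a small sketch. It assumes, purely for illustration, that the output of each reverse engineering tool has been reduced to a set of strings per monitored aspect. The `Snapshot` type, the `state_changes` function, the aspect names and the sample entries are all hypothetical and are not taken from the proposed system.

```python
from typing import Dict, Set

# Hypothetical representation: each monitored aspect of the machine state
# (e.g. "file_system", "registry") maps to the set of items observed for it.
Snapshot = Dict[str, Set[str]]


def state_changes(before: Snapshot, after: Snapshot) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per aspect, the items added and removed between two snapshots."""
    changes: Dict[str, Dict[str, Set[str]]] = {}
    for aspect in before.keys() | after.keys():
        old = before.get(aspect, set())
        new = after.get(aspect, set())
        added, removed = new - old, old - new
        # Aspects that did not change are left out of the result.
        if added or removed:
            changes[aspect] = {"added": added, "removed": removed}
    return changes


# Made-up sample data: state recorded before and after running a sample.
before = {
    "file_system": {"C:/docs/report.txt"},
    "registry": {"HKLM/Software/Example"},
}
after = {
    "file_system": {"C:/docs/report.txt", "C:/temp/dropper.exe"},
    "registry": {"HKLM/Software/Example", "HKLM/Software/Run/dropper"},
}

changes = state_changes(before, after)
```

Items present in both snapshots, such as `C:/docs/report.txt`, drop out of the result, so only the changes that occurred between the two captures remain as candidates for the data mining step.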
A few of the reverse engineering tools available to capture the state of the machine are the following:
1.1 File System Monitor
When a process is executed in a system, it makes changes to the file system by adding, deleting or editing files. The file system monitor captures all the file system activity performed by all the processes running in the system. Changes made by the malware in the file system of the
www.internationaljournalssrg.org