Data Mining - a small but major contribution


An increasing number of organisations are acquiring the ability to aggregate and mine data. This can only bring about exceedingly positive change to modern society.

By: Daniel de Jager


Abstract

Cyber crime places a significant financial burden on individuals as well as organisations, especially those that operate in the financial sector. News reports regularly describe new strains of malware responsible for millions of dollars in losses incurred by banks, as well as the new vulnerabilities that malware exploits. Given budgetary constraints and the complexity of business, it is difficult to take a proactive approach without the appropriate technology and the security services supporting it. In this paper, we explore the significance of data mining within the cyber security domain and how it provides the speed and visibility required to discover new or emerging threats. Since many organisations store or process our personally identifiable information, we provide a view on the protection of this information and on the benefits of data mining, which have direct, positive implications for society.

Introduction

This paper examines data mining techniques in the context of information security. The focus is placed on three technologies: Intrusion Prevention Systems, Anti-Virus, and Security Information and Event Management (SIEM) systems. We examine each technology and identify use cases for data mining, including its benefits. We then give our views on the use of data mining and assess its impact on society.

Literature Study

Even before the year 2000, Intrusion Detection Systems (IDS) were critical network defense controls, just as they are today. These days we refer to IDS as Intrusion Prevention Systems (IPS), since they have the ability to drop packets when a rule or signature is triggered. Lee, Stolfo & Mok (1999) described how slow the process of updating signatures and rules was, due to the expert knowledge required to write them. As a solution, data mining techniques were applied to analyse network traffic data in order to speed up this process. Using data mining techniques such as association rules, Lee et al. show that it is possible to discover new or normal network activity patterns just as well as the manual process of the time.

This is a significant contribution of data mining towards automating the analysis process. It means that one can effectively monitor network traffic with a certain level of assurance that anomalous traffic will be flagged and blocked. It does not, however, give one the sense of being ahead of the curve, especially in the domain of malware.

Malware remains a concern today, not so much because entirely new malware is created, but because the code of existing malware is changed so frequently. The modified malware still performs the same activities, yet it evades anti-virus technology because its signature no longer matches the original.

Anti-virus technology relies on signatures of known malware. New variants take some time to detect, since no signatures are available at the point the variant is created. This implies that, because of obfuscation techniques, classic anti-virus detection technology is not a reliable defense mechanism, and that there is a reliance on security researchers to analyse the code and produce new signatures.
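The limitation described above can be illustrated with a minimal sketch. It assumes the simplest possible signature, a SHA-256 hash of the whole file; commercial anti-virus products use richer signatures, so this is only an illustration of why exact matching is brittle. The sample bytes and the names signature_db and is_known_malware are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-ins for a malware sample and an obfuscated variant.
# The variant carries the same payload with a single junk byte appended.
original = b"MZ" + b"payload: connect, download, execute"
variant = original + b"\x90"

# Assumed signature database: whole-file hashes of known malware.
signature_db = {sha256_hex(original)}


def is_known_malware(sample: bytes) -> bool:
    """Exact-match lookup: any change to the file changes the hash."""
    return sha256_hex(sample) in signature_db


print(is_known_malware(original))  # True
print(is_known_malware(variant))   # False
```

A one-byte change is enough for the variant to go undetected, even though its behaviour is unchanged.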

However, by the time a new signature is released, a new variant of the same malware is already in existence. Data mining techniques used in conjunction with hashing techniques show potential for determining, through similarity algorithms, how similar the binaries of obfuscated malware are, in order to build a better defensive mechanism (Azab, Layton, Alazab & Oliver, 2014).
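As a rough illustration of the idea of measuring similarity between binaries, the sketch below compares samples by the overlap of their byte n-grams (Jaccard similarity). This is a simple stand-in chosen for clarity and is not the method of the cited authors; the sample byte strings and function names are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all contiguous n-byte sequences in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Shared n-grams divided by all n-grams seen in either sample."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to have n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical samples: a base binary, a variant with junk inserted,
# and an unrelated benign file.
base = b"connect to server; download payload; write registry key; execute"
variant = b"connect to server; NOP NOP; download payload; write registry key; execute"
unrelated = b"open spreadsheet; calculate totals; print quarterly report"

sim_variant = jaccard_similarity(base, variant)
sim_unrelated = jaccard_similarity(base, unrelated)
print(f"variant:   {sim_variant:.2f}")
print(f"unrelated: {sim_unrelated:.2f}")
```

Unlike an exact hash comparison, the variant still scores close to the base sample, while the unrelated file scores near zero. A threshold on this score would be a tuning decision that depends on the data.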

In response to the malware threat, many security vendors, such as FireEye, have developed commercial solutions that use sandboxing technology to automate the analysis of malware behaviour. The aim is to collect as many Indicators of Compromise (IOC) data points as possible in order to prevent malware attacks (FireEye, 2015). Intel Security provides the same type of solution through its Threat Intelligence Exchange (TIE) platform (McAfee, 2015). Since these are closed systems, there is no indication of which data mining techniques are being used. However, we do know that sandboxing technology is being utilised.

Edem, Benzaid, Al-Nemrat & Watters (2014) describe using sandboxing technology together with data clustering, a data mining technique, to determine how closely malware samples are related in terms of their behaviour. What they gained through their research is an overall improvement in the forensic processing of malware. IOC data is of utmost importance, especially if it is to be shared, either commercially or through open platforms, since it enables organisations to respond faster to emerging as well as known threats. However, the time between analysis and signature release is still not good enough: if one were exposed to a new variant, there would be no effective protection mechanism in place.
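To make the idea of clustering malware by behaviour concrete, the sketch below groups samples whose observed behaviours overlap. It is a greedy, one-pass grouping based on Jaccard similarity, chosen for brevity, and is not the clustering algorithm of the cited authors. The sample names, behaviour labels and the 0.5 threshold are all hypothetical.

```python
def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two behaviour sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_behaviour(samples: dict, threshold: float = 0.5) -> list:
    """Greedy one-pass grouping.

    Each sample joins the first existing cluster that contains a member
    at least `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for name, behaviour in samples.items():
        for cluster in clusters:
            if any(jaccard(behaviour, samples[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical behaviours as they might be recorded by a sandbox run.
samples = {
    "sample_a": frozenset({"create_mutex", "write_run_key", "dns_lookup", "http_post"}),
    "sample_b": frozenset({"create_mutex", "write_run_key", "dns_lookup", "http_get"}),
    "sample_c": frozenset({"encrypt_files", "delete_shadow_copies", "drop_ransom_note"}),
    "sample_d": frozenset({"encrypt_files", "delete_shadow_copies", "http_post"}),
}

print(cluster_by_behaviour(samples))
# [['sample_a', 'sample_b'], ['sample_c', 'sample_d']]
```

Samples that land in the same cluster are candidates for being variants of one family, which is where the forensic time saving would come from: an analyst can examine one representative per cluster first.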

Security Information and Event Management (SIEM) systems are a way to process many different kinds of log data and gain visibility of potential threats to an organisation's infrastructure. The data sources must include security infrastructure such as IPS, firewalls and anti-virus, along with other related sources that can enrich the datasets.

A SIEM system combines events and then identifies incidents based on rules. Association rules are a data mining technique that attempts to identify items that frequently occur together in a collection of item sets. The technique rests on the measures of support and confidence, which are used to determine interesting associations (Leskovec, Rajaraman & Ullman, 2014:191-226).
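The two measures can be sketched in a few lines. Here support is the fraction of transactions that contain an item set (some texts use the raw count instead), and the confidence of a rule X -> Y is the support of X and Y together divided by the support of X. The event names and the grouping of events into time windows are hypothetical.

```python
def support(transactions: list, itemset) -> float:
    """Fraction of transactions that contain every item in the item set."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: list, antecedent, consequent) -> float:
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical event types observed per host within a time window.
windows = [
    frozenset({"port_scan", "failed_login", "ips_alert"}),
    frozenset({"port_scan", "failed_login"}),
    frozenset({"failed_login"}),
    frozenset({"port_scan", "failed_login", "ips_alert"}),
    frozenset({"av_alert"}),
]

s = support(windows, {"port_scan", "failed_login"})
c = confidence(windows, {"port_scan", "failed_login"}, {"ips_alert"})
print(f"support = {s:.2f}, confidence = {c:.2f}")
# support = 0.60, confidence = 0.67
```

A rule is usually considered interesting only when both values exceed chosen minimum thresholds. Those thresholds are a tuning decision and are not specified in this paper.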

Association rules can be applied in support of SIEM technology. They provide a mechanism to enrich the data inference or analysis stage of the monitoring process, in order to reduce false positive rates and to identify better and more relevant Events of Interest (EOI) that help an organisation mitigate threats (Gabriel, Hoppe, Pastwa & Sowa, 2009).

Opinion

Modern society performs transactions online with banks and insurance firms. Our personal information is stored in their data warehouses, which are subject to compromise. We rely on technology to protect our personal information in these organisations, since both the organisation and the individual suffer a loss if a security incident compromises that information.

Data mining provides a step in the right direction for security vendors, system integrators and security analysts, since the results it produces allow for faster analysis, faster mitigation, and relevant and correct information, all of which are paramount to curbing the threat. Data mining is not a silver bullet that secures an organisation completely; however, it does provide mechanisms to improve insight and overall visibility in support of decision makers. Within the industry, there is a well-known saying: “Security Technology is only as good as the signatures that they have applied”. This is very true in the case of IPS and anti-virus systems, since malware is not easily defeated due to obfuscation. If data mining techniques can be applied in almost real time, an organisation can be in a more defensible position, if and only if the information is reliable.

Conclusion

We live in a society that is interconnected and intertwined through the World Wide Web. In the context of information security, data mining provides the means to understand security data and interpret its results, and it supports data protection efforts. It might be one of the ways to encourage society to embrace the digital age despite the risks of cybercrime activity, knowing that a data scientist somewhere is working on a solution to support the next generation of security.

References

Azab, A., Layton, R., Alazab, M., & Oliver, J. (2014). Mining Malware to Detect Variants. In Cybercrime and Trustworthy Computing Conference (CTC), 2014 Fifth (pp. 44-53). IEEE.

Edem, E. I., Benzaid, C., Al-Nemrat, A., & Watters, P. (2014). Analysis of Malware Behaviour: Using Data Mining Clustering Techniques to Support Forensics Investigation. In Cybercrime and Trustworthy Computing Conference (CTC), 2014 Fifth (pp. 54-63). IEEE.

FireEye. (2015). Threat Analysis Platform. Available online: [https://www.fireeye.com/products/threat-analytics-platform.html]. Accessed: 29 July 2015.

Gabriel, R., Hoppe, T., Pastwa, A., & Sowa, S. (2009). Analyzing malware log data to support security information and event management: Some research results. In Advances in Databases, Knowledge, and Data Applications, 2009. DBKDA'09. First International Conference on (pp. 108-113). IEEE.

Lee, W., Stolfo, S. J., & Mok, K. W. (1999). A data mining framework for building intrusion detection models. In Security and Privacy, 1999. Proceedings of the 1999 IEEE Symposium on (pp. 120-132). IEEE.

Leskovec, J., Rajaraman, A., & Ullman, J. (2014). Mining of Massive Datasets, 2nd Edition. Chapter 6, Frequent Item Sets, pp. 191-226. Cambridge University Press.

McAfee. (2015). McAfee Threat Intelligence Exchange. Available online: [http://www.mcafee.com/us/products/threat-intelligence-exchange.aspx]. Accessed: 29 July 2015.
