IDL - International Digital Library Of Technology & Research Volume 1, Issue 6, June 2017

Available at: www.dbpublications.org

International e-Journal For Technology And Research-2017

Two-Phase TDS Approach for Data Anonymization to Preserve Big Data Privacy

1. Ambika M Patil, M.Tech Computer Science Engineering, Center for P G Studies, Jnana Sangama, VTU Belagavi, Belagavi, INDIA, Ambika702@gmail.com
2. Assistant Prof. Ranjana B Nadagoudar, Computer Science Engineering Department, Center for P G Studies, Jnana Sangama, VTU Belagavi, Belagavi, INDIA
3. Dhananjay A Potdar, Dhananjay.potdar@gmail.com

ABSTRACT - While Big Data has gradually become a hot topic of research and business and is now used widely across many industries, Big Data security and privacy have drawn increasing concern. There is, however, an obvious tension between Big Data security and privacy and the widespread use of Big Data. A variety of privacy-preserving mechanisms have been developed to protect privacy at different stages of the big data life cycle (e.g., data generation, data storage, data processing). The goal of this paper is to provide a complete overview of privacy-preservation mechanisms in big data, to present the challenges facing existing mechanisms, and to illustrate the infrastructure of big data together with state-of-the-art privacy-preserving mechanisms at each stage of the big data life cycle. The paper focuses on the anonymization process, which significantly improves the scalability and efficiency of top-down specialization (TDS) for data anonymization over existing approaches. We also discuss the challenges and future research directions related to preserving privacy in big data.

KEYWORDS - Big data, privacy, big data storage, big data processing, data anonymization, top-down specialization, MapReduce, cloud, privacy preservation.

I. INTRODUCTION

As a result of recent technological developments, the amount of data generated by social networking sites, sensor networks, the Internet, healthcare applications, and many other sources is increasing significantly day by day. The term "Big Data" reflects the trend and the salient features of the data being produced from these sources. Basically, Big Data can be described by the "3Vs": Volume, Velocity, and Variety. Volume denotes the huge amount of data being produced from multiple sources. Velocity concerns both how fast data are produced and collected and how fast some of the collected data change. Variety denotes their highly distributed and varied nature. The data generation rate is growing so rapidly that it is becoming very difficult to handle the data using traditional methods or systems [1]. In the "3Vs" model, Variety indicates the various types of data, which include structured, semi-structured, and unstructured data; Volume means the data scale is large; and Velocity indicates that all processing of Big Data must be quick and timely in order to maximize its value, as shown in Fig. 1. These features, namely handling huge amounts of data and using various types of data (including unstructured data and attributes that were never used in the past), distinguish Big Data from traditional data mining. In 2011, IDC defined big data as follows: "big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling the high-velocity capture, discovery, and/or analysis" [2]. In this definition, the features of big data may be abridged as the 4Vs, i.e., Variety, Velocity, Volume, and Value, where the implications of Variety, Velocity, and Volume are the same as in the 3Vs model, and Value means that big data have great social value. The 4Vs model was widely recognized because it points to the most critical problem, which is how to discover value from enormous, varied, and rapidly generated datasets in big data.

FIGURE 1. Illustration of the 3 V's of big data.

Although big data can be used effectively to better understand the world and to innovate in various aspects of human activity, the exploding amount of data has increased the potential for privacy breaches of individuals. For example, Amazon and Google can learn our shopping preferences and browsing habits. Social networking sites such as Facebook store all the information about our personal lives and social relationships. Popular video sharing websites such as YouTube recommend videos to us based on our search history. With all the power conferred by big data, the gathering, storing, and reusing of our personal information for commercial profit has become a threat to our privacy and security. In 2006, AOL released 20 million search queries of about 650,000 users, with AOL IDs and IP addresses removed, for research purposes. However, it took researchers only a couple of days to re-identify users. Users' privacy may be breached under the following circumstances [3]:

I. Personal information, when combined with external datasets, may lead to the inference of new facts about a user. Those details may be private and not supposed to be exposed to others.
II. Personal data are sometimes collected and used to add value to a business. For example, an individual's shopping habits may disclose a lot of personal information.
III. Sensitive data are stored and processed in a location that is not secured properly, and data leakage may occur during the storage and processing phases.

In order to safeguard big data privacy, numerous mechanisms have been developed in recent years. These mechanisms can be grouped by the stages of the big data life cycle: data generation, data storage, and data processing. In the data generation phase, access restriction and data falsification techniques are used to protect privacy. While access restriction techniques try to limit access to individuals' private data, data falsification techniques alter the original data before they are released to a non-trusted party. The approaches to privacy protection in the data storage phase are mainly based on encryption techniques, which can be further divided into attribute-based encryption (ABE), identity-based encryption (IBE), and storage path encryption. In addition, to protect sensitive information, hybrid clouds are used, where sensitive data are stored in a private cloud. The data processing phase includes privacy-preserving data publishing (PPDP) and knowledge extraction from the data. In PPDP, anonymization techniques such as generalization and suppression are used to protect the privacy of the data; ensuring the utility of the data while preserving privacy is a great challenge in PPDP. In the knowledge extraction process, there exist several mechanisms to extract useful information from large-scale and complex data. These mechanisms can be further divided into clustering, classification, and association rule mining based techniques. While clustering and classification split the input data into different groups, association rule mining based techniques find useful relationships and trends in the input data.
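To make the two anonymization primitives just mentioned concrete, the following minimal Java sketch (not from the paper; the record layout and the coarsening rule are assumptions) shows generalization coarsening an exact age into an interval and suppression masking the trailing digits of a ZIP code:

public class AnonymizationPrimitives {

    // Generalization: map an exact age to a coarser 10-year interval.
    static String generalizeAge(int age) {
        int lower = (age / 10) * 10;
        return "[" + lower + "-" + (lower + 10) + ")";
    }

    // Suppression: keep only the first three digits of a 5-digit ZIP code.
    static String suppressZip(String zip) {
        return zip.substring(0, 3) + "**";
    }

    public static void main(String[] args) {
        // A quasi-identifier pair (age, ZIP) after anonymization.
        System.out.println(generalizeAge(34) + " " + suppressZip("57013"));
        // Prints: [30-40) 570**
    }
}

Both operations trade precision for privacy: the anonymized record matches more individuals, which is exactly what k-anonymity-style guarantees require.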

FIGURE 2. Illustration of big data life cycle.

Protecting privacy in big data is a fast-growing research area. Although a number of related papers have been published, only a few of them are survey/review papers [4], [5]. Moreover, while these papers introduce the basic concepts of privacy protection in big data, they fail to cover several important aspects of the area. For example, neither [4] nor [5] provides a detailed discussion of big data privacy with respect to cloud computing, and neither paper discusses future challenges in detail. In this paper, we give a comprehensive overview of state-of-the-art technologies for preserving the privacy of big data at each stage of the big data life cycle. The paper focuses on the anonymization process, which significantly improves the scalability and efficiency of top-down specialization (TDS) for data anonymization over existing approaches. The major contributions of our research are threefold. First, we creatively apply MapReduce on cloud to TDS for data anonymization and deliberately design a group of innovative MapReduce jobs to concretely accomplish the specializations in a highly scalable fashion. Second, we propose a two-phase TDS approach that gains high scalability by allowing specializations to be conducted on multiple data partitions in parallel during the first phase.

Third, implementation results show that our approach can significantly improve the scalability and efficiency of TDS for data anonymization over existing approaches. The remainder of this paper is organized as follows. Section II describes the infrastructure of big data and the privacy issues and challenges that arise from the underlying cloud computing structure. Section III reviews traditional data privacy preservation methods. Section IV addresses privacy preservation for big data: it formulates the two-phase TDS approach and elaborates the algorithmic details of the MapReduce jobs. Section V presents the implementation of our approach. Finally, we conclude the paper and discuss future work in Section VI.

II. INFRASTRUCTURE OF BIG DATA

To handle the different dimensions of big data in terms of volume, velocity, and variety, we need to design efficient and effective systems that process large amounts of data arriving at very high speed from different sources. Big data has to go through multiple phases during its life cycle, as shown in Fig. 2. Data are distributed nowadays, and new technologies are being developed to store and process large repositories of data. For example, cloud computing technologies such as Hadoop MapReduce are explored for big data storage and processing. In this section, we explain the life cycle of big data. In addition, we discuss how big data leverages cloud computing technologies and the challenges that arise when cloud computing is used for the storage and processing of big data.

A. LIFE CYCLE OF BIG DATA

Data generation: Data can be generated from many distributed sources. The amount of data generated by humans and machines has blown up in the past few years. For example, 2.5 quintillion bytes of data are generated on the web every day, and 90 percent of the data in the world was generated in the past few years. Facebook, a single social networking site, generates 25 TB of new data every day. The data generated are usually large, diverse, and complex, so it is hard for traditional systems to handle them. The data generated are normally associated with a specific domain such as business, the Internet, or research.

Data storage: This phase refers to storing and managing large-scale data sets. A data storage system consists of two parts, i.e., hardware infrastructure and data management [6]. Hardware infrastructure refers to the information and communications technology (ICT) resources used for several tasks (such as distributed storage). Data management refers to the set of software deployed on top of the hardware infrastructure to manage and query large-scale data sets. It should also provide several interfaces to interact with and analyze the stored data.

Data processing: The data processing phase basically covers data collection, data transmission, pre-processing, and the extraction of useful information. Data collection is needed because data may come from various sources, i.e., sites that contain text, images, and videos. In the data collection phase, data are acquired from a specific data production environment using dedicated data collection technology. In the data transmission phase, after collecting raw data from a specific data production environment, we need a high-speed transmission mechanism to transmit the data into proper storage for various analytic applications. Finally, the pre-processing phase aims at removing meaningless and redundant parts of the data so that storage space can be saved. Various applications then apply domain-specific analytical methods to derive significant information from the data. Although different fields of data analytics require different data characteristics, some of these fields may leverage similar underlying technology to inspect, transform, and model data to extract value from it. Emerging data analytics research can be categorized into the following six technical areas: structured data analytics, text analytics, multimedia analytics, web analytics, network analytics, and mobile analytics [6].

B. CHALLENGES OF BIG DATA

The application of Big Data leads to a set of new challenges, since Big Data sets are so large and complex that acquisition, storage, management, and analysis become difficult. The main challenges are the following [7], [8]:

1. Data preparation. An important basis of big data analysis and management is the availability of high-quality, precise, and trustworthy data. Data preparation is therefore paramount for increasing the value of big data.

2. Efficient distributed storage and search. Timeliness of data collection is fundamental to offering fast analysis of big data. Therefore, there is an increasing need to provide efficient distributed storage with faster memories and enhanced search algorithms.

3. Effective online data analysis. Online analysis of multidimensional data has become a must and a potential source of information for decision making. This requires adapting existing OLAP approaches to big data.

4. Effective machine learning techniques for big data mining. Machine learning and data mining should be adapted to big data to unleash the full potential of the collected data.

5. Efficient handling of big data streams. Some specific scenarios (e.g., stock exchanges) require analysis of data in the form of streams. Fast and optimized solutions should be developed to make inferences on big data streams.

6. Semantic lifting techniques. The semantics of collected big data is an important aspect for the future development of big data applications. Future approaches to big data analysis should be able to cope with this semantics.

7. Programming models. Many programming models for big data infrastructures are available; examples include MapReduce and Hadoop. We should consider different approaches for storing and managing data.

8. Social analytics. The ability to distinguish data that can be trusted and that comply with users' needs and preferences is as important as it is difficult to achieve. Social analytics should address this problem by providing correct and sound approaches to social data analysis.

9. Security and privacy. Big data are a priceless source of information. However, they often contain sensitive information that needs to be protected from unauthorized access and release.

III. TRADITIONAL DATA PRIVACY PRESERVATION METHODS

Cryptography refers to a set of techniques and algorithms for protecting data. In cryptography, plaintext is transformed into ciphertext using various encryption schemes; numerous methods, such as public key cryptography and digital signatures, are based on this idea. Cryptography alone cannot enforce the privacy demanded by common cloud computing and big data services [9]. This is because big data differs from traditional large data sets on the basis of the three V's (velocity, variety, volume) [10], [11]. It is these features that make big data architecture different from traditional information architectures, and these architectural changes, together with the complex nature of big data, make cryptography and traditional encryption schemes unable to scale to the privacy needs of big data. A further challenge with cryptography is the all-or-nothing retrieval policy for encrypted data [12]: less sensitive data that could be useful in big data analytics are also encrypted, and the data become unreachable to anyone who does not hold the decryption key. Privacy may also be breached if data are stolen before encryption or if cryptographic keys are misused. Attribute-based encryption can also be used for big data privacy [13], [14]. This method of securing big data is based on the relationships among the attributes present in the big data; the attributes that need to be protected are identified based on the type of big data and company policies. In a nutshell, encryption or cryptography alone cannot stand as a big data privacy preservation method. It can help with data anonymization but cannot be used directly for big data privacy.

IV. PRIVACY PRESERVATION FOR BIG DATA: TWO-PHASE TOP-DOWN SPECIALIZATION (TPTDS)

A sketch of the TPTDS approach is shown in Fig. 3. The TPTDS approach has three components, namely, data partition, anonymization level (AL) merging, and data specialization.

FIGURE 3. Execution framework overview of MRTDS.

The TPTDS approach conducts the computation required in TDS in a highly scalable and efficient fashion. The two phases of our approach are based on the two levels of parallelization provided by MapReduce on cloud. Essentially, MapReduce on cloud has two levels of parallelization, i.e., the job level and the task level. Job level parallelization means that multiple MapReduce jobs can be executed simultaneously to make full use of cloud infrastructure resources. Combined with cloud, MapReduce becomes more powerful and elastic, as cloud can offer infrastructure resources on demand; the Amazon Elastic MapReduce service is one example. Task level parallelization means that multiple mapper/reducer tasks in a MapReduce job are executed simultaneously over data splits. To achieve high scalability, we parallelize multiple jobs on data partitions in the first phase, but the resultant intermediate anonymization levels are not identical.
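As a hedged illustration of job level parallelization, the following Java fragment shows one way to submit several Hadoop MapReduce jobs concurrently and then wait for them all. The job configuration (input paths, mapper and reducer classes) is assumed to be done elsewhere, and the polling loop is a simplification, not the paper's actual driver:

import java.util.List;
import org.apache.hadoop.mapreduce.Job;

public class JobLevelParallelism {
    // Submit all jobs first (non-blocking), then wait for every one to finish.
    public static void runConcurrently(List<Job> jobs) throws Exception {
        for (Job job : jobs) {
            job.submit(); // returns immediately; the cluster runs the jobs side by side
        }
        for (Job job : jobs) {
            while (!job.isComplete()) {
                Thread.sleep(2000); // poll; task level parallelism happens inside each job
            }
        }
    }
}

Using submit() instead of waitForCompletion() is what turns a serial chain of jobs into job level parallelism; task level parallelism inside each job is handled by Hadoop itself.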

To obtain a finally consistent anonymous data set, the second phase is essential to integrate the intermediate results and further anonymize the entire data set. In the first phase, we run a subroutine over each of the partitioned data sets in parallel to make full use of the job level parallelization of MapReduce. The subroutine is a MapReduce version of centralized TDS (MRTDS), which concretely conducts the computation required in TPTDS. MRTDS anonymizes data partitions to generate intermediate anonymization levels. An intermediate anonymization level means that further specialization can still be performed without violating k-anonymity. MRTDS only leverages the task level parallelization of MapReduce.

ALGORITHM 1. SKETCH OF TWO-PHASE TDS (TPTDS).
Input: data set D, anonymity parameters k and kI, and the number of partitions p.
Output: anonymized data set D*.
1: Partition D into Di, 1 ≤ i ≤ p.
2: Execute MRTDS(Di, kI, AL0) → AL0i, 1 ≤ i ≤ p, in parallel as multiple MapReduce jobs.
3: Merge all intermediate anonymization levels into one: Merge(AL01, AL02, ..., AL0p) → ALI.
4: Execute MRTDS(D, k, ALI) → AL* to achieve k-anonymity.
5: Specialize D according to AL*, and output D*.
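The control flow of Algorithm 1 can be sketched as a plain Java driver. This is an illustrative skeleton only: MRTDS is stubbed as a local method, whereas in the actual approach each call is a chain of IGPL MapReduce jobs, and the anonymization level is modeled here as an opaque string; all names are hypothetical:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TpTdsDriver {

    // Stand-in for the MRTDS MapReduce subroutine of the paper.
    static String mrtds(List<String> dataSet, int k, String al) {
        return al + "->specialized(k=" + k + ")";
    }

    // Pick the more general of the intermediate levels so the merged
    // level ALI never violates the privacy requirement (simplified).
    static String merge(List<String> intermediateLevels) {
        return intermediateLevels.get(0);
    }

    public static String anonymize(List<List<String>> partitions, int k, int kI)
            throws Exception {
        final String al0 = "AL0"; // initial (top-most) anonymization level
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());

        // Phase 1: run MRTDS(Di, kI, AL0) on each partition in parallel,
        // mirroring the job level parallelization of step 2.
        List<Future<String>> phase1 = new ArrayList<>();
        for (List<String> di : partitions) {
            Callable<String> task = () -> mrtds(di, kI, al0);
            phase1.add(pool.submit(task));
        }
        List<String> intermediate = new ArrayList<>();
        for (Future<String> f : phase1) intermediate.add(f.get());
        pool.shutdown();

        String ali = merge(intermediate); // step 3: merged level ALI

        // Phase 2 (step 4): anonymize the whole data set starting from ALI.
        List<String> wholeDataSet = new ArrayList<>();
        for (List<String> di : partitions) wholeDataSet.addAll(di);
        return mrtds(wholeDataSet, k, ali); // final level AL*, used to output D*
    }
}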

Modules Description:

Data Partition:
o In this module, the data partition is performed on the cloud.
o Here we collect a large number of data sets.
o We split the large data set into small data sets.
o Then we assign a random number to each data set (see the sketch after this module description).

Anonymization:
o After obtaining the individual data sets, we apply anonymization. Anonymization means hiding or removing the sensitive fields in the data sets.
o We then get intermediate results for the small data sets; these intermediate results are used for the specialization process.
o All intermediate anonymization levels are merged into one in the second phase. The merging of anonymization levels is completed by merging cuts. To ensure that the merged intermediate anonymization level ALI never violates the privacy requirements, the more general cut is selected as the merged one.
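As a hedged illustration of the Data Partition module above, the following Java sketch assigns each record a random partition number so that the p partitions end up statistically similar; the method and type names are hypothetical, not from the paper:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomDataPartition {
    // Split the input records into p partitions by drawing a random
    // partition number for each record.
    public static List<List<String>> partition(List<String> records, int p) {
        Random rnd = new Random();
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < p; i++) parts.add(new ArrayList<>());
        for (String record : records) {
            parts.get(rnd.nextInt(p)).add(record); // the random number picks the partition
        }
        return parts;
    }
}

Random assignment matters here: if partitions had skewed distributions, the intermediate anonymization levels produced in phase one would diverge more and the merge step would lose more utility.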

Merging:
o The intermediate results of the small data sets are merged here.
o The MRTDS driver is used to organize the small intermediate results for merging; the merged data sets are collected on the cloud.
o The merged result goes through anonymization again, which is called specialization.

Specialization:
o After the intermediate results are obtained, they are merged into one.
o Then we apply anonymization again on the merged data; this is called specialization.
o Here we use two kinds of jobs, namely IGPL Update and IGPL Initialization.
o The jobs are coordinated by the driver.

OBS:
o OBS stands for optimized balancing scheduling.
o Here we focus on two scheduling criteria, namely time and size.
o The data sets are split into the specified size, and anonymization is applied within the specified time.
o The OBS approach delivers high capability in handling large data sets.

V. IMPLEMENTATION AND IMPROVEMENT

To elaborate how data sets are processed in MRTDS, the execution framework based on standard MapReduce is depicted in Fig. 3. The solid arrow lines represent the data flows in the canonical MapReduce framework. From Fig. 3, we can see that the iteration of MapReduce jobs is controlled by the anonymization level AL in the Driver. The data flows for handling iterations are represented by dotted arrow lines. AL is sent from the Driver to all workers, including Mappers and Reducers, via the distributed cache mechanism. The value of AL is updated in the Driver according to the output of the IGPL Initialization or IGPL Update jobs. Because the amount of such data is extremely small compared with the data sets to be anonymized, it can be efficiently transmitted between the Driver and the workers. We adopt Hadoop, an open-source implementation of MapReduce, to implement MRTDS. Since most Map and Reduce functions need to access the current anonymization level AL, we use the distributed cache mechanism to pass the content of AL to each Mapper or Reducer node, as shown in Fig. 3. Also, Hadoop provides a mechanism to set simple global variables for Mappers and Reducers; the best specialization is passed into the Map function of the IGPL Update job in this way. The partition hash function in the shuffle phase is modified, because the two jobs require that key-value pairs with the same key.p field, rather than the same entire key, go to the same Reducer.
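The distributed cache usage described above can be sketched against the standard Hadoop 2 MapReduce API. The file name, the AL serialization, and the emitted key format below are assumptions rather than the paper's actual implementation; the sketch only shows how each Mapper reloads AL in setup():

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IgplUpdateMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String currentAL; // current anonymization level, reloaded per task

    @Override
    protected void setup(Context context) throws IOException {
        // The Driver registered the small AL file with job.addCacheFile(...);
        // Hadoop localizes it next to the task, so it can be read by name.
        URI[] cacheFiles = context.getCacheFiles();
        Path alFile = new Path(cacheFiles[0].getPath());
        StringBuilder al = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new FileReader(alFile.getName()))) {
            for (String line; (line = in.readLine()) != null; ) {
                al.append(line).append('\n');
            }
        }
        currentAL = al.toString();
    }

    @Override
    protected void map(LongWritable offset, Text record, Context context)
            throws IOException, InterruptedException {
        // Emit statistics keyed by candidate specialization (details omitted;
        // the real IGPL jobs compute information gain and privacy loss here).
        context.write(new Text("candidate#" + currentAL.hashCode()), record);
    }
}

On the Driver side, a single call such as job.addCacheFile(new URI("hdfs:///tmp/al-current.txt")) (path illustrative) is enough; this works because AL is tiny compared with the data being anonymized.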

To reduce communication traffic, MRTDS exploits the combiner mechanism, which aggregates key-value pairs with the same key into one pair on the nodes running the Map functions. The following is a snapshot of the implementation of the two-phase TDS approach for data anonymization for preserving big data privacy.
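A minimal sketch of the modified partition hash function, assuming a composite key serialized as "p#rest": routing on the p field alone guarantees that all pairs sharing key.p reach the same Reducer. The key layout and the combiner wiring in the trailing comments are illustrative assumptions, not the paper's code:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class KeyPFieldPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Hash only the key.p component, not the entire key.
        String pField = key.toString().split("#", 2)[0];
        return (pField.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Driver wiring (illustrative): the combiner pre-aggregates same-key pairs on
// the map side, cutting shuffle traffic exactly as the text describes.
//   job.setPartitionerClass(KeyPFieldPartitioner.class);
//   job.setCombinerClass(StatAggregatingReducer.class); // hypothetical reducer class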

VI. CONCLUSION AND FUTURE RESEARCH CHALLENGES

In this paper, we have examined the scalability problem of large-scale data anonymization by TDS and proposed a highly scalable two-phase TDS approach using MapReduce on cloud. Data sets are partitioned and anonymized in parallel in the first phase, producing intermediate results. Then, the intermediate results are merged and further anonymized to produce consistent k-anonymous data sets in the second phase. We have creatively applied MapReduce on cloud to data anonymization and deliberately designed a group of innovative MapReduce jobs to concretely achieve the specialization computation in a highly scalable way. Experimental results on real-world data sets have demonstrated that our approach significantly improves the scalability and efficiency of TDS over existing approaches. In cloud environments, privacy preservation for data analysis, sharing, and mining is a challenging research issue due to the increasingly large volumes of data sets, and it therefore requires intensive investigation. We will investigate the adoption of our approach in bottom-up generalization algorithms for data anonymization. Based on the contributions herein, we plan to further explore scalable privacy-preservation-aware analysis and scheduling on large-scale data sets. Optimized balanced scheduling strategies are expected to be developed towards overall scalable privacy-preservation-aware data set scheduling.

REFERENCES
[1] J. Manyika et al., Big Data: The Next Frontier for Innovation, Competition, and Productivity. Zürich, Switzerland: McKinsey Global Inst., Jun. 2011, pp. 1-137.
[2] J. Gantz and D. Reinsel, "Extracting value from chaos," IDC iView, pp. 1-12, 2011.
[3] A. Katal, M. Wazid, and R. H. Goudar, "Big data: Issues, challenges, tools and good practices," in Proc. IEEE Int. Conf. Contemp. Comput., Aug. 2013, pp. 404-409.
[4] B. Matturdi, X. Zhou, S. Li, and F. Lin, "Big data security and privacy: A review," China Commun., vol. 11, no. 14, pp. 135-145, Apr. 2014.

[5] L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren, "Information security in big data: Privacy and data mining," IEEE Access, vol. 2, pp. 1149-1176, Oct. 2014.
[6] H. Hu, Y. Wen, T.-S. Chua, and X. Li, "Toward scalable systems for big data analytics: A technology tutorial," IEEE Access, vol. 2, pp. 652-687, Jul. 2014.
[7] C. A. Ardagna and E. Damiani, "Business intelligence meets big data: An overview on security and privacy."
[8] A. Labrinidis and H. V. Jagadish, "Challenges and opportunities with big data," Proc. VLDB Endowment, vol. 5, no. 12, pp. 2032-2033, 2012.
[9] M. van Dijk and A. Juels, "On the impossibility of cryptography alone for privacy-preserving cloud computing," in Proc. 5th USENIX Conf. Hot Topics in Security, Aug. 2010, pp. 1-8.
[10] S. Sagiroglu and D. Sinanc, "Big data: A review," in Proc. Int. Conf. Collaboration Technologies and Systems, 2013, pp. 42-47.
[11] Y. Demchenko, P. Grosso, C. de Laat, and P. Membrey, "Addressing big data issues in scientific data infrastructure," in Proc. Int. Conf. Collaboration Technologies and Systems, 2013, pp. 48-55.
[12] Cloud Security Alliance, "Top ten big data security and privacy challenges," Technical report, Nov. 2012.
[13] S. H. Kim, N. U. Kim, and T. M. Chung, "Attribute relationship evaluation methodology for big data security," in Proc. Int. Conf. IT Convergence and Security (ICITCS), 2013, pp. 1-4.
[14] S. H. Kim, J. H. Eom, and T. M. Chung, "Big data security hardening methodology using attributes relationship," in Proc. Int. Conf. Information Science and Applications (ICISA), 2013, pp. 1-2.
[15] H. Takabi, J. B. D. Joshi, and G. Ahn, "Security and privacy challenges in cloud computing environments," IEEE Security and Privacy, vol. 8, no. 6, pp. 24-31, Nov. 2010.
[16] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Incognito: Efficient full-domain k-anonymity," in Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD '05), 2005, pp. 49-60.
[17] A. Mehmood, I. Natgunanathan, Y. Xiang, G. Hua, and S. Guo, "Protection of big data privacy," IEEE Access, 2016.
