IDL - International Digital Library Of Technology & Research Volume 1, Issue 5, May 2017
Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
Algorithm to Convert Unstructured Data in Hadoop and Framework to Secure Big Data in Cloud

1Lakshmikantha G C, 2Anusha Desai, 3Keerthana G, 4Keerthi Kiran M E, 5Siri S Garadi

1 (Assistant Professor, Department of CSE, VKIT, Bengaluru, Lakshmikantha.gc@gmail.com)
2 (B.E Student, Department of CSE, VKIT, Bengaluru, anushadesai21@gmail.com)
3 (B.E Student, Department of CSE, VKIT, Bengaluru, bramara.keerthana@gmail.com)
4 (B.E Student, Department of CSE, VKIT, Bengaluru, keerthikiran351995@gmail.com)
5 (B.E Student, Department of CSE, VKIT, Bengaluru, Sirigaradi82@gmail.com)
Abstract— Unstructured Big Data is extracted from the dataset and converted to Hadoop format. The resultant data is stored in the cloud and secured by double encryption. The user can retrieve the data from the cloud through a user interface by double decryption. Security for the data in the cloud is provided using a fully homomorphic algorithm. As a result, efficient encryption, transmission, and storage of sensitive data are achieved. We analyze existing search algorithms over ciphertext; because most of these algorithms disclose the user's access patterns, we propose a new method of private information retrieval supporting keyword search that combines homomorphic encryption with private information retrieval.
I. INTRODUCTION
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating, and information privacy. The goal of most big data systems is to surface insights and connections from large volumes of heterogeneous data that would not be possible using conventional methods.
Apache Hadoop is an open source software framework for storage and large-scale processing of data sets on clusters of commodity hardware. The Hadoop Distributed File System (HDFS) is a distributed file system that stores data on the commodity machines, providing very high aggregate bandwidth across the cluster. A MapReduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks, as the word-count sketch below illustrates.
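As an illustration of this processing model, the following is a minimal sketch of the canonical Hadoop word-count job (written against the standard org.apache.hadoop.mapreduce API; input and output paths are supplied as command-line arguments and are placeholders): the mapper emits a (word, 1) pair for every token in its input split, the framework sorts and groups the pairs by key, and the reducer sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each record of the input split is tokenized and every word is emitted with count 1.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework groups the sorted map output by word,
  // and the reducer sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```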
Cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. Cloud computing security, or more simply cloud security, refers to a broad set of policies, technologies, and controls deployed to protect data, applications, and the associated infrastructure of cloud computing.
Homomorphic encryption is a form of encryption that allows computations to be carried out on ciphertext, generating an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintext. We define the relaxed notion of a semi-homomorphic encryption scheme, where the plaintext can be recovered as long as the computed function does not increase the size of the input "too much". The disadvantage of these two schemes is that the encrypted data can be decrypted easily. To overcome the disadvantages of homomorphic and semi-homomorphic encryption, we propose a fully homomorphic encryption scheme, i.e., a scheme that allows one to evaluate arbitrary circuits over encrypted data without being able to decrypt it.
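To make the homomorphic property concrete, the sketch below uses a toy Paillier-style additively homomorphic scheme, chosen only for illustration; it is not the proposed fully homomorphic scheme, and the key size and class names are illustrative. Multiplying two ciphertexts yields a ciphertext of the sum of the underlying plaintexts; a fully homomorphic scheme additionally supports multiplication on ciphertexts and hence the evaluation of arbitrary circuits.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Toy Paillier-style additively homomorphic encryption (illustration only, not production code).
public class HomomorphicDemo {
    static final SecureRandom RND = new SecureRandom();
    static final BigInteger P = BigInteger.probablePrime(512, RND);
    static final BigInteger Q = BigInteger.probablePrime(512, RND);
    static final BigInteger N = P.multiply(Q);          // public modulus n = p*q
    static final BigInteger NSQ = N.multiply(N);
    static final BigInteger G = N.add(BigInteger.ONE);  // standard generator choice g = n + 1
    static final BigInteger LAMBDA =                    // private key: (p-1)(q-1)
            P.subtract(BigInteger.ONE).multiply(Q.subtract(BigInteger.ONE));

    // Enc(m) = g^m * r^n mod n^2, with random r (assumed coprime to n, true with overwhelming probability)
    static BigInteger encrypt(BigInteger m) {
        BigInteger r = new BigInteger(N.bitLength() - 1, RND).add(BigInteger.ONE);
        return G.modPow(m, NSQ).multiply(r.modPow(N, NSQ)).mod(NSQ);
    }

    // Dec(c) = L(c^lambda mod n^2) * lambda^{-1} mod n, where L(x) = (x - 1) / n
    static BigInteger decrypt(BigInteger c) {
        BigInteger l = c.modPow(LAMBDA, NSQ).subtract(BigInteger.ONE).divide(N);
        return l.multiply(LAMBDA.modInverse(N)).mod(N);
    }

    public static void main(String[] args) {
        BigInteger a = BigInteger.valueOf(17);
        BigInteger b = BigInteger.valueOf(25);
        // Homomorphic addition: the product of two ciphertexts decrypts to the sum of the plaintexts.
        BigInteger cipherSum = encrypt(a).multiply(encrypt(b)).mod(NSQ);
        System.out.println(decrypt(cipherSum));  // prints 42 without decrypting a or b individually
    }
}
```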
II. EXISTING SYSTEM
Crawler algorithm for extraction
A Web crawler, sometimes called a spider, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing. Web search engines and some other sites use Web crawling or spidering software to update their own web content or indices of other sites' web content.
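As a minimal sketch of this extraction loop (assuming Java 11's standard java.net.http.HttpClient; the seed URL, page budget, and regex-based link extraction are simplifications for illustration), a breadth-first crawler repeatedly fetches a page, records it, extracts its outgoing links, and enqueues the links it has not yet visited:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {
  // Rough link extractor; a real crawler would use an HTML parser and obey robots.txt.
  private static final Pattern LINK = Pattern.compile("href=\"(https?://[^\"]+)\"");

  public static void main(String[] args) {
    HttpClient client = HttpClient.newHttpClient();
    Deque<String> frontier = new ArrayDeque<>();
    Set<String> visited = new HashSet<>();
    frontier.add("https://example.org/");                 // placeholder seed URL

    while (!frontier.isEmpty() && visited.size() < 50) {  // small page budget for the demo
      String url = frontier.poll();
      if (!visited.add(url)) continue;                    // skip pages already fetched
      try {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        String html = client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println("Indexed " + url + " (" + html.length() + " chars)");

        // Extract outgoing links and enqueue the ones not seen yet.
        Matcher m = LINK.matcher(html);
        while (m.find()) {
          String link = m.group(1);
          if (!visited.contains(link)) frontier.add(link);
        }
      } catch (Exception e) {
        System.err.println("Skipping " + url + ": " + e.getMessage());
      }
    }
  }
}
```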