Ijctt v9p116

Page 1

International Journal of Computer Trends and Technology (IJCTT) – volume 9 number 2– Mar 2014

Revival of Secure Top-k Multi-Keyword over Encrypted Cloud Data Juvairiya K P (Department of Computer Science & Engineering ,Shree Devi Institute Of Technology, Mangalore),

Rasheeda Z Khan ( Professor & HOD,Department of Information Science , Shree Devi Institute Of Technology, Mangalore)

ABSTRACT : Cloud computing has recently

Keywords - Cloud, data privacy, relevance

emerged as a new platform for deploying,

score, similarity relevance, homomorphic

managing, and provisioning large-scale services

encryption, vector space model

through an Internet-based infrastructure. However, concerns

of

cloud

I

Data

Cloud computing, a critical pattern for

encryption protects data security to some extent,

advanced data service, has become a necessary

but at the cost of compromised efficiency.

feasibility for data users to outsource data.

Searchable symmetric encryption (SSE) allows

Controversies on privacy, however, have been

retrieval of encrypted data over cloud. Here, focus

incessantly presented as outsourcing of sensitive

on addressing data privacy issues using SSE. For

Information including e-mails, health history and

the first time, formulate the privacy issue from the

personal photos is explosively expanding. Reports

aspect

of data loss and privacy breaches in cloud

potentially

of

sensitive information on cause

privacy

similarity

problems.

relevance

and

scheme

robustness. We observe that server-side ranking

Introduction

computing systems appear from time to time.

based on order-preserving encryption (OPE)

Cloud computing is a long dreamed vision

inevitably leaks data privacy. To eliminate the

of computing as a utility, where cloud customers

leakage,

searchable

can remotely store their data into the cloud as to

encryption (TRSE) scheme that supports top-k

enjoy the on-demand high-quality application and

multi keyword retrieval. In TRSE, employ a vector

services from a shared pool of configurable

space model and homomorphic encryption. The

computing resources. The main threat on data

vector space model helps to provide sufficient

privacy roots in the cloud itself. When users

search accuracy, and the homomorphic encryption

outsource their private data onto the cloud, the

enables users to involve in the ranking while the

cloud service providers are able to control and

majority of computing work is done on the server

monitor the data and the communication between

side by operations only on cipher text. As a result,

users and the cloud at will, lawfully or unlawfully

propose

a

two-round

information leakage can be eliminated and data security is ensured. Thorough analysis shows that

1.1

Existing System

proposed solution enjoys “as-strong-as possible” security guarantee compared to previous SSE

Besides, in order to improve feasibility

schemes, while correctly realizing the goal of

and save on the expense in the cloud paradigm, it is

keyword search. Extensive experimental results

preferred to get the retrieval result with the most

demonstrate the efficiency of the proposed solution.

relevant files that match users’ interest instead of

ISSN: 2231-2803

www.internationaljournalssrg.org

Page 72


International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013

all the files, which indicates that the files should be

a keyword exists in a file or not, without

ranked in the order of relevance by users’ interest

considering the difference of relevance with the

and only the files with the highest relevances are

queried keyword of these files in the result. To

sent back to users. A series of searchable

solve the problem and to improve security without

symmetric encryption schemes have been proposed

sacrificing efficiency top-k multi I -keyword

to enable search on ciphertext. Traditional SSE

retrieval was done over encrypted cloud data. The

schemes enable users to securely retrieve the

cloud server is considered as “honest-but curious”,

ciphertext, but these schemes support only Boolean

a model extensively used in SSE and characterized

keyword search, i.e., whether a keyword exists in a

by that the cloud server will honestly follow the

file or not, without considering the difference of

designed protocol but is curious to analyze the

relevance with the queried keyword of these files in

hosted data and the received queries to learn extra

the result. Preventing the cloud from involving in

information.

ranking and entrusting all the work to the user is a natural

way

to

avoid

information

1.3

leakage.

However, the limited computational power on the user side and the high computational overhead precludes information security. 1.2

PROPOSED SYSTEM

In this paper, we introduce the concepts of similarity relevance and scheme robustness to formulate

the

privacy

issue

in

searchable

encryption schemes, and then solve the insecurity

Problem Statement

problem by proposing a two-round searchable

The main threat on data privacy roots in

encryption (TRSE) scheme. Novel technologies in

the cloud itself. When users outsource their private

the cryptography community and information

data onto the cloud, the cloud service providers are

retrieval community are employed, including

able to control and monitor the data and the

homo-morphic encryption and vector space model.

communication between users and the cloud at

In the proposed scheme, the majority of computing

will, lawfully or unlawfully. To ensure privacy,

work is done on the cloud while the user takes part

users usually encrypt the data before outsourcing it

in ranking, which guarantees top k multi-keyword

onto cloud, which brings great challenges to

retrieval

effective data utilization. However, even if the

highsecurity and practical efficiency. Contributions

encrypted data utilization is possible, users still

can be summarized as follows:

over

encrypted

cloud

data

with

need to communicate with the cloud and allow the

1. We propose the concepts of similarity

cloud operates on the encrypted data, which

relevance and scheme robustness. We, thus,

potentially causes leakage of sensitive information.

perform the first attempt to formulate the privacy

A series of searchable symmetric encryption (SSE)

issue in searchable encryption, and we show

schemes have been proposed to enable search on

server-side ranking based on order-preserving

cipher text.

encryption (OPE) inevitably violates data privacy.

Traditional SSE schemes enable users to

2. We propose a TRSE scheme, which

securely retrieve the cipher text, but these schemes

fulfills the secure multi-keyword top-k retrieval

support only Boolean keyword search, i.e., whether

over encrypted cloud data. Specifically, for the first

ISSN: 2231-2803

www.internationaljournalssrg.org

Page 73


International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013

The data owner has a collection of n n

time, we employ relevance score to support multikeyword top-k retrieval. 3.

Thorough

files ( f 1, f 2,....., fn ) to outsource onto the analysis

on

security

demonstrates the proposed scheme guarantees high data privacy. Furthermore, performance analysis and experimental results show that our scheme is efficient for practical utilization.

We

provide

scenario

cloud server to provide keyword retrieval service to data owner himself or other authorized users. To achieve this, the data owner needs to build a searchable

The rest of this paper is organized as follows:

cloud server in encrypted form and expects the

and

related

I

index

from a

collection

of

I keywords w  ( w1, w2,....., wn ) extracted out

background in Section 2, and then we give the

of c , and then outsources both the encrypted index

security definitions and problems with existing

I and encrypted files onto the cloud server. The

schemes in Section 3. In Section 4, we present the

data user is authorized to process multi-keyword

detailed description of the proposed searchable

retrieval over the outsourced data. The computing

encryption scheme Section 5 concludes this paper.

power on the user side is limited, which means that operations on the user side should be simplified.

2

The authorized data user at first generates

PRELIMINARIES

a query REQ . For privacy consideration, which 2.1

Scenario

keywords the data user has searched must be

We consider a cloud computing system

concealed. Thus, the data user encrypts the query

hosting data service, as illustrated in Fig. 1, in

and sends it to the cloud server that returns the

which three different entities are involved: cloud

relevant files to the data user. Afterward, the data

server, data owner, and data user. The cloud server

user can decrypt and make use of the file In this scheme, the data owner encrypts

hosts third-party data storage and retrieve services. Since data may contain sensitive information, the

the

searchable

index

with

homo-morphic

cloud servers cannot be fully entrusted in

encryption. When the cloud server receives a query

protecting data. For this reason, outsourced files

consisting of multi-keywords, it computes the

must be encrypted. Any kind of information

scores from the encrypted index stored on cloud

leakage that would affect data privacy are regarded

and then returns the encrypted scores of files to the

as unacceptable

data user. Next, the data user decrypts the scores and picks out the top-k highest scoring files, identifiers to request to the cloud server. The retrieval

takes

a

two-round

communication

between the cloud server and the data user, thus, name the scheme the TRSE scheme, in which ranking is done at the user side while scoring calculation is done at the server side Fig1 Scenarioo of Retrival of Encrypted Cloud Data

ISSN: 2231-2803

www.internationaljournalssrg.org

Page 74


International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013

2.2 Relevance Scoring

multi-term and non-binary presentation. Moreover,

Some of the multi-keyword SSE schemes support only Boolean queries, i.e., a file either matches or does not match a query. Considering the large number of data users and documents in the

it allows computing a continuous degree of similarity between queries and files, and then ranking files according to their relevance. It meets needs of top-k retrieval. A query is also represented ~ q , while each dimension of the

cloud, it is necessary to allow multi-keyword in the

as a vector

search query and return documents in the order of

vector is assigned with 0 or 1 according to whether

their relevancy with the queried keywords.

this term is queried. Given the scores, files can be

Scoring is a natural way to weight the relevance. Based on the relevance score, files can then

be

ranked

in

either

ascendingly

ranked in order and, therefore, the most relevant files can be found.

or

dissentingly. Several models have been proposed to

3.

TRSE DESIGN

score and rank files in IR community. Among these schemes, adopt the most widely used one

Existing SSE schemes employ server-side

(tf  idf ) weighting. The (tf  idf ) weighting

ranking based on OPE to improve the efficiency of

involves two attributes: Term frequency and

retrieval over encrypted cloud data. However,

inverse document frequency. Term frequency

server-side ranking based on OPE violates the

denotes the number of occurrences of term t

privacy

in

file f . Document frequency refers to the number of files that contains term t , and the inverse document

frequency

is

defined

as:

idf  log( N / df ) .Where N denotes the total number of files. By introducing the IDF factor, the weights of terms that occur very frequently in the collection are diminished and the weights of terms that occur rarely are increased.

of

sensitive

information,

which

is

considered uncompromisable in the securityoriented third party cloud computing scenario, i.e., security cannot be tradeoff for efficiency. To achieve data privacy, ranking has to be left to the user side. Traditional user-side schemes, however, load heavy computational

burden and high

communication overhead on the user side, due to the interaction between the server and the user including searchable index return and ranking score calculation. Thus, the user-side ranking schemes

2.3 Vector Space Model

are challenged by practical use. A more serversiding scheme might be a better solution to privacy

While (tf  idf ) depicts the weight of a

issues.

single keyword on a file, employ the vector space

We propose a new searchable encryption

model to score a file on multi-keyword. The vector

scheme,

space model is an algebraic model for representing

cryptography community and IR community are

a file as a vector. Each dimension of the vector

employed, including homo-morphic encryption and

corresponds to a separate term, i.e., if a term occurs

the vector space model. In the proposed scheme,

in the file, its value in the vector is nonzero,

the data owner encrypts the searchable index with

otherwise is zero. The vector space model supports

homo-morphic encryption. When the cloud server

ISSN: 2231-2803

in

which

www.internationaljournalssrg.org

novel

technologies

Page 75

in


International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013

receives a query consisting of multi keywords, it

scores of each files in I ' with T ' and returns the

computes the scores from the encrypted index

encrypted result vector V back to the data user.

stored on cloud and then returns the encrypted scores of files to the data user. Next, the data user decrypts the scores and picks out the top-k highest scoring files’ identifiers to request to the cloud server.

The

retrieval

takes

a

two-round

communication between the cloud server and the data user. We, thus, name the scheme the TRSE scheme, in which ranking is done at the user side while scoring calculation is done at the server side.

.

Rank( (V , SK , K ) .The

data

user

decrypts the vector V with secret key SK and then requests and gets the files with top-k scores. Note that Ỹ is only involved in the Setup algorithm, and the Setup algorithm needs to be processed only once by the data owner, ¥ thus, is a constant integer for one individual application instance. The whole framework can be dividedinto two phases: Initialization and Retrieval.

4.2 Framework of TRSE

The Initialization phase includes Setup

The framework of TRSE includes four algorithms: Setup, IndexBuild, TrapdoorGen;

secure initialization, while the IndexBuild stage involves operations on plaintext. For security

ScoreCalculate, and Rank. Setup(  ). The data owner generates the secret key and public keys for the homomorphic encryption scheme. The security parameter

and IndexBuild. The Setup stage involves the

 is

concerns, the vast majority of work should only be done by the data owner. Moreover, for convenience of retrieve, we modify the original vector space model by adding each vector Vi a head node idi at

taken as the input, the output is a secret key SK

the first dimension of Vi to store the identifier of fi.

and a public key set PK .

In this way, the correspondence between scores and

IndexBuild (C , PK ) . The data owner

files is established The

builds the secure searchable index from the file collection C . Technologies from IR community like stemming are employed to build searchable index I from C , and then I is encrypted into I ' with PK , output the secure searchable index I' .TrapdoorGen ( REQ, PK ).The

data

user generates secure trapdoor from his request.

Retrieval

phase

involves

TrapdoorGen, ScoreCalculate, and Rank, in which the data user and the cloud server are involved. As a result of the limited computing power on the user side, the computing work should be left to server side as much as possible. Meanwhile, the confidentiality privacy of sensitive information cannot be violated.

Vector T ' is built from user’s multi-keyword request REQ and then encrypted into secure trapdoor T ' ’ with public key from PK PK, output

5

CONCLUSION In this paper, we motivate and solve the

problem of secure multi-keyword top-k retrieval

the secure trapdoor T ' . . ScoreCalculate( T ' ; I ' ). When receives secure trapdoor T’, the cloud server computes the

over encrypted cloud data. We define similarity relevance and scheme robustness .Based on OPE invisibly leaking sensitive information, we devise a

ISSN: 2231-2803

www.internationaljournalssrg.org

Page 76


International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013

server-side ranking SSE scheme. We then propose

[4] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski,

a TRSE scheme employing the fully homo-morphic

“Zerber+r: Top-k Retrieval from a Confidential Index,” Proc.

encryption, which fulfills the security requirements

12th Int’l Conf.Extending Database Technology: Advances in Database Technology (EDBT), 2009.

of multi-keyword top-k retrieval over the encrypted cloud data. By security analysis, we show that the

[5] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, Fully Homomorphic Encryption over the Integers,” Proc. 29th

proposed

scheme

guarantees

data

privacy.

According to the efficiency evaluation of the

Ann. Int’lConf. Theory and Applications of Cryptographic Techniques, H. Gilbert,pp. 24-43, 2010.

proposed scheme over a real data set, extensive [6] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan,

experimental results demonstrate that our scheme

“Fully Homomorphic Encryption over the Integers,” Proc. 29th

ensures practical efficiency. The system is designed

Ann. Int’l Conf. Theory and Applications of Cryptographic

to solve the problem of supporting efficient ranked

Techniques, H. Gilbert,

keyword search for achieving effective utilization

pp. 24-43, 2010.

of remotely stored encrypted data in Cloud Computing

Acknowledgements I express my sincere thanks to Mrs.Rasheeda Z Khan

Professor

& HOD ,Department

of

Information Science , Shree Devi Institute Of Technology, Mangalore, who has always been with me

throughout

the process of design and

construction of this paper. I am very much thankful for her guidance, constant encouragement, support and valuable suggestions to successfully carryout this paper.

REFERENCES [1]International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3,March 2013 [2]R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky, “Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions,” Proc. ACM 13th Conf. Computer and Comm. Security (CCS), 2006. [3] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword Search over Encrypted Cloud Data,” Proc. IEEE 30th Int’l Conf. Distributed Computing Systems (ICDCS), 2010.

ISSN: 2231-2803

www.internationaljournalssrg.org

Page 77


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.