International Journal of Computer Trends and Technology (IJCTT) – volume 9 number 2– Mar 2014
Revival of Secure Top-k Multi-Keyword over Encrypted Cloud Data Juvairiya K P (Department of Computer Science & Engineering ,Shree Devi Institute Of Technology, Mangalore),
Rasheeda Z Khan ( Professor & HOD,Department of Information Science , Shree Devi Institute Of Technology, Mangalore)
ABSTRACT : Cloud computing has recently
Keywords - Cloud, data privacy, relevance
emerged as a new platform for deploying,
score, similarity relevance, homomorphic
managing, and provisioning large-scale services
encryption, vector space model
through an Internet-based infrastructure. However, concerns
of
cloud
I
Data
Cloud computing, a critical pattern for
encryption protects data security to some extent,
advanced data service, has become a necessary
but at the cost of compromised efficiency.
feasibility for data users to outsource data.
Searchable symmetric encryption (SSE) allows
Controversies on privacy, however, have been
retrieval of encrypted data over cloud. Here, focus
incessantly presented as outsourcing of sensitive
on addressing data privacy issues using SSE. For
Information including e-mails, health history and
the first time, formulate the privacy issue from the
personal photos is explosively expanding. Reports
aspect
of data loss and privacy breaches in cloud
potentially
of
sensitive information on cause
privacy
similarity
problems.
relevance
and
scheme
robustness. We observe that server-side ranking
Introduction
computing systems appear from time to time.
based on order-preserving encryption (OPE)
Cloud computing is a long dreamed vision
inevitably leaks data privacy. To eliminate the
of computing as a utility, where cloud customers
leakage,
searchable
can remotely store their data into the cloud as to
encryption (TRSE) scheme that supports top-k
enjoy the on-demand high-quality application and
multi keyword retrieval. In TRSE, employ a vector
services from a shared pool of configurable
space model and homomorphic encryption. The
computing resources. The main threat on data
vector space model helps to provide sufficient
privacy roots in the cloud itself. When users
search accuracy, and the homomorphic encryption
outsource their private data onto the cloud, the
enables users to involve in the ranking while the
cloud service providers are able to control and
majority of computing work is done on the server
monitor the data and the communication between
side by operations only on cipher text. As a result,
users and the cloud at will, lawfully or unlawfully
propose
a
two-round
information leakage can be eliminated and data security is ensured. Thorough analysis shows that
1.1
Existing System
proposed solution enjoys “as-strong-as possible” security guarantee compared to previous SSE
Besides, in order to improve feasibility
schemes, while correctly realizing the goal of
and save on the expense in the cloud paradigm, it is
keyword search. Extensive experimental results
preferred to get the retrieval result with the most
demonstrate the efficiency of the proposed solution.
relevant files that match users’ interest instead of
ISSN: 2231-2803
www.internationaljournalssrg.org
Page 72
International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013
all the files, which indicates that the files should be
a keyword exists in a file or not, without
ranked in the order of relevance by users’ interest
considering the difference of relevance with the
and only the files with the highest relevances are
queried keyword of these files in the result. To
sent back to users. A series of searchable
solve the problem and to improve security without
symmetric encryption schemes have been proposed
sacrificing efficiency top-k multi I -keyword
to enable search on ciphertext. Traditional SSE
retrieval was done over encrypted cloud data. The
schemes enable users to securely retrieve the
cloud server is considered as “honest-but curious”,
ciphertext, but these schemes support only Boolean
a model extensively used in SSE and characterized
keyword search, i.e., whether a keyword exists in a
by that the cloud server will honestly follow the
file or not, without considering the difference of
designed protocol but is curious to analyze the
relevance with the queried keyword of these files in
hosted data and the received queries to learn extra
the result. Preventing the cloud from involving in
information.
ranking and entrusting all the work to the user is a natural
way
to
avoid
information
1.3
leakage.
However, the limited computational power on the user side and the high computational overhead precludes information security. 1.2
PROPOSED SYSTEM
In this paper, we introduce the concepts of similarity relevance and scheme robustness to formulate
the
privacy
issue
in
searchable
encryption schemes, and then solve the insecurity
Problem Statement
problem by proposing a two-round searchable
The main threat on data privacy roots in
encryption (TRSE) scheme. Novel technologies in
the cloud itself. When users outsource their private
the cryptography community and information
data onto the cloud, the cloud service providers are
retrieval community are employed, including
able to control and monitor the data and the
homo-morphic encryption and vector space model.
communication between users and the cloud at
In the proposed scheme, the majority of computing
will, lawfully or unlawfully. To ensure privacy,
work is done on the cloud while the user takes part
users usually encrypt the data before outsourcing it
in ranking, which guarantees top k multi-keyword
onto cloud, which brings great challenges to
retrieval
effective data utilization. However, even if the
highsecurity and practical efficiency. Contributions
encrypted data utilization is possible, users still
can be summarized as follows:
over
encrypted
cloud
data
with
need to communicate with the cloud and allow the
1. We propose the concepts of similarity
cloud operates on the encrypted data, which
relevance and scheme robustness. We, thus,
potentially causes leakage of sensitive information.
perform the first attempt to formulate the privacy
A series of searchable symmetric encryption (SSE)
issue in searchable encryption, and we show
schemes have been proposed to enable search on
server-side ranking based on order-preserving
cipher text.
encryption (OPE) inevitably violates data privacy.
Traditional SSE schemes enable users to
2. We propose a TRSE scheme, which
securely retrieve the cipher text, but these schemes
fulfills the secure multi-keyword top-k retrieval
support only Boolean keyword search, i.e., whether
over encrypted cloud data. Specifically, for the first
ISSN: 2231-2803
www.internationaljournalssrg.org
Page 73
International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013
The data owner has a collection of n n
time, we employ relevance score to support multikeyword top-k retrieval. 3.
Thorough
files ( f 1, f 2,....., fn ) to outsource onto the analysis
on
security
demonstrates the proposed scheme guarantees high data privacy. Furthermore, performance analysis and experimental results show that our scheme is efficient for practical utilization.
We
provide
scenario
cloud server to provide keyword retrieval service to data owner himself or other authorized users. To achieve this, the data owner needs to build a searchable
The rest of this paper is organized as follows:
cloud server in encrypted form and expects the
and
related
I
index
from a
collection
of
I keywords w ( w1, w2,....., wn ) extracted out
background in Section 2, and then we give the
of c , and then outsources both the encrypted index
security definitions and problems with existing
I and encrypted files onto the cloud server. The
schemes in Section 3. In Section 4, we present the
data user is authorized to process multi-keyword
detailed description of the proposed searchable
retrieval over the outsourced data. The computing
encryption scheme Section 5 concludes this paper.
power on the user side is limited, which means that operations on the user side should be simplified.
2
The authorized data user at first generates
PRELIMINARIES
a query REQ . For privacy consideration, which 2.1
Scenario
keywords the data user has searched must be
We consider a cloud computing system
concealed. Thus, the data user encrypts the query
hosting data service, as illustrated in Fig. 1, in
and sends it to the cloud server that returns the
which three different entities are involved: cloud
relevant files to the data user. Afterward, the data
server, data owner, and data user. The cloud server
user can decrypt and make use of the file In this scheme, the data owner encrypts
hosts third-party data storage and retrieve services. Since data may contain sensitive information, the
the
searchable
index
with
homo-morphic
cloud servers cannot be fully entrusted in
encryption. When the cloud server receives a query
protecting data. For this reason, outsourced files
consisting of multi-keywords, it computes the
must be encrypted. Any kind of information
scores from the encrypted index stored on cloud
leakage that would affect data privacy are regarded
and then returns the encrypted scores of files to the
as unacceptable
data user. Next, the data user decrypts the scores and picks out the top-k highest scoring files, identifiers to request to the cloud server. The retrieval
takes
a
two-round
communication
between the cloud server and the data user, thus, name the scheme the TRSE scheme, in which ranking is done at the user side while scoring calculation is done at the server side Fig1 Scenarioo of Retrival of Encrypted Cloud Data
ISSN: 2231-2803
www.internationaljournalssrg.org
Page 74
International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013
2.2 Relevance Scoring
multi-term and non-binary presentation. Moreover,
Some of the multi-keyword SSE schemes support only Boolean queries, i.e., a file either matches or does not match a query. Considering the large number of data users and documents in the
it allows computing a continuous degree of similarity between queries and files, and then ranking files according to their relevance. It meets needs of top-k retrieval. A query is also represented ~ q , while each dimension of the
cloud, it is necessary to allow multi-keyword in the
as a vector
search query and return documents in the order of
vector is assigned with 0 or 1 according to whether
their relevancy with the queried keywords.
this term is queried. Given the scores, files can be
Scoring is a natural way to weight the relevance. Based on the relevance score, files can then
be
ranked
in
either
ascendingly
ranked in order and, therefore, the most relevant files can be found.
or
dissentingly. Several models have been proposed to
3.
TRSE DESIGN
score and rank files in IR community. Among these schemes, adopt the most widely used one
Existing SSE schemes employ server-side
(tf idf ) weighting. The (tf idf ) weighting
ranking based on OPE to improve the efficiency of
involves two attributes: Term frequency and
retrieval over encrypted cloud data. However,
inverse document frequency. Term frequency
server-side ranking based on OPE violates the
denotes the number of occurrences of term t
privacy
in
file f . Document frequency refers to the number of files that contains term t , and the inverse document
frequency
is
defined
as:
idf log( N / df ) .Where N denotes the total number of files. By introducing the IDF factor, the weights of terms that occur very frequently in the collection are diminished and the weights of terms that occur rarely are increased.
of
sensitive
information,
which
is
considered uncompromisable in the securityoriented third party cloud computing scenario, i.e., security cannot be tradeoff for efficiency. To achieve data privacy, ranking has to be left to the user side. Traditional user-side schemes, however, load heavy computational
burden and high
communication overhead on the user side, due to the interaction between the server and the user including searchable index return and ranking score calculation. Thus, the user-side ranking schemes
2.3 Vector Space Model
are challenged by practical use. A more serversiding scheme might be a better solution to privacy
While (tf idf ) depicts the weight of a
issues.
single keyword on a file, employ the vector space
We propose a new searchable encryption
model to score a file on multi-keyword. The vector
scheme,
space model is an algebraic model for representing
cryptography community and IR community are
a file as a vector. Each dimension of the vector
employed, including homo-morphic encryption and
corresponds to a separate term, i.e., if a term occurs
the vector space model. In the proposed scheme,
in the file, its value in the vector is nonzero,
the data owner encrypts the searchable index with
otherwise is zero. The vector space model supports
homo-morphic encryption. When the cloud server
ISSN: 2231-2803
in
which
www.internationaljournalssrg.org
novel
technologies
Page 75
in
International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013
receives a query consisting of multi keywords, it
scores of each files in I ' with T ' and returns the
computes the scores from the encrypted index
encrypted result vector V back to the data user.
stored on cloud and then returns the encrypted scores of files to the data user. Next, the data user decrypts the scores and picks out the top-k highest scoring files’ identifiers to request to the cloud server.
The
retrieval
takes
a
two-round
communication between the cloud server and the data user. We, thus, name the scheme the TRSE scheme, in which ranking is done at the user side while scoring calculation is done at the server side.
.
Rank( (V , SK , K ) .The
data
user
decrypts the vector V with secret key SK and then requests and gets the files with top-k scores. Note that Ỹ is only involved in the Setup algorithm, and the Setup algorithm needs to be processed only once by the data owner, ¥ thus, is a constant integer for one individual application instance. The whole framework can be dividedinto two phases: Initialization and Retrieval.
4.2 Framework of TRSE
The Initialization phase includes Setup
The framework of TRSE includes four algorithms: Setup, IndexBuild, TrapdoorGen;
secure initialization, while the IndexBuild stage involves operations on plaintext. For security
ScoreCalculate, and Rank. Setup( ). The data owner generates the secret key and public keys for the homomorphic encryption scheme. The security parameter
and IndexBuild. The Setup stage involves the
is
concerns, the vast majority of work should only be done by the data owner. Moreover, for convenience of retrieve, we modify the original vector space model by adding each vector Vi a head node idi at
taken as the input, the output is a secret key SK
the first dimension of Vi to store the identifier of fi.
and a public key set PK .
In this way, the correspondence between scores and
IndexBuild (C , PK ) . The data owner
files is established The
builds the secure searchable index from the file collection C . Technologies from IR community like stemming are employed to build searchable index I from C , and then I is encrypted into I ' with PK , output the secure searchable index I' .TrapdoorGen ( REQ, PK ).The
data
user generates secure trapdoor from his request.
Retrieval
phase
involves
TrapdoorGen, ScoreCalculate, and Rank, in which the data user and the cloud server are involved. As a result of the limited computing power on the user side, the computing work should be left to server side as much as possible. Meanwhile, the confidentiality privacy of sensitive information cannot be violated.
Vector T ' is built from user’s multi-keyword request REQ and then encrypted into secure trapdoor T ' ’ with public key from PK PK, output
5
CONCLUSION In this paper, we motivate and solve the
problem of secure multi-keyword top-k retrieval
the secure trapdoor T ' . . ScoreCalculate( T ' ; I ' ). When receives secure trapdoor T’, the cloud server computes the
over encrypted cloud data. We define similarity relevance and scheme robustness .Based on OPE invisibly leaking sensitive information, we devise a
ISSN: 2231-2803
www.internationaljournalssrg.org
Page 76
International Journal of Computer Trends and Technology(IJCTT) – volume X Issue Y–Month 2013
server-side ranking SSE scheme. We then propose
[4] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski,
a TRSE scheme employing the fully homo-morphic
“Zerber+r: Top-k Retrieval from a Confidential Index,” Proc.
encryption, which fulfills the security requirements
12th Int’l Conf.Extending Database Technology: Advances in Database Technology (EDBT), 2009.
of multi-keyword top-k retrieval over the encrypted cloud data. By security analysis, we show that the
[5] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, Fully Homomorphic Encryption over the Integers,” Proc. 29th
proposed
scheme
guarantees
data
privacy.
According to the efficiency evaluation of the
Ann. Int’lConf. Theory and Applications of Cryptographic Techniques, H. Gilbert,pp. 24-43, 2010.
proposed scheme over a real data set, extensive [6] M. van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan,
experimental results demonstrate that our scheme
“Fully Homomorphic Encryption over the Integers,” Proc. 29th
ensures practical efficiency. The system is designed
Ann. Int’l Conf. Theory and Applications of Cryptographic
to solve the problem of supporting efficient ranked
Techniques, H. Gilbert,
keyword search for achieving effective utilization
pp. 24-43, 2010.
of remotely stored encrypted data in Cloud Computing
Acknowledgements I express my sincere thanks to Mrs.Rasheeda Z Khan
Professor
& HOD ,Department
of
Information Science , Shree Devi Institute Of Technology, Mangalore, who has always been with me
throughout
the process of design and
construction of this paper. I am very much thankful for her guidance, constant encouragement, support and valuable suggestions to successfully carryout this paper.
REFERENCES [1]International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3,March 2013 [2]R. Curtmola, J.A. Garay, S. Kamara, and R. Ostrovsky, “Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions,” Proc. ACM 13th Conf. Computer and Comm. Security (CCS), 2006. [3] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure Ranked Keyword Search over Encrypted Cloud Data,” Proc. IEEE 30th Int’l Conf. Distributed Computing Systems (ICDCS), 2010.
ISSN: 2231-2803
www.internationaljournalssrg.org
Page 77