An Efficient Reversible Watermarking Technique for Textual Data

IJIRST – International Journal for Innovative Research in Science & Technology | Volume 3 | Issue 02 | July 2016 | ISSN (online): 2349-6010

Vaidyanathan A N, M. Tech Student, Department of Computer Science & Engineering, NCERC, Pampady Thiruvilwamala, Thrissur, Kerala

Dr. S Subasree, Professor & Head of Dept., Department of Computer Science & Engineering, NCERC, Pampady Thiruvilwamala, Thrissur, Kerala

Ms. Preethymol B, Assistant Professor, Department of Computer Science & Engineering, NCERC, Pampady Thiruvilwamala, Thrissur, Kerala

Abstract

A database is a large, organized collection of data and information that can be accessed efficiently for knowledge discovery. Many real-world applications use open databases available on the internet to extract the information they need, and the research community uses freely available relational databases to mine new information for its work. These databases are vulnerable to security problems involving ownership and data tampering, so the reliability of a data source must be verified before it is used for research or in an application. Watermarking is applied to the data to establish ownership and reliability. Embedding a watermark, however, reduces the quality of the data and can make it unfit for information retrieval. Reversible watermarking avoids this by allowing the original data to be recovered, which preserves data quality while still securing the data. Several effective approaches perform reversible watermarking to ensure ownership along with data recovery, but they handle only numerical databases, so databases that contain textual data cannot be watermarked with them. This paper proposes an efficient method for watermarking textual databases that uses the Unicode (UTF-8) and ASCII values of the letters. It encodes the textual data as numeric values and retrieves the original text at the receiving end. Because a numerical value replaces the textual field during transmission, it is difficult for an attacker to retrieve the original information held in the database.

Keywords: Reversible watermarking, genetic algorithm, relational data, textual data

I. INTRODUCTION

The advancement of information technology has boosted the growth of business and research. In many fields, data are extracted from a wide range of sources for information retrieval and decision making. Many real-world applications mine data in different formats, such as text, audio, video, images and relational data, to gather new ideas and information. Relational data in particular is prominent in the scholarly community and is shared extensively among researchers. Open databases are abundant on the internet and let scholars consult many different sources. However, these databases are vulnerable to many attacks. Attackers copy the data illegally, which threatens the owner's rights, and they can also retrieve customers' personal information, which is a major security issue. To resolve these issues and to enforce ownership of data, watermarking has been used for many years to deter illegal copying. The generated watermark is embedded in the original data, and a unique watermark lets the data owner identify the data as their own. The drawback is that embedding modifies the database, to an extent that depends on the bandwidth of the watermark, and this compromises data quality. Reversible watermarking was introduced to resolve this: the data owner can reverse the embedding and decode the original data from the watermarked data, so data quality is kept intact. Moreover, in reversible watermarking the data owner can specify a distortion tolerance, i.e. the amount of change to the data that the owner allows while the watermark is embedded. The watermark is then embedded within that tolerance.

Difference Expansion Watermarking (DEW), Genetic Algorithm and Difference Expansion based Reversible Watermarking (GADEW) and the Robust and Reversible Watermarking technique for relational data (RRW) are the main reversible watermarking approaches in use; each uses a different method to watermark and decode the data. Because these approaches are reversible, the original data can be recovered from the watermarked data. None of them, however, addresses textual databases: they all target numerical data and leave textual fields unwatermarked. This paper proposes an efficient technique by which a textual database can be watermarked and secured against heavy attacks.

All rights reserved by www.ijirst.org

An Efficient Reversible Watermarking Technique for Textual Data (IJIRST/ Volume 3 / Issue 02/ 054)

II. RELATED WORKS

Some of the reversible watermarking techniques [5] used for watermarking relational databases are discussed below. Each technique uses a different method to generate and embed the watermark.

Difference Expansion Watermarking: Difference Expansion Watermarking (DEW) [1] is a simple reversible watermarking approach that recovers the original database exactly. It randomly selects attribute values, called target values, from the database and computes an average and a difference from them. The value obtained from these calculations for the selected attribute is called the change value. DEW then checks the distortion tolerance of the attribute selected for watermarking. The distortion tolerance bounds the change to each attribute so that its value does not lose its meaning for knowledge discovery. If the change value lies within the distortion limit, the selected attribute is watermarked and the change value replaces the original value in the database; otherwise the attribute is discarded. In the data recovery phase, DEW again calculates the average and difference from the change value and retrieves the original value of the attribute. The main limitation of this technique is that the watermark capacity is not fully utilized: capacity can be increased only by increasing the distortion tolerance, and data owners cannot raise the distortion limit far without compromising data quality. An attacker can also use the distortion to predict which attributes are watermarked.

Genetic Algorithm and Difference Expansion based Reversible Watermarking: Genetic Algorithm and Difference Expansion based Reversible Watermarking (GADEW) [2] is a reversible watermarking approach built on a genetic algorithm. Combining difference expansion with a genetic algorithm increases the watermarking capacity and decreases the data distortion.

A genetic algorithm [4] is an optimization technique inspired by natural evolution and used to solve optimization and search problems. Its basic structure is the chromosome, a data structure that represents a candidate solution to the problem at hand. Instead of selecting attributes randomly, GADEW computes a message authentication code (MAC): it selects a tuple from the relation and computes the MAC value based on its public key and private key. Based on the MAC value, the tuple is accepted or discarded for watermark insertion. Embedding the watermark distorts the original value. To reduce this distortion, GADEW calculates the attribute-wise distortion (AWD) and the tuple-wise distortion (TWD) for the selected attribute. To compute the AWD, it examines the neighboring values of the attribute, defined as the two values above and below the selected value within the same attribute. To compute the TWD, only the selected tuple is considered rather than neighboring values, because the values within a tuple belong to different attributes. GADEW then uses a genetic algorithm (GA) whose fitness function evaluates whether the tuple should be watermarked. The fitness function computes a Total Cost (TC) from three parameters: AWD, TWD and a Capacity Related Cost (CrC). CrC takes the value 0 or 1: it is set to 0 if the tuple is selected for watermarking and to 1 otherwise. The Total Cost is the sum of these parameters, and watermarking is then applied to the data. Because attribute-wise and tuple-wise distortion are measured separately, the approach significantly reduces data distortion and so preserves data quality. Despite these merits, the robustness of GADEW is compromised at high values of AWD and TWD.
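The average-and-difference step that DEW and GADEW both build on can be illustrated with the classic integer difference expansion on a pair of values. This is a minimal sketch, not the exact procedure of [1] or [2]: the function names, the pairing of two integer values and the `within_tolerance` check are assumptions made for illustration.

```python
from typing import Tuple


def de_embed(x: int, y: int, bit: int) -> Tuple[int, int]:
    """Embed one watermark bit into the integer pair (x, y)."""
    avg = (x + y) // 2          # integer average, unchanged by the embedding
    diff = x - y                # the difference carries the watermark bit
    expanded = 2 * diff + bit   # double the difference and append the bit
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(xw: int, yw: int) -> Tuple[int, int, int]:
    """Return (x, y, bit) recovered from a watermarked pair."""
    avg = (xw + yw) // 2
    expanded = xw - yw
    bit = expanded & 1          # the lowest bit is the watermark bit
    diff = expanded >> 1        # floor-halving restores the original difference
    return avg + (diff + 1) // 2, avg - diff // 2, bit


def within_tolerance(original: int, changed: int, tolerance: int) -> bool:
    """Hypothetical distortion check: accept only changes up to `tolerance`."""
    return abs(changed - original) <= tolerance


marked = de_embed(10, 7, 1)
print(marked)                # (12, 5)
print(de_extract(*marked))   # (10, 7, 1)
```

In the example, embedding changes 10 to 12 and 7 to 5, so a distortion tolerance of 1 would reject the pair while a tolerance of 2 would accept it, which mirrors the trade-off between capacity and tolerance described above.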
Robust and Reversible Watermarking Technique for Relational Data: The Robust and Reversible Watermarking technique for relational data (RRW) [3] can watermark any number of selected attributes in a database. It is highly robust and can recover the original data in the presence of active malicious attacks. It targets the numerical data in the database and uses a genetic algorithm to watermark the selected attributes. The attributes to be watermarked are chosen by Mutual Information (MI): RRW first calculates the MI of each candidate attribute with the remaining attributes in the database. The data owner then sets a threshold, and attributes whose MI falls below the threshold are watermarked. The next step is to generate an optimal watermark string of 0s and 1s for each selected feature using a genetic algorithm: an initial random population is generated and a tournament selection mechanism is applied. Each selected attribute is allocated its own optimal string, after which watermark encoding begins. To encode the watermark string into the data, an optimal value β is determined. β is the amount of change allowed to the data while the watermark is embedded, and it is chosen to satisfy the constraints specified by the data owner. β is added to the data value if the current bit is 0 and subtracted if the bit is 1, and this continues until every bit in the string has been encoded.

In the decoding phase, RRW first identifies the number of features that have been watermarked. The watermark decoder then calculates the amount of change in the values, working one bit at a time, until all the strings have been identified correctly. The original data is then recovered by adding β where the bit is 1 and subtracting β where the bit is 0. The major problem with RRW is that it handles only numerical data; relational databases with textual fields are left without watermarking.
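The β rule described for RRW can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation of [3]: pairing one watermark bit with one data value is a simplification, the function names are hypothetical, and the recovery step assumes the decoder has already identified the bit string.

```python
from typing import List


def embed_bits(values: List[int], bits: str, beta: int) -> List[int]:
    """Add beta where the bit is '0' and subtract beta where it is '1'."""
    if len(values) != len(bits):
        raise ValueError("one watermark bit is expected per value")
    return [v + beta if b == "0" else v - beta for v, b in zip(values, bits)]


def recover_values(marked: List[int], bits: str, beta: int) -> List[int]:
    """Reverse the embedding: add beta for '1' bits, subtract it for '0' bits."""
    if len(marked) != len(bits):
        raise ValueError("one watermark bit is expected per value")
    return [v + beta if b == "1" else v - beta for v, b in zip(marked, bits)]


original = [120, 135, 128, 142]   # made-up numerical attribute values
watermark = "0110"                # made-up watermark string
beta = 2                          # made-up optimal value

marked = embed_bits(original, watermark, beta)
print(marked)                                   # [122, 133, 126, 144]
print(recover_values(marked, watermark, beta))  # [120, 135, 128, 142]
```

Every value moves by exactly β, which is why the choice of β directly controls how much distortion the owner accepts.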


III. REVERSIBLE WATERMARKING FOR TEXTUAL DATA

The Robust and Reversible Watermarking technique (RRW) is an efficient approach to watermarking relational data. It watermarks the database effectively and lets the database owner select any attribute according to its importance in knowledge discovery. It is highly robust against heavy attacks and retrieves the original data exactly. Its major disadvantage is that it handles only numerical databases: databases with textual attributes cannot be watermarked using RRW. Many real-world applications hold relational data with many textual fields, and it is just as important to prove ownership of these databases. The data retrieved during information retrieval must be reliable; if pirated information is fetched from the database, or if an intruder replaces the textual data within a field, the whole system is adversely affected. Such applications therefore require a secure mechanism to protect the information and to prove its ownership. To watermark databases with textual information, a new method is proposed. It uses the Unicode Transformation Format (UTF-8) [6] and the ASCII values of the letters to encode and decode the textual data.

Textual Data Encoding: The Unicode Transformation Format (UTF-8) is used to watermark alphabetic characters. A Unicode code point is a unique number assigned by the Unicode Consortium to every character, irrespective of language. Each letter has one unique value for its uppercase form and another for its lowercase form, and UTF-8 represents each of these letters as a unique eight-bit binary string. The UTF-8 and ASCII values of the letters are given in Table 4.1. The first step is to select the textual field from the database and identify each character of the word separately. The UTF-8 strings corresponding to the characters of the word are then generated. The ASCII values of the letters, in both uppercase and lowercase, are stored in a separate matrix. To encode the values, the data owner specifies an optimized value β. Since β is known only to the data owner, it is difficult for an intruder to make modifications in the database. Encoding starts with the first letter of the word, whose ASCII value is changed using its UTF-8 string and the optimal value: for each bit of the letter's UTF-8 string, the encoder adds β to the ASCII value if the bit is 0 and subtracts β if the bit is 1. This is repeated until all 8 bits of the UTF-8 string have been traversed. The final value is stored in a separate matrix that is used for decoding. The remaining letters of the word are encoded in the same way and their values are added to the matrix. Once all the letters of the word are encoded, the values stored in the matrix for that word are summed, and the sum replaces the actual word in the database. This process is repeated for all the textual data selected for watermarking, and encoding stops once all the selected fields are watermarked.

Textual Data Decoding: The decoder checks the values stored in the textual field one by one. It selects the first value from the textual field and identifies the matrix associated with that value. The matrix holds the value of every character, stored separately during encoding. The decoder takes the first character's value from the matrix and subtracts it from the selected value. It then moves to the second value in the matrix, performs the same operation, and continues until all the values in the matrix have been processed. In this way the decoder retrieves the characters one by one and finally recovers the original word.

IV. RESULT AND PERFORMANCE ANALYSIS OF THE SYSTEM

Fig (1) shows the database selected for watermarking. It contains both numerical and textual data.


Fig. 1: Database selection

In the encoding process, the numerical data are watermarked using a Genetic Algorithm (GA) and the textual data are watermarked based on their UTF-8 and ASCII values. The encoded value shown in Fig (2) replaces the original textual data after encoding.

Fig. 2: Encoded Textual data
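The textual encoding step from Section III can be sketched as follows. This is one reading of the description, not the authors' code: the names `encode_char` and `encode_word`, the sample word and the value β = 2 are assumptions for illustration. For ASCII letters the single UTF-8 byte equals the ASCII code, so the same number supplies both the starting value and the 8-bit string.

```python
from typing import List, Tuple


def encode_char(ch: str, beta: int) -> int:
    """Walk the 8-bit UTF-8 string of one letter, adjusting its ASCII value."""
    raw = ch.encode("utf-8")
    if len(raw) != 1:
        raise ValueError("this sketch only handles single-byte characters")
    bits = format(raw[0], "08b")   # e.g. 'C' -> '01000011'
    value = ord(ch)                # start from the ASCII value
    for bit in bits:
        # Add beta for a 0 bit, subtract beta for a 1 bit.
        value += beta if bit == "0" else -beta
    return value


def encode_word(word: str, beta: int) -> Tuple[int, List[int]]:
    """Return (value that replaces the word, per-character matrix)."""
    matrix = [encode_char(ch, beta) for ch in word]
    return sum(matrix), matrix


total, matrix = encode_word("Cat", beta=2)
print(matrix)   # [71, 101, 116]
print(total)    # 288
```

Here 288 is the number that would replace the word in the textual field, while the list of per-character values plays the role of the matrix kept for decoding.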

In the decoding phase, the matrix that stores the values of the characters is selected first in order to recover the original textual string. The matrix table for the selected database is shown in Fig (3).


Fig. 3: Matrix table

Using the matrix values, the decoder retrieves the characters one by one. The decoding process is shown in Fig (4).

Fig. 4: Textual data recovery

Finally, the original textual data is recovered, as shown in Fig (5).


Fig. 5: Recovered Data
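The recovery walk-through can be sketched as below. The paper does not state how a character is obtained from its stored value, so this sketch makes two assumptions: the decoder knows β, and it finds each character by testing which ASCII letter produces the stored value. The names `char_value` and `decode_word` and the sample numbers are hypothetical. `char_value` uses the closed form of the bit walk (β added once per 0 bit and subtracted once per 1 bit).

```python
import string
from typing import List


def char_value(ch: str, beta: int) -> int:
    """Encoded value of one ASCII letter: code + beta * (zeros - ones)."""
    code = ord(ch)
    ones = bin(code).count("1")
    return code + beta * (8 - 2 * ones)   # the 8-bit string has 8 - ones zeros


def decode_word(total: int, matrix: List[int], beta: int) -> str:
    """Subtract each stored value from the field value and recover the letters."""
    remaining = total
    chars = []
    for value in matrix:
        remaining -= value   # "reduce" the stored value from the field value
        matches = [c for c in string.ascii_letters if char_value(c, beta) == value]
        if len(matches) != 1:
            # Under this reading, small beta values can be ambiguous:
            # with beta = 1 both 'A' and 'C' give 69.
            raise ValueError(f"value {value} does not identify a single letter")
        chars.append(matches[0])
    if remaining != 0:
        raise ValueError("field value does not match the stored matrix")
    return "".join(chars)


# Made-up field value and matrix for the word "Cat" encoded with beta = 32.
print(decode_word(408, [131, 161, 116], beta=32))   # Cat
```

The final check that the running remainder reaches zero ties the number stored in the textual field to its matrix, so a field value altered by an intruder would be rejected in this sketch.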

The performance of the system is also analyzed. The new approach for watermarking textual data takes less time than the existing approach for numerical data. The existing approach must identify and allocate an optimal string for each of the selected fields, which takes more computation time. Moreover, the time taken for mutation and for merging strings back into the pool adds to the computation time. The comparison graph for textual and numerical data is shown in Fig (6).

Fig. 6: Comparison Graph

V. CONCLUSION

It is important to preserve the quality of relational data, since such data are widely used for research and in applications. The techniques studied here are some of the watermarking techniques used for relational databases. All of these existing techniques handle only numerical data sets. This paper proposes a new technique that can watermark textual data effectively. The method can be used in many real-world applications with textual data fields to establish ownership and secure the information.


REFERENCES

[1] M. Alattar, "Reversible watermark using difference expansion of triplets," in Proc. IEEE Int. Conf. Image Process., 2003, vol. 1, pp. I–501.
[2] K. Jawad and A. Khan, "Genetic algorithm and difference expansion based reversible watermarking for relational databases," J. Syst. Softw., vol. 86, no. 11, pp. 2742–2753, 2013.
[3] S. Iftikhar, M. Kamran, and Z. Anwar, "RRW—A Robust and Reversible Watermarking Technique for Relational Data," IEEE Transactions on Knowledge & Data Engineering, vol. 27, no. 4, pp. 1132–1145, April 2015.
[4] M. Mitchell, An Introduction to Genetic Algorithms. Cambridge, MA, USA: MIT Press, 1996.
[5] R. Agrawal and J. Kiernan, "Watermarking relational databases," in Proc. 28th Int. Conf. Very Large Data Bases, 2002, pp. 155–166.
[6] Unicode Inc., The Unicode Standard, Version 6.2 – Core Specification, Chapter 4.12, 2012.
