International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)
ISSN (Print): 2279-0047 ISSN (Online): 2279-0055
International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www.iasir.net

ONTOLOGY BASED RANKING WEB DOCUMENTS USING SEMANTIC SIMILARITY

M. Mahalaksmi, R. Anusuya, Dr. S. Srinivasan
Computer Science and Engineering, Anna University Madurai Regional, Chennai, Tamilnadu, INDIA.

Abstract: Many web search engines retrieve enormous amounts of irrelevant information in answer to users' queries. The semantic web provides a promising approach to improving search. This paper shows how to measure the closeness (relevancy) of retrieved web sites to the concepts in a user query and how to re-rank them accordingly. To this end, the paper proposes a new relevancy measure for re-ranking retrieved documents. We term the approach "ontology concepts" and apply it to the domain of electronic commerce. The results suggest that the retrieved documents (web sites) can be re-ranked according to their relevancy to the search query. The proposed method depends on the frequency of the "ontology concepts" in the retrieved documents and uses this frequency to compute their relevancy.

Keywords: Ontology, Ontology concepts, Ranking, Semantic web, Electronic commerce
I. Introduction
The semantic web uses ontology as a tool to capture concepts for specific domains. As a result, computers can deal with the data of those domains semantically. An ontology language can be used to generate class and property descriptions based on their names, along with some axioms about them.

Ontologies have many benefits. First, they capture the concepts, their properties, and their relationships. Second, they represent the domain data in a semantic way and define the knowledge that is embedded in the domain. Third, they can be used to analyze the domain independently of any application requirements. Fourth, they help to realize the vision of the next generation of the WWW, the semantic web. Fifth, they can be used to build web data in a structured way.

One of the main challenges for search engines is to provide a good ranking for the documents that are retrieved as relevant to the user's query [2]. Our approach uses the ontology to build a relevancy measure that checks how close the content of a document is to the user query. The "ontology concepts" approach differs from the "keyword concepts" approach because it searches on the semantics of the user's query, not merely on its keywords. Ontology concepts and relations were used to define hyperlink relationships that indicate the important entities; unimportant entities might not be selected. The ontology concepts and their frequencies are the important measures used to assess a specific document.

Figure 1 Methodology of building Ontologies.
IJETCAS 14-446; © 2014, IJETCAS All Rights Reserved
M.Mahalaksmi et al., International Journal of Emerging Technologies in Computational and Applied Sciences, 8(5), March-May, 2014, pp. 407-410
II. Ranking method and search engine results
1. The ranking method
1.1. The first phase: building "ontology concepts"
We split the methodologies for building ontologies around three major stages of the ontology life cycle: Building, Manipulating, and Maintaining (see Fig. 1). These three stages overlap. The ellipses in Fig. 1 represent the inner steps of each stage. The "ontology concepts" must be built first so that they can be used in the second phase. The electronic commerce domain was selected for this research; the key motivation for this choice was the increasing number of web documents that discuss electronic commerce. The common terms and the most frequent terms in specific domains are pointed out [3]. The input is a set of documents collected from several resources such as online reports, news, banking, teleconference and academic research. The extracted "ontology concepts" for electronic commerce consist of terms that are not only the most frequent but also have high ontological relevance.
1.2. The second phase: using the "ontology concepts" to measure relevance
Documents/sites are retrieved in the domain of interest (e-commerce here) using the specified search engines (e.g., Google or Yahoo), and the ranks of these documents are stored according to the search engines' ordering. This step is divided into two parts. The first converts the retrieved documents/sites into text format while saving their original ranking. In the second, the retrieved documents are input into our algorithm, where each is given a new rank based on its "distance" from the ontology. The ranks produced by this method and those of the search engines were then compared, with each document placed in the best order according to its relevancy to the user query. Only the first thirty documents were selected because it was difficult to find domain experts to rank more. At the same time, the relevancy ordering would be likely to be inaccurate after the first twenty.
The distance between each document's position under the proposed method and its original position is calculated to obtain its ranking error. The average ranking error is the average, over all documents, of the distance between a document's original rank and its rank under our method [8].
Figure 2 Flow of the process
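The average ranking error described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the "distance" between two positions is the absolute difference of the rank numbers, and the function name `average_ranking_error` and the sample ranks are hypothetical.

```python
def average_ranking_error(original_ranks, new_ranks):
    """Mean distance between each document's original rank and its new rank.

    Both arguments map a document id to a rank (1 = best).
    Assumption: distance is the absolute difference of the two ranks.
    """
    if original_ranks.keys() != new_ranks.keys():
        raise ValueError("both rankings must cover the same documents")
    if not original_ranks:
        raise ValueError("at least one document is required")
    total = sum(abs(original_ranks[doc] - new_ranks[doc]) for doc in original_ranks)
    return total / len(original_ranks)


# Hypothetical ranks for three documents.
search_engine = {"doc_a": 1, "doc_b": 2, "doc_c": 3}
reranked = {"doc_a": 3, "doc_b": 2, "doc_c": 1}
print(average_ranking_error(search_engine, reranked))  # (2 + 0 + 2) / 3
```

A value of zero would mean that the two orderings agree on every document; larger values mean the orderings differ more.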
III. Procedure for Ranking method
The ranking method
Part one: Obtain the documents and their ranks
Step A: Retrieve documents using search engines. The query "e-commerce" was used to retrieve the relevant web documents.
Step B: Save the first 30 (or any desired number of) documents in text format. These are the data source for testing.
Step C: Save the original ranking of each document as retrieved by each search engine. Thus document N is given rank number N. The original ranks were saved for comparison with our measure.
Part two: The ranking method is based on the "ontology concepts". For each search engine, the algorithm splits each document into words and counts how many of these words occur in the proposed ontology concepts; it then re-ranks the documents according to the number of occurrences.
IV. Procedure for Re-ranking method
This procedure is run separately for each search engine.
Step A: For each text document, store its words in an array. Read the text files, divide each document into words, and store the words in a string array called split.
Step B: Store only one occurrence of each word in an array. Remove the repeated words from each document and store the distinct words in a string array called uniqueSplit.
Step C: Eliminate the stop words and stem the remaining words using the Porter stemming algorithm. Store the stop words in an array so that they can be removed from each document; they are ignored during the comparison process.
Step D: Determine the "ontology concepts" for each document. The words in the uniqueSplit array of each document are compared with the words of the "ontology concepts". Store only the words in the document that are included in the "ontology concepts".
Step E: Count the frequency of the "ontology concepts" for each document. The term frequency in each document is found from two quantities: the frequency of the terms in the document that match an ontology concept, and the maximum frequency of the most repeated concept in the document. The inverse document frequency is found using D, the total number of documents in the web document set.
Step F: Re-rank the documents according to their frequency. Use the frequencyOfExistTerm array and give the highest rank (one) to the document with the highest frequency, the second highest rank (two) to the document with the second highest frequency, etc.
V. Implementation and Result
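The re-ranking procedure of Section IV (Steps A to F) can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the stop-word list, the concept list and the sample documents are hypothetical stand-ins, stemming is omitted, and the two weighting helpers assume the common forms tf = f / f_max and idf = log(D / n), where n is the number of documents containing the concept, since the formulas themselves are not given in the text.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the real stop-word list and the extracted concepts.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for"}
ONTOLOGY_CONCEPTS = {"payment", "customer", "order", "catalog", "transaction"}


def split_words(text):
    """Step A: divide a document into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def concept_counts(text, concepts=ONTOLOGY_CONCEPTS, stop_words=STOP_WORDS):
    """Steps C to E: drop stop words, keep ontology concepts, count each one."""
    words = [w for w in split_words(text) if w not in stop_words]
    return Counter(w for w in words if w in concepts)


def rerank(documents, concepts=ONTOLOGY_CONCEPTS, stop_words=STOP_WORDS):
    """Step F: order document ids by total concept frequency, highest first."""
    totals = {
        doc_id: sum(concept_counts(text, concepts, stop_words).values())
        for doc_id, text in documents.items()
    }
    # sorted() is stable, so tied documents keep their original order.
    return sorted(totals, key=totals.get, reverse=True)


def normalized_tf(counts):
    """Assumed form: concept frequency divided by the largest concept frequency."""
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: freq / peak for term, freq in counts.items()}


def inverse_document_frequency(term, per_document_counts):
    """Assumed form: log(D / n); returns 0.0 when no document has the term."""
    containing = sum(1 for counts in per_document_counts if term in counts)
    if containing == 0:
        return 0.0
    return math.log(len(per_document_counts) / containing)


# Hypothetical documents, listed in their original search-engine order.
docs = {
    "d1": "The customer places an order.",
    "d2": "Payment and order: the customer pays for the order in a transaction.",
    "d3": "The weather is nice today.",
}
print(rerank(docs))  # ['d2', 'd1', 'd3']
```

Ranking on the raw concept count follows Step F as written; how the term frequency and inverse document frequency enter the final score is not stated in the text, so the two helpers are kept separate from `rerank`.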
Evaluation metrics are used to measure the quality of the re-ranking. After the documents are re-ranked according to their frequency, the performance is evaluated using precision and recall. Precision is the fraction of the retrieved documents that are relevant, and recall is the fraction of the relevant documents that are retrieved.
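As an illustration of the two metrics, the sketch below uses their standard definitions: precision is the number of relevant retrieved documents divided by the number of retrieved documents, and recall is the same count divided by the number of relevant documents. The function name `precision_recall` and the sample document ids are hypothetical, and the set of relevant documents is assumed to come from expert judgement.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list and a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 retrieved documents, 3 relevant ones, 2 in common.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})
print(precision, recall)
```

To evaluate a re-ranked list, the same function can be applied to the top k documents of the list for a chosen cut-off k.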
Figure 3 Fairness distance evaluation graph
In the resulting graph, the blue curve shows the average difference between each document's position in Google and its position according to our re-ranking method. The pink curve shows the average difference between each document's position in our method and its position according to the three experts.
VI. Conclusions
We have proposed a new approach, the use of "ontology concepts", as a relevancy measure to re-rank retrieved web documents. We showed its value in the electronic commerce domain. The re-ranking of documents enhanced their relevancy. Our results showed that the average ranking error of our method was lower than that of several search engines.
VII. References
[1] A. Kayed, R. Colomb, Extracting ontological concepts for tendering conceptual structures, Data and Knowledge Engineering 40 (1), 2002, pp. 71–89.
[2] A. Kayed, N. Hirzallah, L. Al-Shalabi, M. Najjar, Building ontological relationships: a new approach, Journal of the American Society for Information Science and Technology, ISSN: 1532-2882, John Wiley & Sons Inc., 2008, pp. 1801–1809.
[3] L. Ding, R. Pan, T. Finin, A. Joshi, Y. Peng, P. Kolari, Finding and ranking knowledge on the semantic web, in: Proceedings of the 4th International Semantic Web Conference, 2005, pp. 156–170.
[4] H. Alani (Dept. of Electronics & Computer Science, University of Southampton, UK), C. Brewster (Dept. of Computer Science, University of Sheffield, UK), Ontology ranking based on the analysis of concept structures.
[5] R. Ozcan, Y. Alp Aslandogan, Concept based information access using ontologies and latent semantic analysis.
[6] S. M. Patil (Information Technology Department, BVCOE, Navi Mumbai, Maharashtra, India), D. M. Jadhav (Information Technology Department, PIIT, New Panvel, Maharashtra, India), Semantic search using ontology and RDBMS for cricket.
[7] S. Peroni, E. Motta, M. d'Aquin (Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom), Identifying key concepts in an ontology, through the integration of cognitive principles with statistical and topological measures.
[8] A. Kayed, E. El-Qawasmeh, Z. Qawaqneh, Ranking web sites using domain ontology concepts, Science Direct, 2010.