Paper id 2620148


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

Back Index Generation Tool for E-Books: An Implicit Ontology Approach

Raj Kumar Singh1, Prateeksha Pandey2

Assistant Professor1, Department of Information Technology, Bhilai Institute of Technology, Durg, India
Assistant Professor2, Department of Computer Science & Engineering, Chhatrapati Shivaji Institute of Technology, Durg, India
Email: rajkumarsingh33@gmail.com1, prateekshapandey@csitdurg.in2

Abstract- Reading books is an everyday activity. The traditional way to navigate a book or to search it for a topic is through its indexes: the front index and the back index. A front index generally lists the sections and subsections with their corresponding page numbers. A back index lists the nouns used in the book together with their page numbers. The accuracy of the back-of-book index is becoming an important topic of research. Researchers in text mining on books use back-of-book indexes as seed keywords for searching and topic spotting. Publishers use different types of back index; the most common are flat and hierarchical. Various automatic tools are available that generate back-of-book indexes, but they require an ontology to be defined before a book is processed. An ontology formally represents knowledge as a set of concepts within a domain. The present paper demonstrates an efficient method that generates a back-of-the-book index without any predefined ontology.

Index Terms- Stanford Typed Dependency Parser, Noun Phrase Extraction, Bi-grams, Tri-grams, Hierarchical Back Index.

1. INTRODUCTION

The proposed approach automates the generation of back-of-book indexes. It first converts Portable Document Format (PDF) files to text format and arranges the resulting text files according to page number. In the next step, these text pages are passed through the Stanford Typed Dependency Parser [1]. The parser generates the noun phrases that can be used as the ontology. In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts [2] [3]. The intermediate outputs are refined using string matching techniques. The final output contains individual noun phrases or combinations of them, called noun bi-grams and tri-grams, arranged in alphabetical order.
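The steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Stanford Typed Dependency Parser is replaced by a hypothetical hand-made noun lexicon (TOY_NOUNS), the sample pages are invented, and the string-matching refinement step is omitted. The function names noun_runs, ngrams and build_index are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the parser: a tiny hand-made noun lexicon.
# In the paper, nouns are obtained from the Stanford Typed Dependency Parser.
TOY_NOUNS = {"book", "index", "noun", "phrase", "page", "number"}


def noun_runs(text):
    """Yield maximal runs of consecutive toy nouns found in the text."""
    run = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in TOY_NOUNS:
            run.append(token)
        else:
            if run:
                yield run
            run = []
    if run:
        yield run


def ngrams(run, max_n=3):
    """Return all contiguous uni-, bi- and tri-grams of a noun run."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(run) - n + 1):
            out.append(" ".join(run[i:i + n]))
    return out


def build_index(pages):
    """Map each noun n-gram to the sorted list of pages it occurs on."""
    index = defaultdict(set)
    for page_no, text in pages.items():
        for run in noun_runs(text):
            for phrase in ngrams(run):
                index[phrase].add(page_no)
    # Entries are arranged in alphabetical order, as in the final output.
    return {phrase: sorted(index[phrase]) for phrase in sorted(index)}


# Invented sample input: page number -> text of that page.
pages = {
    1: "The book index lists every noun phrase.",
    2: "Each page number points to a noun phrase in the book.",
}
index = build_index(pages)
for phrase, page_list in index.items():
    print(phrase, page_list)
```

With a real parser in place of the toy lexicon, only noun_runs would need to change; the n-gram construction and the alphabetical page mapping stay the same.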

1.1. Back Index Components

The components of a back index are noun phrases, with or without sub-headings, and references (i.e. page numbers). A noun phrase is a phrase that has a noun [4] (or indefinite pronoun) as its head word, or that performs the same grammatical function as such a phrase. Noun phrases are very common cross-linguistically, and they may be the most frequently occurring phrase type. Noun phrases can be embedded inside each other; for instance, the noun phrase "some of his constituents" contains the shorter noun phrase "his constituents". In some modern theories of grammar, noun phrases with determiners are analyzed as having the determiner rather than the noun as their head; they are then referred to as determiner phrases.

2. METHODOLOGY

The overall process of back-of-book index generation can be represented by a block diagram, as shown in Fig. 1 below.

Fig.1: Process of back-of-the-book index generation
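As a rough illustration of the index components listed in Section 1.1 (noun phrases, optional sub-headings, and page references), the following Python sketch renders a flat phrase-to-pages mapping as a hierarchical back index. Grouping phrases under their first word is an assumption made here for illustration; the paper does not state how sub-headings are chosen. The function name format_hierarchical and the sample entries are hypothetical.

```python
from collections import defaultdict


def format_hierarchical(flat_index):
    """Render a flat phrase -> pages mapping as hierarchical index lines.

    Assumption: multi-word phrases become sub-headings under their first word.
    """
    groups = defaultdict(dict)
    for phrase, page_list in flat_index.items():
        head, _, rest = phrase.partition(" ")
        groups[head][rest] = page_list  # rest == "" marks the heading itself

    lines = []
    for head in sorted(groups):
        heading_pages = groups[head].get("", [])
        refs = ", ".join(str(p) for p in heading_pages)
        lines.append(f"{head}, {refs}" if refs else head)
        # Sub-headings are indented and listed alphabetically.
        for rest in sorted(k for k in groups[head] if k):
            sub_refs = ", ".join(str(p) for p in groups[head][rest])
            lines.append(f"    {rest}, {sub_refs}")
    return lines


# Invented sample entries: noun phrase -> page numbers.
flat = {
    "tree": [4, 9],
    "tree traversal": [9],
    "tree height": [5],
    "graph search": [12],
}
lines = format_hierarchical(flat)
print("\n".join(lines))
```

A flat index corresponds to printing the mapping as it is, one phrase per line; the hierarchical form only changes how the same phrases and references are grouped.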


