IDL - International Digital Library Of Technology & Research, Volume 1, Issue 4, April 2017
Available at: www.dbpublications.org
International e-Journal For Technology And Research-2017
Spell checker for Kannada OCR
Suma S, Sneha N
UG scholars, Department of Information Science and Engineering, Siddaganga Institute of Technology, Tumakuru
suma.sinchu.ds@gmail.com, snehan769@gmail.com
Sharathkumar S
Assistant Professor, Department of Information Science and Engineering, Siddaganga Institute of Technology, Tumakuru
skumars@sit.ac.in
Abstract— A spell checker is an application program that processes natural-language text in machine-readable form. Checking and correcting spelling is a basic necessity in any language, and doing it by hand is tedious, so spell checker software is required. A spell checker is a set of programs that detects a wrongly spelled word and replaces it with the most probable correct word. The challenge addressed here is doing this for the Kannada language. In software systems, Kannada words are typed in several formats because Kannada text is written using many fonts. In this paper, we describe some techniques that a spell checker can use for Kannada. We use natural language processing (NLP), the field of computer science concerned with the interaction between human (i.e., natural) languages and computers. Modern NLP algorithms are usually based on machine learning.
Keywords—Spell checker, NLP, OCR, Dictionary Lookup
I. INTRODUCTION
Kannada is a Dravidian language spoken predominantly by the people of Karnataka and neighboring states. It has roughly forty million native speakers and a total of 50.8 million speakers according to the 2001 census. Spell checking is a critical problem in NLP. A spell checker is an important component that is tightly coupled with software such as OCR systems, word processors, and even translators.
1.1 Error Analyzer
A linguistic error analyzer is a tool that studies the types and causes of language errors. Errors may be classified as: conceptualization errors (i.e., thinking), phoneme-to-grapheme mapping errors (i.e., writing), typing errors, OCR-generated errors, and errors generated by a speech recognizer.
Conceptualization errors: Errors that occur due to one's way of thinking.
Ex: [Kannada word] may be wrongly written as [Kannada word].
Phoneme-to-grapheme mapping errors: Errors that occur while writing dictated words.
Ex: [Kannada word] may be wrongly written as [Kannada word].
Typing errors: Errors that occur while typing, by pressing a wrong key.
Ex: [Kannada word] may be wrongly typed as [Kannada word].
OCR-generated errors: Errors that occur when the OCR recognizes a character incorrectly.
Ex: [Kannada word] may be wrongly recognized as [Kannada word].
Errors generated by a speech recognizer: Errors that occur due to wrong pronunciation of words or wrong recognition of words by the speech recognizer.
Ex: [Kannada word] may be wrongly recognized as [Kannada word].
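Whatever the source of an error, a spell checker typically handles it in two steps: look the word up in a dictionary, and if it is missing, propose the closest dictionary words. The Python sketch below illustrates that idea under several assumptions; it is not the method of this paper. The names `edit_distance`, `suggest`, `DICTIONARY` and `MISSPELLED` are hypothetical, the three-word dictionary is a toy, and the distance is computed over Unicode code points, which is a simplification because a single Kannada akshara usually spans several code points.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set, max_distance: int = 2, limit: int = 3) -> list:
    """Return the word itself if it is in the dictionary, else the nearest entries."""
    if word in dictionary:
        return [word]
    scored = [(edit_distance(word, entry), entry) for entry in dictionary]
    close = sorted(pair for pair in scored if pair[0] <= max_distance)
    return [entry for _, entry in close[:limit]]


# Toy dictionary (assumption): "Kannada", "Karnataka", "language".
DICTIONARY = {"ಕನ್ನಡ", "ಕರ್ನಾಟಕ", "ಭಾಷೆ"}

# Hypothetical misspelling of "ಕನ್ನಡ" with the conjunct dropped.
MISSPELLED = "ಕನಡ"

if __name__ == "__main__":
    print(suggest(MISSPELLED, DICTIONARY))
```

A real system would need a large Kannada lexicon and, most likely, a distance defined over aksharas or weighted by the error type (typing, OCR, or speech) rather than uniform code-point edits.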
1.2 Optical Character Recognition (OCR)
Optical character recognition is a technique for moving text from paper form to electronic form. An OCR system is required to convert an image or written text into a machine-readable format; its input can be a plain document, an image, etc. Sources for OCR include bank statements, ATM transactions, e-statements, and mailing documents. Tasks such as speech-to-text and image-to-text conversion, and their reverse, analyze text in digitized form, so that it can be easily edited, stored, and accessed via an open-access system. OCR is a field of research that draws on NLP, machine learning, artificial intelligence, and computer vision. Modern applications need a flexible and accurate OCR system that can recognize many kinds of fonts across a variety of digital image inputs.
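Once an OCR system has produced digitized text, a spell checker first has to pick out the Kannada words and decide which ones need attention. The sketch below shows one possible way to do this in Python, assuming the OCR output is already available as a Unicode string. The function names and the sample dictionary are hypothetical. The only fixed fact used is that the Kannada Unicode block is U+0C80 to U+0CFF; because that block also contains Kannada digits, digit runs are treated as words here, which a fuller implementation would filter out.

```python
import re
import unicodedata

# Runs of characters from the Kannada Unicode block (U+0C80 to U+0CFF).
KANNADA_WORD = re.compile(r"[\u0C80-\u0CFF]+")


def extract_kannada_words(ocr_text: str) -> list:
    """Normalize OCR output to NFC and return its Kannada-script tokens in order."""
    normalized = unicodedata.normalize("NFC", ocr_text)
    return KANNADA_WORD.findall(normalized)


def flag_unknown_words(ocr_text: str, dictionary: set) -> list:
    """Return the Kannada tokens that are not found in the dictionary."""
    return [w for w in extract_kannada_words(ocr_text) if w not in dictionary]


if __name__ == "__main__":
    # Hypothetical OCR output mixing Kannada words, Latin text and digits.
    sample = "ಕನ್ನಡ ಭಾಷೆ, OCR 2017 ಕನಡ."
    known = {"ಕನ್ನಡ", "ಭಾಷೆ"}
    print(flag_unknown_words(sample, known))
```

The flagged tokens are the candidates that a correction step would then try to replace with dictionary words.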
Copyright@IDL-2017