Divya Joyti, International Journal of Advanced Trends in Computer Applications (IJATCA) Volume 3, Number 4, April - 2016, pp. 13-16 ISSN: 2395-3519
International Journal of Advanced Trends in Computer Applications www.ijatca.com
A Review on image binarization in degraded documents images 1
DivyaJyoti, 2Arun Sharma, 3Bodh Raj, 4KapilKapoor 1,2,3,4
Department of Computer Science Engg.
Himachal Pradesh Technical University, Hamirpur
Abstract: At present time image processing is a hot immerging research area, and in this paper we have reviewed one of the hot immerging area of image processing that is image binarization, with the help of image binarization we would able to retrieve text from degraded documents, we have also present a solution for improving the existing research in the later section of the paper.
I. INTRODUCTION Digital image processing is a vital field in image processing. Images are processed using algorithms. It has numerous plus points over analog image processing. In digital image processing large number of algorithms can be used with input. Problems like noise, signal distortion can be avoided. Image segmentation is a process of dividing a digital image into different segments so as to analyze it properly and represent it in a more meaningful way. The very initial steps of image segmentation are pattern recognition and image analysis. Image segmentation techniques are based upon two approaches: 1) Detection of discontinuities and 2) Detection of similarities. Detection of discontinuities includes algorithms like edge detection. Partitioning of an image depends upon abrupt changes in intensities. Detection of similarities include algorithms like region based segmentation methods and thresholding. In this approach image is partitioned according to the similarity between the regions and some predefined criterion. Document images, as a substitute of paper documents, mainly consist of common symbols such as handwritten or machine-printed characters, symbols and graphics. In many practical applications, we only need to keep the content of the document, so it is sufficient to represent text and diagrams in binary format which will be more efficiently transmit and processed instead of the original gray-scale image. It is essential to threshold the document image reliably in order to extract useful information and make further processing such as character recognition and feature extraction, especially for those poor quality document images with shadows, non-uniform illumination, low contrast, large signaldependent noise, smear and smudge. Therefore, thresholding a scanned gray-scale image into two levels is the must step and also a critical part in most
document image analysis systems since any error in this stage will propagate to all later phases. Document Image Binarization aims to segment the foreground text from the document background and is performed in the preprocessing stage for document analysis. For the ensuing document image processing tasks such as optical character recognition (OCR), a fast and accurate document image binarization technique is essential. Though document image binarization has been developed for many years, the thresholding of degraded document images is still an unsolved problem due to the high inter/intra variation between the document background and the text stroke across different document images.
www.ijatca.com
Fig.1(a)
13