IJETR Paper 11 May 2018 by Engineering Research Publication

International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869 (O) 2454-4698 (P) Volume-8, Issue-5, May 2018

A Combination of Neuron Networks and Genetic Algorithms for Nom Character Recognition Truong Thi Huong, Nguyen Van Truong  Abstract— Nom character, Vietnamese ancient language, is a cultural heritage that has played a particularly important role in creating a brilliant literary culture throughout the centuries in Vietnam. Nom character is very valuable in studying the life of the ancient Vietnamese in many fields literature, thought, philosophy, language, law, morality, etc. Reading and writing the character in Vietnam nowadays are not popular, so the digitization of its letters is an important task. In this paper, we present an approach of Neuron Networks and Genetic Algorithms to provide a good optical recognition method of Nom character. This contribution helps to build a tool that can identify, transform Nom character into Vietnamese language effectively. Index Terms— Nom character, Networks, Genetic Algorithms.

Recognition,

Neuron

Fig. 1. A Nom character recognition model II. NEW GANN ALGORITHM FOR NOM CHARACTER RECOGNITION

I. INTRODUCTION Nom character is a logographic writing system formerly used to write the Vietnamese language. It used the standard set of classical Chinese characters to represent Vietnamese vocabulary and some native Vietnamese words, while new characters were created on the Chinese model to represent other words. Nowadays, many people are loyal to brush and ink pot in studying Nom Character inherited from their forefathers to study the life of the ancient Vietnamese. Building Nom optical character recognition software (Nom-OCR) is a necessity as with other languages. Nom-OCR will act as a strong motivator for the study of Nom character, explorer precious resources of the nation for thousands of years in politics, culture and society. Nom recognition systems can refer to the technical models of other OCRs, especially hieroglyphs OCR such as Chinese, Japanese. Based on the study of OCR models, the authors propose the recognition model for Nom as shown in Figure 1. In the model, the source can be an image or a PDF file. The source of the OCR system can include many types of information, such as images of different language types. Therefore, it is necessary to conduct page analysis, character recognition. After separating the character from the page, we proceed to the necessary pre-processing steps, split into blocks, split the blocks into lines, split the lines into discrete characters. From discrete characters, we extract the character of the character to be used to carry out the identification. The result of the identification step may not be the final step, but will be post-processing, dictionary-based, grammar, etc., to determine the final result [1].

2.1 Previous works Artificial Neural Network (ANN) is an information processing model illustrating to the way information is processed by biological neural systems with the expectation that it can handle intelligent tasks in similar way that human brain can do. An artificial neural network is configured for a specific application such as pattern recognition, data classification, etc. by a learning process from a set of training samples. The essence of learning is the adjustment of weights indicate linkage among neurons. Neural networks have been successfully applied in classification by means of highly adaptive learning, discriminatory and generalization. It has been widely used in the studies of hieroglyphic identification such as Korean [3, 8], Indian [4], Chinese [2, 5, 6, 9], Japanese [7] that can achieve accuracy of 98.82%. Direct propagation neuron is one of the most common type of networks. Back-propagation algorithms (BPA) are used to calculate the gradient of error functions using the diffusion rule. The errors after the initial calculation on the transmission line will be transmitted back from the output through each layer, called "back-propagation". In the process of network training, the initial weight is an importan factor that will affect to acuracy. If the weights are initialized with large values, then the total input signal from the beginning of the absolute value is large and therefore the output of the network is only 0 or 1. This causes the system to be clogged at a local minimum or in a flat area near the starting point. Therefore, these weights are usually initialized by small random numbers in the range [-1/n, 1/n], where n is the number of weights connected to the class l. Wessels and Barnard have investigated and pointed out that the initialization of Wij weights should be within the range of

Truong Thi Huong, Faculty of Mathematics, Thai Nguyen University of Education, Thai Nguyen City, Vietnam, Mobile No. +(84)-914061236 Nguyen Van Truong, Faculty of Mathematics, Thai Nguyen University of Education, Thai Nguyen City, Vietnam, Mobile No. +(84)-915016063