Performance of different classifiers in speech recognition

IJRET: International Journal of Research in Engineering and Technology

ISSN: 2319-1163

Sonia Sunny1, David Peter S2, K. Poulose Jacob3

1 Associate Professor, Dept. of Computer Science, Prajyoti Niketan College, Kerala, India, sonia.deepak@yahoo.co.in
2, 3 Professor, Dept. of Computer Science, Cochin University of Science & Technology, Kerala, India, davidpeter@cusat.ac.in, kpj@cusat.ac.in

Abstract

Speech is the most natural means of communication among human beings, and speech processing and recognition have been intensive areas of research for the last five decades. Since speech recognition is a pattern recognition problem, classification is an important part of any speech recognition system. In this work, a speech recognition system is developed for recognizing speaker-independent spoken digits in Malayalam. Voice signals are sampled directly from the microphone. The proposed method is implemented for 1000 speakers uttering 10 digits each. Since the speech signals are affected by background noise, the noise is removed using a wavelet denoising method based on soft thresholding. Features are then extracted from the signals using Discrete Wavelet Transforms (DWT), which are well suited to processing non-stationary signals like speech because of their multi-resolution, multi-scale analysis characteristics. Speech recognition is a multiclass classification problem, so the resulting feature vectors are classified using three classifiers that can handle multiple classes: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and Naive Bayes classifiers. During the classification stage, each classifier is trained on feature vectors of known patterns and then tested on the test data set. The performance of the classifiers is evaluated based on recognition accuracy. All three methods produced good recognition accuracy: DWT with ANN achieved 89%, DWT with SVM achieved 86.6%, and DWT with Naive Bayes achieved 83.5%. ANN is found to perform best among the three methods.
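The denoising step summarized above can be illustrated with a minimal sketch. The wavelet family, decomposition level and threshold value are not stated in this part of the paper, so the single-level Haar transform and the threshold `t` below are assumptions chosen only for illustration, and the function names are hypothetical. The sketch shows the general idea of soft thresholding: detail coefficients are shrunk toward zero by `t`, and those with magnitude below `t` are set to zero before the signal is reconstructed.

```python
import math


def haar_dwt(signal):
    """Single-level Haar DWT (assumed wavelet; the paper's choice may differ).

    Returns (approximation, detail) coefficient lists.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient lists."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Soft thresholding: sign(c) * max(|c| - t, 0) for each coefficient."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, t):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, t))
```

For example, `denoise(samples, 0.1)` would suppress small sample-to-sample fluctuations in a hypothetical list `samples` while leaving the approximation coefficients untouched. A practical system would normally use several decomposition levels and a data-driven threshold.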

Index Terms: Speech Recognition, Soft Thresholding, Discrete Wavelet Transforms, Artificial Neural Networks, Support Vector Machines and Naive Bayes Classifier.

1. INTRODUCTION

Automatic Speech Recognition (ASR) is an intensive area of research because it helps people communicate in a more natural and effective way [1]. Speech recognition systems can be characterized by many parameters, and many of these affect accuracy. The most commonly used measure of the performance of a speech recognition system is its recognition accuracy. The accuracy and acceptance of speech recognition have improved considerably in the last few years. Although ASR systems are now extensively used, their accuracy continues to lag behind human performance, particularly in adverse conditions [2]. In spite of the advances in ASR, there is still a considerable gap between human and machine performance [3]. Speech is a multi-component signal with time-varying frequency and amplitude. Due to this variability, transitions may occur at different times in different frequency bands. Automatic recognition of spoken digits is one of the challenging tasks in the field of speech recognition [4]. Spoken digit recognition is needed in many applications that take numbers as input, such as automated banking systems, airline reservations, voice-dialing telephones, automatic data entry, and command and control [5].

ASR is basically a pattern recognition problem that also involves a number of technologies and research areas such as Signal Processing, Natural Language Processing and Statistics [6]. Speech signals are non-stationary in nature. Speech recognition is a complex task because of differences in gender, emotional state, accent, pronunciation, articulation, nasality, pitch, volume and speed in the way people speak [7]. Background noise and other types of disturbance also affect the performance of a speech recognition system. Speaker independence is difficult to achieve because such models must recognize the speech patterns of a large group of people.

The paper is organized as follows. Section 2 gives a brief description of the problem definition. The methodology used in the design is explained in section 3. The creation of the digits database is illustrated in section 4. Section 5 describes the method used for preprocessing. Section 6 elaborates on the feature extraction technique used in this work, followed by the classification techniques in section 7. Section 8 presents a

Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org

