Journal for Research | Volume 03 | Issue 01 | March 2017 | ISSN: 2395-7549
Voice Recognition System
Shruti Joshi Student Department of Computer Engineering Don Bosco College of Engineering Goa, India
Aarti Kumari Student Department of Computer Engineering Don Bosco College of Engineering Goa, India
Pooja Pai Student Department of Computer Engineering Don Bosco College of Engineering Goa, India
Saiesh Sangaonkar Student Department of Computer Engineering Don Bosco College of Engineering Goa, India
Prof. Melba D’Souza Assistant Professor Department of Computer Engineering Don Bosco College of Engineering Goa, India
Abstract: A voice recognition system converts human speech into a signal that a machine can interpret. Once this is achieved, the machine can be made to carry out the desired task. The machine could be a computer, a typewriter, or even a robot. Some systems also make the machine 'speak' a recorded word, but speech output is outside the scope of this paper; here, only the human is expected to talk. Further, the voice recognition systems described here are intended for use in small-scale projects only.
Keywords: Speech Recognition System, Acoustic Model, DTMF Decoder, HM2007, Voice Recognition Module VR3
I. INTRODUCTION
This paper explains various voice recognition systems that are available. Different software and hardware devices use different techniques to decode human speech.
History
The concept of speech recognition dates back to the 1940s. The first practical speech recognition system appeared in 1952 at Bell Laboratories [2], [3]: the "Audrey" system, which recognized digits spoken by a single voice in a noise-free environment. This first system could understand only digits. The 1940s and 1950s are considered the foundational period of speech recognition technology; during this period, work was done on its foundational paradigms, namely automata and information-theoretic models. Later systems were improved to recognize spoken words as well as numbers, leading to automatic speech recognition (ASR) systems.
II. LITERATURE SURVEY
Types of Speech Recognition
Speech recognition systems can be divided into a number of classes based on the kind of utterances they are able to recognize. A few classes of speech recognition [1], [3] are described below.
Isolated Speech
Isolated-word systems usually require a pause between two utterances. This does not mean that they accept only a single word, but they do require one utterance at a time.
Connected Speech
Connected-word (connected speech) systems are similar to isolated speech systems, but allow separate utterances to be spoken with minimal pauses between them.
Continuous Speech
Continuous speech systems allow the user to speak almost naturally; this is also called computer dictation.
Spontaneous Speech
At a basic level, spontaneous speech can be thought of as speech that is natural sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, "ums" and "ahs", and even slight stutters.
All rights reserved by www.journal4research.org
Voice Recognition System (J4R/ Volume 03 / Issue 01 / 002)
III. BLOCK DIAGRAM
Modeling of the System
For a detailed understanding of the voice recognition system, consider the block diagram shown in Fig. 1. The voice input captures the spoken words as an analog signal. The A/D converter then digitizes this signal, that is, it samples and quantizes it so that it can be processed further. The modeling can then be done in two ways, as described below.
Fig. 1: General block diagram of speech recognition system
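As a concrete illustration of the front end in Fig. 1, the sketch below takes the digitized samples produced by the A/D converter and locates isolated words using the frame-energy and zero-crossing approach that this paper describes for MATLAB in Section IV [6]. It is written in Python rather than MATLAB, and it is a minimal sketch under stated assumptions: the 8 kHz sampling rate, the 160-sample (20 ms) frames, all thresholds, and the function names are hypothetical choices for illustration, not values taken from [6].

```python
import math
from typing import List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Short-time energy: mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between neighbouring samples (0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_words(samples: Sequence[float], frame_len: int = 160,
                 voiced_energy: float = 0.01, unvoiced_energy: float = 0.001,
                 zcr_thresh: int = 40, min_frames: int = 5) -> List[Tuple[int, int]]:
    """Return (first_frame, end_frame) for each detected word; end_frame is exclusive.

    Assumed rule: a frame is speech if it is loud (voiced sound), or moderately
    loud with many zero crossings (unvoiced sound such as 's' or 'f').
    Runs of speech frames shorter than min_frames are discarded as noise.
    """
    n_frames = len(samples) // frame_len
    flags = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        e, z = frame_energy(frame), zero_crossings(frame)
        flags.append(e > voiced_energy or (e > unvoiced_energy and z > zcr_thresh))

    words, start = [], None
    for k, is_speech in enumerate(flags + [False]):  # trailing False closes a final run
        if is_speech and start is None:
            start = k
        elif not is_speech and start is not None:
            if k - start >= min_frames:  # long enough to count as a word
                words.append((start, k))
            start = None
    return words


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate, so 160-sample frames last 20 ms

    def tone(n: int) -> List[float]:
        """A 200 Hz test tone standing in for a spoken word."""
        return [0.5 * math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]

    # 0.4 s silence, 0.4 s "word", 0.4 s silence, 40 ms click, 0.2 s silence
    signal = [0.0] * 3200 + tone(3200) + [0.0] * 3200 + tone(320) + [0.0] * 1600
    for start, end in detect_words(signal):
        print(f"word from {start * 160 / fs:.2f} s to {end * 160 / fs:.2f} s")
```

With these assumed thresholds, the 20-frame tone is reported as a word from 0.40 s to 0.80 s, while the 2-frame click is rejected by the minimum-duration rule, which is the role the paper assigns to the assumption that each isolated word lasts for a certain time.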
Acoustic Model
An acoustic model [1], [4] is created by taking audio recordings of speech together with their text transcriptions, and using software to build statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech. The software in this model breaks words into phonemes. Phonemes are the perceptually distinct units of sound in a specified language that distinguish one word from another; for example, p, b, d, and t in the English words pad, pat, bad, and bat.
Language Model
Language modeling [1] is used in many natural language processing applications, such as speech recognition. A language model tries to capture the properties of a language and to predict the next word in a speech sequence. The software of this model compares the phonemes to words in its built-in dictionary. However, this paper does not discuss this type of model further.
IV. WORKING
The basic idea behind any speech recognition system is that the speaker first records the words that are to be recognized. The recording is done through a microphone connected either to a mobile phone or computer (when a software recognizer is used) or to the voice recognition device (when recognition hardware is used). The stored recording is retrieved for comparison whenever the corresponding word is spoken again. As mentioned earlier, there are various speech recognition systems, a few of which are discussed in this paper.
The Software Systems
Visual Basic
This is a software program [9] based on three labels: "yes", "no", and "maybe". The labels are initialized to a large font and a light gray colour, as shown in Fig. 3.
Fig. 3: A window showing ‘YES-NO’ recognizer
A reference to the System.Speech component is added to the project, and the recognition code is written. When the application starts, the Windows speech recognition system is loaded. Recognition starts after the user says "Start Listening" or clicks the microphone icon. When the user says "yes", "no", or "maybe", the corresponding label lights up; if anything else is said, the labels turn back to gray.
MATLAB
In this approach [6], a word-detection algorithm that separates each word from ambient noise is first developed. Then, an acoustic model that gives a robust representation of each word at the training stage is derived. Finally, an appropriate classification algorithm is selected for the testing stage. The speech-detection algorithm processes the prerecorded speech frame by frame, within a simple loop. To detect isolated digits, a combination of signal energy and zero-crossing counts is computed for each speech frame. Signal energy works well for detecting voiced signals, while zero-crossing counts work well for detecting unvoiced signals. Calculating these metrics is simple using core MATLAB mathematical and logical operators. To avoid identifying ambient noise as speech, each isolated word is assumed to last for at least a certain minimum duration. This processing can also be done in hardware, using a DSP module.
The Hardware Systems
DTMF
DTMF (dual-tone multi-frequency) signalling [7], [8] uses 16 distinct tones. Each tone is the sum of two frequencies: one from a low-frequency group and one from a high-frequency group, with four frequencies in each group. This is the same scheme used by telephone keypads. The IC 8870, a DTMF decoder, is used to decode these tones. Because the decoder only converts received tones into a digital code, the speech recognition itself must be handled separately, using SAPI (Microsoft's Speech API for speech recognition and speech synthesis on Windows). Fig. 4 shows the DTMF decoder circuit.
Fig. 4: DTMF decoder circuit using IC 8870
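The two-group tone scheme described above can be illustrated in software. The sketch below is not how the IC 8870 works internally; it is an assumed software equivalent that measures the power at each of the eight standard DTMF frequencies (697, 770, 852 and 941 Hz in the low group; 1209, 1336, 1477 and 1633 Hz in the high group) with the Goertzel algorithm, then picks the strongest frequency in each group to identify the key. The 8 kHz sampling rate, the 205-sample block, the min_power threshold, and the helper dtmf_tone (used only to generate test input) are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

# Standard DTMF frequency groups (Hz) and the keypad layout they index.
LOW_FREQS = (697, 770, 852, 941)       # rows
HIGH_FREQS = (1209, 1336, 1477, 1633)  # columns
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples: Sequence[float], freq: float, fs: float) -> float:
    """Squared magnitude of the signal's spectrum at `freq` (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def decode_dtmf(samples: Sequence[float], fs: float = 8000.0,
                min_power: float = 100.0) -> Optional[str]:
    """Return the pressed key, or None if either frequency group is too weak."""
    low = [goertzel_power(samples, f, fs) for f in LOW_FREQS]
    high = [goertzel_power(samples, f, fs) for f in HIGH_FREQS]
    row = max(range(4), key=lambda i: low[i])   # strongest low-group tone
    col = max(range(4), key=lambda j: high[j])  # strongest high-group tone
    if low[row] < min_power or high[col] < min_power:
        return None
    return KEYPAD[row][col]


def dtmf_tone(key: str, fs: float = 8000.0, n: int = 205) -> List[float]:
    """Synthesize a key's tone as the sum of its row and column sinusoids."""
    row = next(r for r, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    f_lo, f_hi = LOW_FREQS[row], HIGH_FREQS[col]
    return [math.sin(2 * math.pi * f_lo * t / fs) + math.sin(2 * math.pi * f_hi * t / fs)
            for t in range(n)]


if __name__ == "__main__":
    print(decode_dtmf(dtmf_tone("5")))
```

For example, key 5 combines 770 Hz and 1336 Hz, so decode_dtmf(dtmf_tone("5")) returns "5", and a block of silence returns None. The Goertzel algorithm is used here because it evaluates only the eight frequencies of interest instead of a full spectrum.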
IC HM2007
HM2007 [10] is a voice recognition chip, shown in Fig. 5, with an on-chip analog front end, voice analysis, recognition processing, and system control functions. The input voice command is analyzed, processed, and recognized, and the result is placed on one of its output ports; this output is then decoded, amplified, and given to the machine.
Fig. 5: Voice recognition circuit using IC HM2007
Voice Recognition Module VR 3
The VR 3 [11] is a compact, easy-to-control speech recognition board, shown in Fig. 6. It is a speaker-dependent voice recognition module. It supports up to 80 voice commands in total, of which at most 7 can be active at the same time. Any sound can be trained as a command, and users must train the module before it can recognize any voice command. The board can be controlled in two ways: through the serial port (full functionality) or through the general input pins (partial functionality). The general output pins on the board can generate several kinds of waveforms when the corresponding voice command is recognized. The module is Arduino compatible.
Fig. 6: The voice recognition module VR 3
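Because the VR 3 is speaker dependent, the user first trains it with their own recordings and new input is then matched against those stored commands. This paper does not describe the module's internal matching algorithm, so the sketch below is only an assumed illustration of speaker-dependent template matching: it uses dynamic time warping (DTW), a classic technique for comparing utterances spoken at different speeds, over per-frame feature vectors. The class name TemplateRecognizer, its train/load/recognize methods, and the reject threshold are hypothetical; the limits of 80 stored and 7 active commands simply mirror the figures quoted above and are not the module's actual serial-port API.

```python
from typing import Dict, List, Optional, Sequence

Feature = Sequence[float]  # one feature vector per audio frame (assumed)


def dtw_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i-1 of a and frame j-1 of b
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # allow stretching either sequence or advancing both together
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer: one stored template per command."""

    def __init__(self, capacity: int = 80, max_active: int = 7,
                 reject_threshold: float = 1.0) -> None:
        self.capacity = capacity            # total trainable commands
        self.max_active = max_active        # commands matched at the same time
        self.reject_threshold = reject_threshold
        self.templates: Dict[str, List[Feature]] = {}
        self.active: List[str] = []

    def train(self, label: str, features: Sequence[Feature]) -> None:
        """Store (or replace) the user's recording of one command."""
        if label not in self.templates and len(self.templates) >= self.capacity:
            raise ValueError("template store is full")
        self.templates[label] = list(features)

    def load(self, labels: Sequence[str]) -> None:
        """Choose which trained commands are currently matched against."""
        if len(labels) > self.max_active:
            raise ValueError(f"at most {self.max_active} commands can be active")
        missing = [name for name in labels if name not in self.templates]
        if missing:
            raise KeyError(f"untrained commands: {missing}")
        self.active = list(labels)

    def recognize(self, features: Sequence[Feature]) -> Optional[str]:
        """Return the closest active command, or None if nothing is close enough."""
        best_label, best_dist = None, float("inf")
        for label in self.active:
            dist = dtw_distance(features, self.templates[label])
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label if best_dist <= self.reject_threshold else None
```

With one-dimensional features, a recognizer trained on "on" = [(0,), (1,), (2,), (1,), (0,)] still recognizes a slower utterance such as [(0,), (0,), (1,), (1,), (2,), (2,), (1,), (0,)] as "on", because DTW lets a template frame match several input frames. The reject threshold is what stops unrelated sounds from being forced onto the nearest command.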
V. CONCLUSION
From the detailed study of the voice recognition systems discussed above, it can be concluded that, although speaker-independent systems are also available, they are costly. Thus, the speaker-dependent voice recognition module VR 3 is best suited for projects that build automated systems.
REFERENCES
[1] Jibran Abbasi, Muzamil Hussain, Shoaib Ahmed, "An Implementation of Speech Recognition for Desktop Application," www.scribd.com.
[2] "Speech Recognition: The Next Revolution," 5th edition.
[3] Sameer Shewalkar, Shoaib Ansari, Masuma Mujawar, Prof. Patil S.S., "Handling PC through Speech Recognition and Air Gesture," International Journal of Computer Science and Information Technology Research, Vol. 3, Issue 1, January-March 2015.
[4] Mark Gales, "Acoustic Modeling for Speech Recognition: Hidden Markov Models and Beyond?," December 2009.
[5] Charu Joshi, "Speech Recognition," www.slideshare.net.
[6] "Developing an Isolated Word Recognition System in MATLAB," in.mathworks.com.
[7] Rachna Jain, Dr. S.K. Saxena, "Voice Automated Mobile Robot," International Journal of Computer Applications, Vol. 16, No. 2, February 2011.
[8] Sija Gopinathan, Athira Krishnan R, Renu Tony, Vishnu M, Yedhukrishnan, "Wireless Voice Controlled Fire Extinguisher Robot," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 4, Issue 4, April 2015.
[9] Madhavi Pednekar, Joel Amanna, Jino John, Abhishesh Singh, Suresh Prajapati (Don Bosco Institute of Technology, Mumbai, India), "Voice Operated Intelligent Fire Extinguishing Vehicle," 2015 International Conference on Technologies for Sustainable Development (ICTSD-2015), Feb. 04-06, 2015.
[10] Pratik Chopra, Harshad Dange, "Voice Controlled Robot," Engineering Degree report, University of Mumbai, under the guidance of Mr. Shirish S. Halbe (Asst. Professor & Hobby Centre Co-ordinator), Department of Electronics Engineering, K. J. Somaiya College of Engineering, Vidyavihar, 2006.
[11] S. Suresh, Y. Sindhuja Rao, "Modelling of Secured Voice Recognition Based Automatic Control System," International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE), Vol. 13, Issue 2, March 2015.