
IJEEE, Vol. 1, Spl. Issue 1 (March 2014)

e-ISSN: 1694-2310 | p-ISSN: 1694-2426

Automatic Control of Instruments Using Efficient Speech Recognition Algorithm

Abhishek Thakur1, Rajesh Kumar2, Amandeep Bath3, Jitender Sharma4

1,2,3,4 Electronics & Communication Engineering Department, Indo Global College of Engineering, Punjab, India

1abhithakur25@gmail.com, 2errajeshkumar2002@gmail.com, 3amandeep_batth@rediffmail.com, 4er_jitender2007@yahoo.co.in

Abstract- Matlab's straightforward programming interface makes it an ideal tool for Hindi key word recognition. For feature extraction, a Hindi key word database has been designed using Matlab 7.5. The database consists of eight key words. Each key word has been recorded by ten speakers (eight male and two female), giving a total of 80 samples for the eight commands. Features of the speech signal are extracted in the form of MFCC coefficients, and Dynamic Time Warping (DTW) is used as the feature matching technique. This paper presents a technique to detect the utterance using end point detection, MFCC to extract features and DTW to compare the test patterns. The recognition results are tested for clean and noisy test data. The system can be considered robust: average accuracy is 97.50 % for clean data and 91.25 % for noisy data, which is acceptable since most people would not mind repeating a command to the system one time out of ten or less. The system can be implemented using one of the common microcontrollers with a small amount of dedicated memory and an analog to digital converter to accept the input speech. The system would be fast, small and cost efficient enough to be incorporated into a wide variety of consumer electronics. The aim of this work is therefore to develop a speaker dependent, isolated word, limited vocabulary speech recognition system that is small enough to fit in a small household appliance and that can be operated in real time.

Index Terms- Automatic Speech Recognition (ASR), Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW)

I. INTRODUCTION

Although many systems exist for speech recognition, none of them address the needs of consumer level applications. For a system to be incorporated into the everyday needs of a consumer, it must be speaker independent, fast, low cost, require no training and be small enough to fit inside a consumer appliance. Such a system will move speech recognition from the domain of academic or industrial applications to that of the common home user. The above system can be implemented using current technology once a certain number of compromises are made. For example, suppose a speech recognition system is to be developed so that it can be incorporated into a home microwave oven. One can immediately see that there is no need for a 60,000 word


vocabulary for such a system; a dozen words including the digits are sufficient for its operation. The system could be further simplified if the user is not allowed to change the number of words in the vocabulary. The second aspect of the system is that it does not have to accept continuous speech. For example, a common command may be "Move.... Forward.... Fast.... Start". The proposed design for a home automation system and a Matlab based Hindi key word speech recognition system is intended for disabled persons, as they are not able to move from one place to another and cannot locate switches. This paper attempts to provide them a solution: while sitting in a wheel chair or on a bed they can switch home appliances on and off and also control internal parameters such as wheel chair direction, fan speed and heater temperature. Physically challenged persons find it difficult to power their home loads such as fans, lights and air conditioners ON or OFF; they require an attendee to do these things, and in the absence of the attendee their world seems even more difficult. This design helps persons with physical disabilities and the elderly to navigate easily within their home in a wheelchair by giving voice commands. The systems in [3-5] were designed for navigation of a robot and a forklift by voice commands. Some voice based designs use a voice recognition chip with an integrated or interfaced memory chip, which has the drawback of supporting only a limited number of voice commands. The reported design, Speech Recognition Based Wireless Automation of Home Appliances for Disabled Persons, involves automation of home loads by giving voice commands in a wireless environment.

II. SYSTEM OVERVIEW

This paper is related to the controlling of electronic/electrical equipment using voice key words. We recognize the Hindi key word spoken by a person and control the desired parameter. The goal of this work is to help disabled and handicapped persons who are not able to locate switches or to reach them. The system can also serve a security purpose, since the machines are operated through the voice of a single authorized person. It can also be used for home automation, replacing switches and remotes by voice commands. This is done using software designed in Matlab 7.5 using MFCC and DTW. Using this software, Hindi key words spoken in real time are matched with pre-recorded samples and an ASCII code is generated. These ASCII codes are sent to the microcontroller over an RS-232 serial link. All peripherals are controlled by the microcontroller: its outputs control the various applications upon receiving the input from the software, and relays on the ports of the microcontroller activate the particular appliance connected to the particular port.

III. HARDWARE DESIGN

a) Voice processor: The next stage is the voice processor, consisting of a .m voice processor file. After comparison in the voice processor, data is sent to the microcontroller for the control or driving action; RS-232 is used as the communication protocol. The whole process runs as follows: if we say the AAGE key word, the action related to the AAGE key word has to be performed, and if we say the PICHE key word, the action related to the PICHE key word has to be performed. As shown in figure 2, when we say any key word the microphone picks up the acoustic signal and converts it into an electrical signal, which is then attenuated by the attenuator. The attenuated signal is transferred to the voice processor, the processing files are executed, and an ASCII code is transferred to the microcontroller through the RS-232 standard communication protocol. In this manner the voice controls the machine or the electric appliance.
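As a concrete illustration of this flow, the sketch below maps a recognized keyword to a command byte and writes it to the serial port from Matlab. The COM port name, baud rate and the particular ASCII codes are assumptions for illustration only; the paper does not specify them.

% Minimal sketch: map a recognized Hindi keyword to a command byte and
% send it to the 89C52 over RS-232. The COM port, baud rate and the code
% values below are illustrative assumptions, not taken from the paper.
recognizedWord = 'AAGE';                              % output of the recognizer
keywords = {'BAND','TEESH','PACHAS','AAGE','PICHE','RUKO','DHEERE','TEJ'};
codes    = ['B','T','P','A','H','R','D','J'];         % one ASCII code per keyword (assumed)

idx = find(strcmp(recognizedWord, keywords));         % look up the keyword
if ~isempty(idx)
    s = serial('COM1', 'BaudRate', 9600);             % assumed port settings
    fopen(s);
    fwrite(s, codes(idx));                            % transmit the command byte
    fclose(s);
    delete(s);
end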

Fig. 1: Microcontroller Interfacing.

The port connections of the automatic speech recognition and home automation system with the external peripherals are shown in Table 1. All peripherals are connected to the corresponding port pins of the microcontroller (89C52) as given in Table 1. These peripherals work according to our program, as discussed in the software design section. When a command word is given by the user through the microphone, it is recognized by the proposed algorithm and an ASCII code is generated. This ASCII code is given to the 89C52 microcontroller; if the recognized code matches, the appliance performs the particular operation related to that key word.

TABLE 1: MICROCONTROLLER PORT CONNECTION

S.N.  Ports of 89C52 µc    Hardware Devices       Hindi Key Word
1     P1.0, P1.1, P1.2     Control ADC            BAND
2     P1.4                 Temperature 30 deg.    TEESH
3     P1.5                 Temperature 50 deg.    PACHAS
4     P1.6                 Go                     AAGE
5     P1.7                 Reverse                PICHE
6     P1.6, P1.7           Break                  RUKO
7     P2.2                 Fan low set            DHEERE
8     P2.3                 Fan medium set         TEJ

As can be seen in Table 1, if the AAGE key word is recognized then Port 1.7 goes to logic one and Port 1.6 goes to logic zero, which means that the robot moves in the forward direction. The logic one and logic zero positions of the ports are shown in Table 1 for the corresponding key words.


Fig. 2: Speaker recognition process.

b) Temperature sensor circuit: A wide range of supply voltages can be used, from a single supply of 3 V to 30 V (3 V to 26 V for the LM2902 and LM2902Q) or dual supplies. The common mode input voltage range includes ground, which allows direct sensing near ground. The low supply current drain of 0.8 mA (typical) is independent of the supply voltage. The device has low input bias and offset parameters: input offset voltage 3 mV typical, input offset current 2 nA typical, input bias current 20 nA typical, a differential input voltage range equal to the maximum rated supply voltage of 32 V, and an open loop differential voltage amplification of 100 V/mV typical.
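For illustration, the LM35 output scales at 10 mV per degree Celsius, so a raw reading can be converted to temperature and compared with the 30 and 50 degree set points of Table 1. The ADC resolution and reference voltage in the sketch below are assumptions, not values given in the paper.

% Illustrative conversion of an LM35 reading to degrees Celsius.
% Assumes an 8-bit ADC with a 5 V reference (not stated in the paper);
% the LM35 produces 10 mV per degree Celsius.
adcCount = 20;                        % example raw ADC value (assumed)
vRef     = 5.0;                       % ADC reference voltage in volts (assumed)
voltage  = adcCount * vRef / 255;     % back to volts
tempC    = voltage / 0.010;           % 10 mV per degree Celsius

if tempC >= 50
    fprintf('Heater at or above the 50 degree set point (PACHAS).\n');
elseif tempC >= 30
    fprintf('Heater at or above the 30 degree set point (TEESH).\n');
end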

Fig. 3: LM 35 Interface.

c) Analog to digital converter: This device is a high current four channel driver designed to accept standard DTL or TTL logic levels and to drive inductive loads (such as relays, solenoids, DC and stepping motors) and switching power transistors, implemented as a monolithic high voltage integrated circuit. To simplify its use as two bridges, each pair of channels is equipped with an enable input. A separate supply input is provided for the logic, allowing operation at a lower voltage, and internal clamp diodes are included. The device is suitable for use in switching applications at frequencies up to 5 kHz. The L293D is assembled in a 16 lead plastic package in which the 4 center pins are connected together and used for heat sinking. The L293DD is assembled in a 20 lead surface mount package in which the 8 center pins are connected together and used for heat sinking. Key ratings are 600 mA output current per channel, 1.2 A peak output current per channel, an enable facility, over temperature protection, a logical "0" input voltage of up to 1.5 V, and internal clamp diodes.

Fig. 5: ASK Transmitter and Receiver.

The transmitter module accepts serial data at a maximum of XX baud. It can be interfaced directly to a microcontroller or used in remote control applications with the help of encoder/decoder ICs. The encoder IC takes in parallel data on the TX side, packages it into a serial format and then transmits it with the help of the RF transmitter module. At the RX end, the decoder IC receives the signal via the RF receiver module, decodes the serial data and reproduces the original data in parallel format. To control one motor we require 2 bits of information, while 4 bits are needed to control 2 motors. The HT12E and HT12D are 4-channel encoder/decoder ICs directly compatible with the specified RF module; a small sketch of this 4-bit packing is given below.

e) Wheel chair control: The receiver receives the data in serial form, decodes it, converts it back into parallel form and passes it to the receiver side CPU. At the receiver side the IC HT12D is used as the decoder: the codes are received in serial form, converted back into parallel form, and the decoded signals are then given as inputs to the CPU. At the receiver side the IC MN4519 is used as the buffer.
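As noted in the transmitter description above, two 2-bit motor commands fit into the HT12E's 4-bit data word. The toy sketch below shows one possible packing; the bit assignment and command codes are assumptions made for illustration, not the paper's scheme.

% Sketch of how two 2-bit motor commands could share the HT12E's four data
% lines. The bit assignment and command codes here are assumed.
STOP = 0; FORWARD = 1; REVERSE = 2;                   % 2-bit codes per motor (assumed)

leftMotor  = FORWARD;
rightMotor = REVERSE;

nibble = bitor(bitshift(leftMotor, 2), rightMotor);   % bits 3..2 = left, bits 1..0 = right
fprintf('4-bit word sent to the HT12E data pins: %s\n', dec2bin(nibble, 4));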

[Circuit schematic residue: DC motor control card built around LM339 comparators, 7400/7404 logic gates and TIP-122/TIP-127 transistor pairs driving the DC motor; only component labels and resistor values (2K7, 6K8) are recoverable from the original figure.]
d) Building a wireless remote control: The question arises of how to get rid of the long wired tail dangling out of a wired remote control robot. Transforming a wired remote control into a wireless one is not as difficult as one may think. The easiest solution would be to hack a cheap wireless toy car, take its electronic guts out and use them in the robot. For more flexibility, however, a custom remote control system can be built. The idea is to use off the shelf RF Tx/Rx modules. These modules, once a rare commodity, are now widely and cheaply available. In this particular design, an ASK (Amplitude Shift Keying) based TX/RX pair operating at 433 MHz is used.

Fig. 4: Functional block diagram of A to D converter.

Fig. 6: Robot Control.

The buffer is a FIFO (First In, First Out) buffer. In order to drive the motors, a suitable motor driver is connected at the output of the decoder IC. The motor driver circuit can consist of a relay, a transistorized H-bridge, or motor driver ICs such as the L293D or L298.



IV. SOFTWARE DESIGN

The keyword recognition algorithm is designed according to the block diagram shown in the figure below.

Fig. 9: After End Point Detection for Hindi Key word "AAGE".

Fig. 7: Block diagram of Mel Frequency Cepstral Coefficient.

The speech recognition algorithm is written in Matlab and the results are tested on clean and noisy test data. The explanation and results are discussed for the main program step by step as shown below.

Step 1. Declare variables:

clear all;        % clear all variables
close all;        % close all files
clc;              % clear screen
ncoeff = 13;      % required number of MFCC coefficients
N = 8;            % number of words in vocabulary
k = 4;            % number of nearest neighbours to choose
fs = 16000;       % sampling rate
duration1 = 0.15; % initial silence duration in seconds
duration2 = 2;    % recording duration in seconds
G = 2;            % vary this factor to compensate for amplitude variations
NSpeakers = 5;    % number of training speakers

Step 2. Input keyword and perform EPD (the recording code is listed with Fig. 8 below):

Step 3. Addition of silence:

p = length(speechIn) - length(silence);
for i = 1:p
    silence = [silence; 0];
end
fprintf('Finished recording.\n');
fprintf('System is trying to recognize what you have spoken...\n');
speechIn1 = [silence; speechIn];   % pads with 150 ms of silence
speechIn2 = speechIn1.*G;
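The loop above grows the silence vector one sample at a time. An equivalent vectorized form, a sketch assuming that speechIn and silence are column vectors as returned by wavrecord, is:

% Equivalent, vectorized form of the padding in Step 3: extend the recorded
% silence with zeros so it matches the speech length, then prepend it.
p         = length(speechIn) - length(silence);
silence   = [silence; zeros(p, 1)];   % pad with p zero samples at once
speechIn1 = [silence; speechIn];      % 150 ms (plus padding) of leading silence
speechIn2 = speechIn1 .* G;           % apply the fixed gain factor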

Fig. 10: Addition of silence (0.15 seconds) in Hindi key word "AAGE".

Step 4. Noise reduction:

speechIn3 = speechIn2 - mean(speechIn2);   % DC offset elimination
speechIn  = nreduce(speechIn3, fs);        % applies spectral subtraction
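The nreduce function applying spectral subtraction is not reproduced in the paper. The following is a minimal magnitude spectral-subtraction sketch along the same general lines, not the paper's implementation; the frame length, hop, window, and the assumption that the leading 150 ms are noise-only are all illustrative choices.

function out = specsub_sketch(x, fs)
% Small spectral-subtraction sketch (not the paper's nreduce): estimate the
% noise magnitude spectrum from the leading 150 ms and subtract it frame by
% frame, keeping the noisy phase, then resynthesize by overlap-add.
frameLen = 256;  hop = 128;
win   = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));   % Hamming window
noise = x(1 : round(0.15 * fs));                 % assumed noise-only lead-in

% Average noise magnitude spectrum over the lead-in frames
nFrames  = floor((length(noise) - frameLen) / hop) + 1;
noiseMag = zeros(frameLen, 1);
for i = 1:nFrames
    seg      = noise((i-1)*hop + (1:frameLen)) .* win;
    noiseMag = noiseMag + abs(fft(seg));
end
noiseMag = noiseMag / nFrames;

% Subtract from each frame of the full signal and overlap-add the result
out     = zeros(size(x));
nFrames = floor((length(x) - frameLen) / hop) + 1;
for i = 1:nFrames
    idx = (i-1)*hop + (1:frameLen);
    X   = fft(x(idx) .* win);
    mag = max(abs(X) - noiseMag, 0);             % half-wave rectified subtraction
    Y   = mag .* exp(1j * angle(X));             % keep the original phase
    out(idx) = out(idx) + real(ifft(Y));
end
end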

Fig. 11: After noise reduction for Hindi key word "AAGE".

Step 5. Windowing, DFT and Mel filter bank:

rMatrix1 = mfccf(13, speechIn, fs);   % compute test feature vector
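The mfccf helper is likewise not listed in the paper. The sketch below shows one common way such a routine can be organized (framing, Hamming windowing, magnitude FFT, triangular mel filter bank, log energies, DCT); the frame length, hop and number of filters are assumed values, dct from the Signal Processing Toolbox is used, and it is not claimed to match the paper's implementation.

function ceps = mfcc_sketch(x, fs, ncoeff)
% Illustrative MFCC computation (not the paper's mfccf function).
frameLen = 256;  hop = 100;  nFilters = 20;                   % assumed settings
win = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));     % Hamming window

% Triangular mel filter bank between 0 Hz and fs/2
mel   = @(f) 2595*log10(1 + f/700);
imel  = @(m) 700*(10.^(m/2595) - 1);
edges = imel(linspace(0, mel(fs/2), nFilters + 2));           % filter edge frequencies in Hz
bins  = round(edges/(fs/2)*(frameLen/2 - 1)) + 1;             % corresponding FFT bins
fbank = zeros(nFilters, frameLen/2);
for m = 1:nFilters
    lo = bins(m); ce = bins(m+1); hi = bins(m+2);
    fbank(m, lo:ce) = linspace(0, 1, ce - lo + 1);            % rising edge
    fbank(m, ce:hi) = linspace(1, 0, hi - ce + 1);            % falling edge
end

nFrames = floor((length(x) - frameLen)/hop) + 1;
ceps    = zeros(ncoeff, nFrames);
for i = 1:nFrames
    seg  = x((i-1)*hop + (1:frameLen)) .* win;                % windowed frame
    spec = abs(fft(seg));
    spec = spec(1:frameLen/2);                                % positive frequencies only
    fbe  = log(fbank * spec + eps);                           % log mel filter bank energies
    c    = dct(fbe);                                          % cepstral coefficients
    ceps(:, i) = c(1:ncoeff);
end
end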

Fig. 8: End Point Detection for Hindi Key word "AAGE".

for i = 1:8   % check the 8 keywords in real time
    fprintf('Press any key to start %g seconds of speech recording...', duration2);
    pause;                                    % wait for a key press
    silence  = wavrecord(duration1*fs, fs);   % record 0.15 s of leading silence
    fprintf('Recording speech...');
    speechIn = wavrecord(duration2*fs, fs);   % duration2*fs is the total number of sample points
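Step 2 also applies end point detection, but the EPD routine itself is not listed in the paper. A minimal short-term-energy sketch is given below; the frame length and threshold rule are invented for illustration and are only an assumption about how such a routine might look.

function trimmed = epd_sketch(x, fs)
% Illustrative short-term-energy end point detection (not the paper's EPD
% routine): keep the span between the first and last frame whose energy
% rises above a threshold derived from the leading frames.
frameLen = round(0.02 * fs);                          % 20 ms frames (assumed)
nFrames  = floor(length(x) / frameLen);
energy   = zeros(nFrames, 1);
for i = 1:nFrames
    seg       = x((i-1)*frameLen + (1:frameLen));
    energy(i) = sum(seg.^2);                          % short-term energy
end
noiseFloor = mean(energy(1:min(5, nFrames)));         % assume the start is silence
thr        = max(5 * noiseFloor, 0.01 * max(energy)); % assumed threshold rule
active     = find(energy > thr);
if isempty(active)
    trimmed = x;                                      % no speech detected
else
    first   = (active(1) - 1) * frameLen + 1;
    last    = min(active(end) * frameLen, length(x));
    trimmed = x(first:last);                          % speech-only segment
end
end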

Fig. 12: Time signal of the Hindi key word AAGE and Mel filter bank of the word computed via FFT.

Step 6. Inverse DFT:

rMatrix = CMN(rMatrix1);              % removes convolutional noise
Sco = DTWScores(rMatrix, N);          % computes all DTW scores
[SortedScores, EIndex] = sort(Sco);   % sort scores in increasing order
K_Vector = EIndex(1:k);               % gets the k lowest scores
Neighbors = zeros(1, k);              % will hold the k nearest neighbours
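The DTWScores function is not reproduced either; presumably it returns one DTW distance per stored template, which is then sorted. A generic DTW distance between two MFCC matrices (coefficients by frames), given here only as an illustrative sketch with a Euclidean local distance, is:

function d = dtw_sketch(A, B)
% Illustrative dynamic time warping distance between two MFCC matrices
% (coefficients x frames). This is a generic DTW, not the paper's
% DTWScores function.
nA = size(A, 2);  nB = size(B, 2);
D  = inf(nA + 1, nB + 1);                 % accumulated cost matrix
D(1, 1) = 0;
for i = 1:nA
    for j = 1:nB
        cost = norm(A(:, i) - B(:, j));   % local distance between frames
        D(i+1, j+1) = cost + min([D(i, j+1), D(i+1, j), D(i, j)]);
    end
end
d = D(nA + 1, nB + 1);                    % total warping cost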



else
    fprintf('You have just said %s.\n', Word(Modal,:));   % prints the recognized word
end

V. RESULT DISCUSSION

We performed two experiments, in noisy and clean environments: one using the traditional method (Md. Rashidul Hasan et al. 2004) and the other using the developed technique. The templates were used as input to the same recognition system using DTW in order to measure the performance of each method. The first experiment uses the

Fig. 13: DCT and Spectrogram for the 'AAGE' key word.

% The code below uses the indices of the returned k lowest scores to
% determine their classes
for t = 1:k
    u = K_Vector(t);
    for r = 1:NSpeakers-1
        if u <= N
            break
        else
            u = u - N;
        end
    end
    Neighbors(t) = u;
end
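The inner loop maps a template index to its word class by repeated subtraction of N. An equivalent closed form, given here only as a sketch that restates the same arithmetic, is:

% Equivalent closed form of the subtraction loop above, assuming templates
% are stored in consecutive blocks of N words:
for t = 1:k
    Neighbors(t) = mod(K_Vector(t) - 1, N) + 1;   % word class of the t-th neighbour
end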

Fig. 15: Chart of recognition results in the noisy environment with and without EPD.

Fig. 14: Result of keyword recognition for the 'AAGE' key word.

% Apply the k-Nearest Neighbour rule
Nbr = Neighbors;
[Modal, Freq] = mode(Nbr);                 % most frequent value
Word = strvcat('Forward-AAGE', 'Reverse-PICHE', 'Break-RUKO', 'Thirty-TEESH', ...
               'Fifty-PACHAS', 'Low-DHEERE', 'Medium-TEJ', 'Stop-BAND');
if mean(abs(speechIn)) < 0.01
    fprintf('No microphone connected or you have not said anything.\n');
elseif ((k/Freq) > 2)                      % if there is no majority
    fprintf('The word you have said could not be properly recognised.\n');


traditional method (Md. Rashidul Hasan et al. 2004). The dictionary contains Hindi key words and digits. For each Hindi key word and digit, a number of templates (4-10) were selected from several training candidates, while the second experiment uses 8 templates. A newly generated template was used for each key word and digit. Both experiments were speaker dependent. The test was made using 8 test records for each key word and digit. The accuracy of Hindi key word recognition is calculated by speaking one command 10 times and finding out how many times the key word is recognized at different rates of speech. The chart shows approximately 91.25 % accuracy with end point detection when user 1 says a key word in a 10 × 12 room in a noisy environment (fan on, TV on, and cooking in the kitchen), while without end point detection the average accuracy is 80.00 %. Fig. 15 shows the chart for Hindi key word recognition in the noisy environment with and without EPD.
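The reported accuracy figures are therefore simple proportions over repeated trials. A trivial bookkeeping sketch, with an outcome vector made up purely for illustration, is:

% Bookkeeping sketch for the accuracy figures: 1 marks a trial in which the
% spoken keyword was recognized correctly, 0 a failure. The outcomes below
% are made up for illustration only.
outcomes = [1 1 1 1 1 1 1 1 1 0];                 % 10 trials of one keyword
accuracy = 100 * sum(outcomes) / numel(outcomes); % 90 percent for this example
fprintf('Recognition accuracy: %.2f %%\n', accuracy);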




Fig. 16: Chart of recognition results in the clean environment with and without EPD.

The chart shows approximately 97.50 % accuracy with end point detection when user 1 says a key word in a 10 × 12 room in a clean environment (fan off, TV off, no cooking in the kitchen), while without end point detection the average accuracy is 87.50 %. Fig. 16 shows the chart for Hindi key word recognition in the clean environment with and without EPD. After calculating the MFCC features, DTW finds the nearest distance between the spoken word and the recorded samples of the 10 speakers. If the nearest distance matches five or more of the recorded samples, the output is shown and the operation related to the key word is performed; if fewer than five samples match, a recording "word not recognized, please try again" is played.

VI. CONCLUSION AND FUTURE WORK

This paper presents a simple technique for word detection using end point detection, feature extraction using Mel frequency cepstral coefficients and feature matching using dynamic time warping. The implemented algorithm and control system control the fan speed, the heater temperature and the robot direction using voice key words, and demonstrate reliability and ease of future development. The experimental results show that the proposed algorithm is functional and can be used in a voice key word recognition home automation system and for industrial robots. The percentage of correctly recognized key words is high: the recognition results were tested for clean and noisy test data, and the system can be considered robust, with an average accuracy of 97.50 % for clean data and 91.25 % for noisy data. The main contribution of this study is the idea of Hindi key word recognition applied to a home automation system, and the experiments show that the approach is well suited to Hindi key word recognition. The proposed ASR and control system was completely implemented; future effort will be directed toward developing more appropriate and convenient methods.


REFERENCES

[1] A. Rathinavelu, G. Anupriya and A. S. Muthanantha Murugavel, "Speech Recognition Model for Tamil Stops", Proceedings of the World Congress on Engineering, ISBN: 978-988-98671-5-7, Vol. I, pp. 543-547, July 2-4, 2007.
[2] Adriana Tapus and Brian Scassellati, "The grand challenges in helping humans through social robotics", IEEE Robotics & Automation Magazine, Vol. 14, Issue 1, pp. 35-42, 2007.
[3] Anjli Bala, Abhijeet Kumar and Nidhika Birla, "Voice Command Recognition System Based on MFCC and DTW", International Journal of Engineering Science and Technology, ISSN: 0975-5462, Vol. 2, No. 12, pp. 7335-7342, Dec. 2010.
[4] Atanas Ouzounov, "Cepstral Features and Text-Dependent Speaker Identification - A Comparative Study", Cybernetics and Information Technologies, Vol. 10, No. 1, pp. 1-12, 2010.
[5] B. H. Juang and Lawrence R. Rabiner, "Automatic Speech Recognition - A Brief History of the Technology", Vol. 10, No. 3, August 2004.
[6] Bengt J. Borgstrom, "HMM-Based Reconstruction of Unreliable Spectrographic Data for Noise Robust Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 6, pp. 1612-1623, August 2010.
[7] Bharti W. Gawali, Santosh Gaikwad, Pravin Yannawar and Suresh C. Mehrotra, "Marathi Isolated Word Recognition System using MFCC and DTW features", ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011.
[8] Cini Kurian and Kannan Balakrishnan, "Automated Transcription System for Malayalam Language", International Journal of Computer Applications, Vol. 19, No. 5, April 2011.
[9] F. K. Soong, A. E. Rosenberg, L. R. Rabiner and B. H. Juang, "A Vector Quantization Approach to Speaker Recognition", Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '85), Vol. 10, pp. 387-390, 1985.
[10] Fausto "Tito" Poza and Durand R. Begault, "Voice Identification and Elimination Using Aural-Spectrographic Protocols", AES 26th International Conference, Denver, Colorado, USA, 7-9 July 2005.
[11] Josef Rajnoha et al., "ASR Systems in Noisy Environment: Analysis and Solutions for Increasing Noise Robustness", Radioengineering, Vol. 20, No. 1, April 2011.
[12] K. H. Davis, R. Biddulph and S. Balashek, "Automatic Recognition of Spoken Digits", The Journal of the Acoustical Society of America, Vol. 24, No. 6, November 1952.
[13] K. M. Ravikumar, R. Rajagopal and H. C. Nagaraj, "An Approach for Objective Assessment of Stuttered Speech Using MFCC Features", DSP Journal, Volume 9, Issue 1, June 2009.
[14] Khalid Saeed, "Sound and Voice Verification and Identification - A Brief Review of the Toeplitz Approach", Znalosti 2008, pp. 22-27, 2008.
[15] Lindasalwa Muda, Mumtaj Begam and I. Elamvazuthi, "Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques", Journal of Computing, ISSN 2151-9617, Volume 2, Issue 3, March 2010.
[16] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 6, No. 3, 2009.
[17] Maayan Geffet, Yair Wiseman and Dror Feitelson, "Automatic Alphabet Recognition", Springer Science, Vol. 8, pp. 25-40, 2005.
[18] Mark D. Skowronski and John G. Harris, "Improving the Filter Bank of a Classic Speech Feature Extraction Algorithm", IEEE International Symposium on Circuits and Systems, Bangkok, Thailand, Vol. 4, pp. 281-284, May 25-28, 2003.

AUTHORS

First Author - Abhishek Thakur: M.Tech. in Electronics and Communication Engineering from Punjab Technical University, MBA in Information Technology from Symbiosis, Pune (Maharashtra), and Bachelor of Engineering (B.E., Electronics) from Shivaji University, Kolhapur (Maharashtra). He has five years of teaching experience and one year of industry experience. Areas of interest: digital image and speech processing, antenna design and wireless communication. International publications: 7; national conferences and publications: 6; books published: 4 (Microprocessor and Assembly Language Programming, Microprocessor and Microcontroller, Digital Communication, and Wireless Communication). Working with Indo Global College of Engineering, Abhipur, Mohali, Punjab since 2011. Email: abhithakur25@gmail.com

Second Author - Rajesh Kumar is working as Associate Professor at Indo Global College of Engineering, Mohali, Punjab. He is pursuing a Ph.D. from NIT Hamirpur, H.P., completed his M.Tech. from GNE, Ludhiana, India, and his B.Tech. from HCTM, Kaithal, India. He has 11 years of academic experience and has authored many research papers in reputed international journals and in international and national conferences. His areas of interest are VLSI, microelectronics, and image and speech processing.

Third Author - Amandeep Batth: M.Tech. in Electronics and Communication Engineering from Punjab Technical University, MBA in Human Resource Management from Punjab Technical University, and B.Tech. from Punjab Technical University. He has six years of teaching experience. Areas of interest: antenna design and wireless communication. International publications: 1; national conferences and publications: 4. Working with Indo Global College of Engineering, Abhipur, Mohali, Punjab since 2008. Email: amandeep_batth@rediffmail.com

Fourth Author - Jitender Sharma: M.Tech. in Electronics and Communication Engineering from Mullana University, Ambala, and B.Tech. from Punjab Technical University. He has five years of teaching experience. Areas of interest: antenna design and wireless communication. International publications: 1; national conferences and publications: 6. Working with Indo Global College of Engineering since 2008. Email: er_jitender2007@yahoo.co.in


