Er. Abhishek Thakur* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 8, Issue No. 1, 100 - 106
Design of Matlab-Based Automatic Speaker Recognition and Control System
Er. Abhishek Thakur, Student, ECE Department, RIEIT Railmajra, Punjab, India (abhithakur25@gmail.com)
Assistant Prof. Neeru Singla, Faculty, ECE Department, RIEIT Railmajra, Punjab, India (neerusingla99@gmail.com)
Abstract- This project presents the design of a control system and of speaker recognition code using MATLAB. MATLAB's straightforward programming interface makes it an ideal tool for speech analysis projects. Through this project, experience was gained in general MATLAB programming and in control system design. A basic speaker recognition algorithm was written that sorts through a rule base in MATLAB and chooses the most likely match based on the predefined time frame of the speech utterance.

Keywords- Automatic Speech Recognition (ASR); MATLAB; Microcontroller (89C52)

I. INTRODUCTION
Development of speaker identification systems began as early as the 1960s with exploration into voiceprint analysis, where the characteristics of an individual's voice were thought to capture that individual's uniqueness much like a fingerprint. The early systems had many flaws, and research ensued to derive a more reliable method of predicting the correlation between two sets of speech utterances. Speaker identification research continues today within digital signal processing, where many advances have taken place in recent years [1].

A. Scopes
The main objective of this paper is to design and implement an English keyword speech recognition and control system in MATLAB that recognizes and responds to keyword speech inputs. Such a recognizer is applicable to various keyword-based applications, such as office or business automation, monitoring of manufacturing processes, automation of telephone or telecommunication services, editing of specialized medical reports, and aids for the handicapped [4]-[6]. In this research, a rule-based method is used to recognize English keywords.

B. System's Architecture and Algorithms
The performance of a speech recognition system is given as a word error rate (%), measured for a specified technology, task, task syntax, mode and vocabulary. Robust, high-accuracy connected-digit recognition systems find application in recognizing personal identification numbers, credit card numbers and telephone numbers. Continuous speech recognition systems find application in voice repertory dialers, where eyes-free, hands-free dialing is possible [2]. Vocal communication between people and computers includes speech synthesis from text, automatic speech recognition and speaker recognition. Speaker recognition covers speaker identification, which outputs the identity of the person most likely to have spoken from among a given population, and speaker verification, which checks from a given speech input whether a person is who he or she claims to be [3].

II. ASR AND CONTROL SYSTEM
In the current design project, a basic speaker recognition algorithm has been written that sorts through a rule base in MATLAB and chooses the most likely match based on the predefined time frame of the speech utterance, as well as the location of the formants in the frequency-domain and time-domain representations.
ISSN: 2230-7818
Figure-1 Block diagram of control system
@ 2011 http://www.ijaest.iserp.org. All rights Reserved.
When a keyword is spoken, the microphone converts the sound into an electrical signal. The attenuator attenuates this signal and passes it to the voice processor. In the voice processor, the voice file is processed in MATLAB, and the code of the recognized word (a single byte, 01h to 08h) is transferred to the microcontroller over the RS232 serial link. Each code triggers a particular task assigned in the microcontroller program. An LM35 temperature sensor senses temperatures below 30 degrees and above 50 degrees. A relay card controls the temperature and the fan speed. A transmitter and receiver are used for the wireless robot, and a motor driver controls the robot's direction.
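The task performed for each received code can be shown with a small host-side model. The sketch below is a Python transcription, not the firmware. It assumes the byte values 01h to 08h sent by Appendix B and the bit assignments in the compare-and-branch chain of Appendix C. The names COMMAND_ACTIONS, handle_byte and heater_output are hypothetical. P1.4 and P1.5 are modelled only as the inputs that the listing tests with jnb. The sensor circuit behind them is not described in the paper.

```python
# Hypothetical Python model of the 89C52 dispatch logic in Appendix C.
# It simulates port/flag levels in a dict; it is not firmware.

# Bit changes per received command byte, transcribed from the
# cjne/setb/clr chain in Appendix C (1 = setb, 0 = clr).
COMMAND_ACTIONS = {
    0x01: {"P1.6": 0, "P1.7": 1},      # go
    0x02: {"P1.6": 1, "P1.7": 0},      # reverse
    0x03: {"P1.6": 1, "P1.7": 1},      # robo stop
    0x04: {"temp30": 1, "temp50": 0},  # temperature 30 degree
    0x05: {"temp30": 0, "temp50": 1},  # temperature 50 degree set
    0x06: {"P2.2": 1, "P2.3": 0},      # fan low set
    0x07: {"P2.2": 0, "P2.3": 1},      # fan medium set
    0x08: {"P2.2": 0, "P2.3": 0},      # fan stop right now
}


def handle_byte(state, code):
    """Return a new state with the bit changes for one received byte.

    Unrecognised bytes leave the state unchanged, mirroring the final
    `cjne a, #08h, wait`, which jumps straight back to the wait loop.
    """
    new_state = dict(state)
    new_state.update(COMMAND_ACTIONS.get(code, {}))
    return new_state


def heater_output(temp30, temp50, p1_4, p1_5, p2_1):
    """One pass of the Appendix C wait loop; returns the new P2.1 level.

    When a set point flag is set, a low level on its input pin
    switches P2.1 on and a high level switches it off.
    """
    if temp30:
        return 1 if p1_4 == 0 else 0
    if temp50:
        return 1 if p1_5 == 0 else 0
    return p2_1  # no set point selected: P2.1 is left as it was


if __name__ == "__main__":
    state = {}
    for byte in (0x01, 0x04):  # "go" followed by "temperature 30 degree"
        state = handle_byte(state, byte)
    print(state)
```

Running the example prints `{'P1.6': 0, 'P1.7': 1, 'temp30': 1, 'temp50': 0}`. The sketch keeps the mapping in a table rather than a branch chain, which makes it easy to compare line by line with Table III.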
III. APPROACH
This multi-faceted design project is divided into two sections: a software section and a hardware section.

A. Software section
In the software section, the first step is to define the recording window for the command words: duration1 = 40000 samples at a sampling frequency of fs = 8000 Hz, i.e. 5 seconds per recording. Each keyword sample is then recorded with the "wavrecord" command. The samples whose value exceeds 0.1 are located, and the span between the first and last of them is calculated. The file is then stored with the "wavwrite" command. The same procedure stores the other keywords. In the second step, the live voice sample is recorded and the samples above 0.1 are located. Their span is calculated, stored in a variable and compared with the predefined time frames. If it falls within one of them, the corresponding output is given. Table I gives the time frames for speaking and storing the keywords.

TABLE I. TIME FRAME FOR COMMAND WORDS
S.N.  Command word                Code to 89C52  Time frame, samples at 8 kHz (ms)
1     Go                          01             1000 to 2300 (125 to 287.5)
2     Reverse                     02             2300 to 6000 (287.5 to 750)
3     Robo Stop                   03             6000 to 10000 (750 to 1250)
4     Temperature 30 Degree       04             10000 to 13000 (1250 to 1625)
5     Temperature 50 Degree Set   05             13000 to 16000 (1625 to 2000)
6     Fan Low Set                 06             16000 to 18500 (2000 to 2312.5)
7     Fan Medium Set              07             18500 to 21000 (2312.5 to 2625)
8     Fan Stop Right Now          08             21000 to 25000 (2625 to 3125)

Table I shows that the Go keyword has a time frame of 1000 to 2300 samples (125 to 287.5 ms) for the spoken and stored voice sample. If the voice sample falls within this frame, code 01 is generated.

Step One: As shown in Table I, record the keywords in the database as .wav files, each with its own time frame. The following commands do this (shown for the Reverse keyword):
1) file_reverse=wavrecord(duration1,fs); records the command word for duration1 = 40000 samples at fs = 8000 Hz.
2) [x_reverse y_reverse]=find(file_reverse>.1); finds the indices of the speech samples whose value exceeds 0.1 and discards the rest.
3) diff_reverse=max(x_reverse)-min(x_reverse); computes the difference between the last and first of these indices and stores it in a variable.
4) wavwrite(file_reverse,'c:\voice\reverse.wav'); stores the voice sample on the computer's disk.
Finally, the recorded signal is plotted as magnitude against time. The code for this process can be found in Appendix A.
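Step One can also be sketched outside MATLAB. The Python stand-in below assumes that the recording is already available as a list of floats in [-1, 1]; microphone capture is not shown. active_span reproduces max(x)-min(x) over the indices returned by find(file>.1). As in the MATLAB test, only values above +0.1 count. save_wav and load_wav stand in for wavwrite and wavread and use the standard-library wave module. All names are hypothetical, and the 16-bit scaling by 32767 is an assumption of the sketch.

```python
import struct
import wave

FS = 8000           # sampling rate used in the paper (fs = 8000)
N_SAMPLES = 40000   # duration1 in the paper: 40000 samples = 5 s at 8 kHz
THRESHOLD = 0.1     # threshold used with find(file > .1)


def active_span(samples, threshold=THRESHOLD):
    """Distance in samples between the first and last sample strictly above
    threshold (max(x) - min(x) in the paper); 0 if no sample exceeds it."""
    idx = [i for i, v in enumerate(samples) if v > threshold]
    return idx[-1] - idx[0] if idx else 0


def save_wav(path, samples, fs=FS):
    """Store floats in [-1, 1] as a 16-bit mono WAV file (wavwrite stand-in)."""
    frames = b"".join(
        struct.pack("<h", max(-32768, min(32767, round(v * 32767))))
        for v in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(frames)


def load_wav(path):
    """Read a 16-bit mono WAV file back as floats (wavread stand-in)."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    return [s / 32767 for (s,) in struct.iter_unpack("<h", raw)]
```

The stored template keeps the same span after a save and reload. So the duration measured at enrollment and the duration measured on a reloaded file agree, which is all the rule base in Table I relies on.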
Figure-2 Recorded voice key words
Step Two: In this step, the recorded samples are compared with the real-time speech. The comparison uses the durations of the recorded samples listed in Table I. If the real-time speech matches a recorded keyword, the plot is labelled with that keyword (for example "go matched"); otherwise it is labelled "no matched voice". For a match, the corresponding code is sent to the microcontroller over the serial link, and the microcontroller performs the operation defined for that keyword in its program. This makes keyword recognition very simple. The code for this process can be found in Appendix B.

B. HARDWARE SECTION
TABLE II. MICROCONTROLLER PORT CONNECTION
S.N.  Ports of 89C52 µC    Hardware device control
1     P1.0, P1.1, P1.2     ADC
2     P1.4                 Temperature 30 degree
3     P1.5                 Temperature 50 degree set
4     P1.6                 Go
5     P1.7                 Reverse
6     P2.2                 Fan low set
7     P2.3                 Fan medium set

As Table II shows, all peripherals are connected to corresponding port pins of the 89C52 microcontroller. Pins P1.0, P1.1, P1.2 and P1.3 are connected to the analog-to-digital converter. P1.4 and P1.5 go to the temperature sensor, P1.6 and P1.7 to the robot control section, and P2.2 and P2.3 to the relay circuit. These peripherals operate according to the program stored in the microcontroller. When a command word spoken by the user into the microphone is recognized in MATLAB, its code is generated and sent to the 89C52. The microcontroller then performs the operation related to that keyword. The code for this process can be found in Appendix C.

Figure-3 Implementation of control system

IV. RESULTS
A. Software Results
To recognize a keyword, the following commands are used (shown for Go):
1) if (diff_rec >1000 && diff_rec< 2300) checks whether the span of the live recording lies between 1000 and 2300 samples (125 to 287.5 ms at 8 kHz). If it does, the output is given.
2) fwrite(s,01); sends code 01 over the serial port, and the microcontroller performs the related operation.
3) xlabel('go matched'); labels the plot for the recognized word.
4) end
Figure-4 Go Key word matched.
Eight keywords are used in this project to control the hardware. When the user says "go", the keyword is recognized by comparison with the recorded Go keyword, as shown in Figure-4.
To show the results for these keywords, the following logic is used.
1) if (size_xrec(1,1)==0) diff_rec=0;
If no sample of the live recording exceeds 0.1, the span variable is set to zero and the figure is labelled "no matched voice".
2) else diff_rec=max(x_rec)-min(x_rec); diff_rec1=num2str(diff_rec); figure(2),plot(t,file_rec); title('current voice'); ylabel(diff_rec1) end
Otherwise, the span between the first and last samples above 0.1 is calculated and stored in a variable, which is then checked against the ranges in Table I. The current voice is plotted against the sample index (time), with the calculated span as the y-axis label. When a keyword is matched, its name becomes the x-axis label. The recognition procedure for the other keywords is similar, and Figure-5 shows their recognition figures.
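The if-chain in Appendix B reduces to a lookup of the span diff_rec in a list of exclusive ranges. The Python sketch below transcribes the bounds, codes and plot labels from the Appendix B listing; the names decide, RULES and span_ms are hypothetical. Note that the listing pairs 10000 to 16000 samples with the fan commands and 16000 to 21000 samples with the temperature commands. Serial output and plotting are left out.

```python
# Sketch of the rule-based decision in Appendix B (hypothetical names).
FS = 8000  # sampling rate used in the paper

# (low, high, code, label); bounds are exclusive, as in
# `diff_rec > low && diff_rec < high` in the MATLAB listing.
RULES = [
    (1000, 2300, 0x01, "go matched"),
    (2300, 6000, 0x02, "reverse matched"),
    (6000, 10000, 0x03, "robo stop matched"),
    (10000, 13000, 0x06, "fan low set matched"),
    (13000, 16000, 0x07, "fan medium set matched"),
    (16000, 18500, 0x04, "temp 30 degree matched"),
    (18500, 21000, 0x05, "temp 50 degree set matched"),
    (21000, 25000, 0x08, "fan stop right now"),
]


def decide(diff_rec):
    """Return (code, label) for a span in samples.

    A span of 0 means no sample exceeded the threshold, which gives
    (None, "no matched voice"); a span outside every rule gives (None, None).
    """
    if diff_rec == 0:
        return None, "no matched voice"
    for low, high, code, label in RULES:
        if low < diff_rec < high:
            return code, label
    return None, None


def span_ms(diff_rec, fs=FS):
    """Convert a span in samples to milliseconds."""
    return 1000.0 * diff_rec / fs
```

Because the comparisons are strict, a span of exactly 2300 samples matches no keyword, just as in the MATLAB code. For example, decide(2000) returns (1, "go matched"), and span_ms(2000) is 250.0 ms.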
Figure-5. Recognized voice key words.

B. Hardware Results
As Table III shows, when the Go keyword is recognized, P1.7 goes to logic one and P1.6 goes to logic zero, which means that the robot moves forward. Table III gives the logic levels of the port pins for each keyword.

TABLE III. PORT OPERATION FOR COMMAND WORDS
SN  Command                      Task executed
1   Go                           P1.7 = 1 and P1.6 = 0
2   Reverse                      P1.6 = 1 and P1.7 = 0
3   Robo Stop                    P1.6 = 1 and P1.7 = 1
4   Fan Low Set                  P2.2 = 1 and P2.3 = 0
5   Fan Medium Set               P2.3 = 1 and P2.2 = 0
6   Temperature 30 Degree        P1.4 = 1, else 0
7   Temperature 50 Degree Set    P1.5 = 1, else 0
8   Fan Stop Right Now           P2.2 = 0 and P2.3 = 0

V. CONCLUSION AND FURTHER WORK
The implemented algorithm and control system use voice keywords to control the fan speed, the heater temperature and the robot direction. The system is reliable and easy to develop further. The experimental results show that the proposed algorithm is functional and can be used for voice keyword control of home appliances and industrial robots. In the authors' tests, the percentage of correctly recognized commands was high enough for this application.

The main contribution of this study is a keyword recognition and control system. The experiments also show that the approach works for keyword recognition. The proposed ASR and control system was fully implemented, as shown in Figure-3. Future effort will be directed toward developing a more appropriate and convenient method.

REFERENCES
[1] E. Darren Ellis, "Design of a Speaker Recognition Code using Matlab", Department of Computer and Electrical Engineering, University of Tennessee, Knoxville, Tennessee 37996, 09 May 2001.
[2] Revathi, R. Ganapathy and Y. Venkataramani, "Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach", IJCSIT, Vol. 1, No. 2, November 2009.
[3] Claudia Moisa, Helga Silaghi and Andrei Silaghi, "Speech and Speaker Recognition for the Command of an Industrial Robot", Mathematical Methods and Computational Techniques in Electrical Engineering, ISBN: 978-960-473-238-7.
[4] Ahmad A. M. Abushariah, Teddy S. Gunawan and Mohammad A. M. Abushariah, "English Digits Speech Recognition System Based on Hidden Markov Models", ICCCE 2010, 11-13 May 2010, Kuala Lumpur, Malaysia.
[5] Zhao Lishuang and Han Zhiyan, "Speech Recognition System Based on Integrating Feature and HMM", International Conference on Measuring Technology and Mechatronics Automation, 2010.
[6] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani and Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients", 3rd International Conference on Electrical and Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka, Bangladesh.
Appendix A
%---------- recording predefined keywords ----------
duration=16000; duration1=40000; fs=8000;   % sample counts: 2 s and 5 s at 8 kHz
%---------- saving files ----------
file_ready=wavrecord(duration,fs);
wavwrite(file_ready,'c:\voice\ready.wav');
file_go=wavrecord(duration1,fs);
[x_go y_go]=find(file_go>.1);               % indices of samples above 0.1
diff_go=max(x_go)-min(x_go);                % span of the utterance in samples
wavwrite(file_go,'c:\voice\go.wav');
file_reverse=wavrecord(duration1,fs);
[x_reverse y_reverse]=find(file_reverse>.1);
diff_reverse=max(x_reverse)-min(x_reverse);
wavwrite(file_reverse,'c:\voice\reverse.wav');
file_robostop=wavrecord(duration1,fs);
[x_robostop y_robostop]=find(file_robostop>.1);
diff_robostop=max(x_robostop)-min(x_robostop);
wavwrite(file_robostop,'c:\voice\robostop.wav');
file_fanlowset=wavrecord(duration1,fs);
[x_fanlowset y_fanlowset]=find(file_fanlowset>.1);
diff_fanlowset=max(x_fanlowset)-min(x_fanlowset);
wavwrite(file_fanlowset,'c:\voice\fanlowset.wav');
file_fanmediumset=wavrecord(duration1,fs);
[x_fanmediumset y_fanmediumset]=find(file_fanmediumset>.1);
diff_fanmediumset=max(x_fanmediumset)-min(x_fanmediumset);
wavwrite(file_fanmediumset,'c:\voice\fanmediumset.wav');
file_temprature30degree=wavrecord(duration1,fs);
[x_temprature30degree y_temprature30degree]=find(file_temprature30degree>.1);
diff_temprature30degree=max(x_temprature30degree)-min(x_temprature30degree);
wavwrite(file_temprature30degree,'c:\voice\temprature30degree.wav');
file_temprature50degreeset=wavrecord(duration1,fs);
[x_temprature50degreeset y_temprature50degreeset]=find(file_temprature50degreeset>.1);
diff_temprature50degreeset=max(x_temprature50degreeset)-min(x_temprature50degreeset);
wavwrite(file_temprature50degreeset,'c:\voice\temprature50degreeset.wav');
file_fanstoprightnow=wavrecord(duration1,fs);
[x_fanstoprightnow y_fanstoprightnow]=find(file_fanstoprightnow>.1);
diff_fanstoprightnow=max(x_fanstoprightnow)-min(x_fanstoprightnow);
wavwrite(file_fanstoprightnow,'c:\voice\fanstoprightnow.wav');

Appendix B
%---------- serial port setup ----------
s = serial('COM1');
set(s,'BaudRate',4800,'DataBits',8,'Parity','none','StopBits',1,'FlowControl','none');
fopen(s);
%---------- recording parameters ----------
duration=16000; duration1=40000; fs=8000;
%---------- reading stored keyword files ----------
file_ready=wavread('c:\voice\ready.wav');
file_go=wavread('c:\voice\go.wav');
[x_go y_go]=find(file_go>.1);
diff_go=max(x_go)-min(x_go);
file_reverse=wavread('c:\voice\reverse.wav');
[x_reverse y_reverse]=find(file_reverse>.1);
diff_reverse=max(x_reverse)-min(x_reverse);
file_robostop=wavread('c:\voice\robostop.wav');
[x_robostop y_robostop]=find(file_robostop>.1);
diff_robostop=max(x_robostop)-min(x_robostop);
file_fanlowset=wavread('c:\voice\fanlowset.wav');
[x_fanlowset y_fanlowset]=find(file_fanlowset>.1);
diff_fanlowset=max(x_fanlowset)-min(x_fanlowset);
file_fanmediumset=wavread('c:\voice\fanmediumset.wav');
[x_fanmediumset y_fanmediumset]=find(file_fanmediumset>.1);
diff_fanmediumset=max(x_fanmediumset)-min(x_fanmediumset);
file_fanstoprightnow=wavread('c:\voice\fanstoprightnow.wav');
[x_fanstoprightnow y_fanstoprightnow]=find(file_fanstoprightnow>.1);
diff_fanstoprightnow=max(x_fanstoprightnow)-min(x_fanstoprightnow);
file_temprature30degree=wavread('c:\voice\temprature30degree.wav');
[x_temprature30degree y_temprature30degree]=find(file_temprature30degree>.1);
diff_temprature30degree=max(x_temprature30degree)-min(x_temprature30degree);
file_temprature50degreeset=wavread('c:\voice\temprature50degreeset.wav');   % read before it is searched
[x_temprature50degreeset y_temprature50degreeset]=find(file_temprature50degreeset>.1);
diff_temprature50degreeset=max(x_temprature50degreeset)-min(x_temprature50degreeset);
%---------- main loop ----------
t=(1:duration1)';                  % sample index used as the time axis
for i=1:10
    sound(file_ready,fs);          % play the "ready" prompt
    pause(.3);
    file_rec=wavrecord(duration1,fs);
    figure(1),subplot(4,1,1); plot(file_go); title('GO training file')
    figure(1),subplot(4,1,2); plot(file_reverse); title('REVERSE training file')
    figure(1),subplot(4,1,3); plot(file_robostop); title('ROBO STOP training file')
    figure(1),subplot(4,1,4); plot(file_temprature30degree); title('TEMPRATURE 30 DEGREE training file')
    figure(3),subplot(4,1,1); plot(file_temprature50degreeset); title('TEMPRATURE 50 DEGREE SET training file')
    figure(3),subplot(4,1,2); plot(file_fanstoprightnow); title('FAN STOP RIGHT NOW training file')
    figure(3),subplot(4,1,3); plot(file_fanlowset); title('FAN LOW SET training file')
    figure(3),subplot(4,1,4); plot(file_fanmediumset); title('FAN MEDIUM SET training file')
    [x_rec y_rec]=find(file_rec>.1);
    size_xrec=size(x_rec);
    if (size_xrec(1,1)==0)
        diff_rec=0;                % nothing above the threshold
    else
        diff_rec=max(x_rec)-min(x_rec);   % span of the live utterance in samples
        diff_rec1=num2str(diff_rec);
        figure(2),plot(t,file_rec); title('current voice'); ylabel(diff_rec1)
    end
    if (diff_rec >1000 && diff_rec< 2300), fwrite(s,01); xlabel('go matched'); end
    if (diff_rec >2300 && diff_rec< 6000), fwrite(s,02); xlabel('reverse matched'); end
    if (diff_rec >6000 && diff_rec< 10000), fwrite(s,03); xlabel('robo stop matched'); end
    if (diff_rec >10000 && diff_rec< 13000), fwrite(s,06); xlabel('fan low set matched'); end
    if (diff_rec >13000 && diff_rec< 16000), fwrite(s,07); xlabel('fan medium set matched'); end
    if (diff_rec >16000 && diff_rec< 18500), fwrite(s,04); xlabel('temp 30 degree matched'); end
    if (diff_rec >18500 && diff_rec< 21000), fwrite(s,05); xlabel('temp 50 degree set matched'); end
    if (diff_rec >21000 && diff_rec< 25000), fwrite(s,08); xlabel('fan stop right now'); end
    if (diff_rec==0), figure(2), xlabel('no matched voice'); end
    pause(3)
end
fclose(s); delete(s); clear s      % release COM1

Appendix C
;---------------------------------------------------------------------
$include(mod51)
flags   equ  20h
temp30  bit  0               ; 30 degree set point selected (bit 20h.0)
temp50  bit  1               ; 50 degree set point selected (bit 20h.1)

        org  0000h
main:   clr  p2.0
        clr  p2.1
        clr  p2.2
        clr  p2.3
        clr  temp30
        clr  temp50
        lcall delay
        lcall intlcd
        lcall lcdwelcome     ; routine not included in the listing
        lcall starttimer
wait:   jnb  temp30, checkfor50
        jnb  p1.4, startdevice
        clr  p2.1
        ljmp recint
checkfor50:
        jnb  temp50, recint
        jnb  p1.5, startdevice
        clr  p2.1
        ljmp recint
startdevice:
        setb p2.1
recint: jnb  ri, wait        ; no byte received yet
        clr  ri
        mov  a, sbuf         ; command code sent by MATLAB
        cjne a, #01h, next2
        clr  p1.6            ; 01h: go
        setb p1.7
        ljmp wait
next2:  cjne a, #02h, next3
        setb p1.6            ; 02h: reverse
        clr  p1.7
        ljmp wait
next3:  cjne a, #03h, next4
        setb p1.6            ; 03h: robo stop
        setb p1.7
        ljmp wait
next4:  cjne a, #04h, next5
        setb temp30          ; 04h: temperature 30 degree
        clr  temp50
        ljmp wait
next5:  cjne a, #05h, next6
        clr  temp30          ; 05h: temperature 50 degree set
        setb temp50
        ljmp wait
next6:  cjne a, #06h, next7
        setb p2.2            ; 06h: fan low set
        clr  p2.3
        ljmp wait
next7:  cjne a, #07h, next8
        clr  p2.2            ; 07h: fan medium set
        setb p2.3
        ljmp wait
next8:  cjne a, #08h, wait
        clr  p2.2            ; 08h: fan stop right now
        clr  p2.3
        ljmp wait
;---------------------------------------------------------------------
intlcd: mov  a, #38h         ; 8-bit interface, 2 lines, 5x7 dots
        lcall commandsend
        mov  a, #0eh         ; display on, cursor on
        lcall commandsend
        mov  a, #01h         ; clear display
        lcall commandsend
        mov  a, #06h         ; entry mode: increment cursor
        lcall commandsend
        mov  a, #80h         ; cursor to start of line 1
        lcall commandsend
        ret
;---------------------------------------------------------------------
commandsend:
        clr  p3.6            ; p3.6 low: command register
        clr  p3.7
        mov  p0, a
        setb p3.7            ; strobe p3.7
        nop
        nop
        nop
        clr  p3.7
        lcall delay
        ret
datasend:
        setb p3.6            ; p3.6 high: data register
        clr  p3.7
        mov  p0, a
        setb p3.7
        nop
        nop
        nop
        clr  p3.7
        lcall delay
        ret
;---------------------------------------------------------------------
delay:  mov  r2, #02h
l31:    mov  r0, #0ffh
l2:     mov  r1, #0ffh
l1:     djnz r1, l1
        djnz r0, l2
        djnz r2, l31
        ret
keytokeydelay:
        mov  r2, #03h
l5:     mov  r0, #0ffh
l4:     mov  r1, #0ffh
l3:     djnz r1, l3
        djnz r0, l4
        djnz r2, l5
        ret
;---------------------------------------------------------------------
starttimer:
        mov  tmod, #20h      ; Timer 1, mode 2 (8-bit auto-reload)
        mov  th1, #0fah      ; reload value that sets the baud rate
        mov  scon, #50h      ; serial mode 1, receiver enabled
        setb tr1             ; start Timer 1
        ret
intports:
        ret
        end
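Appendix C sets Timer 1 to mode 2 (TMOD = 20h) with TH1 = 0FAh and puts the serial port in mode 1 with reception enabled (SCON = 50h). Appendix B opens COM1 at 4800 baud, 8 data bits, no parity and 1 stop bit. The listing does not state the crystal frequency. Assume the common 11.0592 MHz crystal and SMOD = 0. The standard 8051 relation baud = 2^SMOD * f_osc / (32 * 12 * (256 - TH1)) then gives 11059200 / 2304 = 4800 baud, so both ends of the link agree. The Python helpers below (hypothetical names) reproduce this arithmetic.

```python
# Worked check of the 8051 baud-rate setting in Appendix C (hypothetical helpers).
# Timer 1 in mode 2 overflows every (256 - TH1) machine cycles; one machine
# cycle is 12 oscillator periods, and UART mode 1 divides the overflow rate
# by 32 (by 16 when SMOD = 1).


def timer1_mode2_baud(f_osc_hz, th1, smod=0):
    """Baud rate of UART mode 1 clocked by Timer 1 in 8-bit auto-reload mode."""
    return (2 ** smod) * f_osc_hz / (32 * 12 * (256 - th1))


def th1_for_baud(f_osc_hz, baud, smod=0):
    """Reload value giving exactly `baud`; raises ValueError if none exists."""
    num = (2 ** smod) * f_osc_hz
    den = 32 * 12 * baud
    if num % den:
        raise ValueError("no exact TH1 reload value for this crystal/baud pair")
    th1 = 256 - num // den
    if not 0 <= th1 <= 255:
        raise ValueError("reload value out of range")
    return th1


if __name__ == "__main__":
    f_osc = 11_059_200  # ASSUMED crystal frequency; not stated in the paper
    print(timer1_mode2_baud(f_osc, 0xFA))  # TH1 = 0FAh from starttimer -> 4800.0
    print(hex(th1_for_baud(f_osc, 4800)))  # -> 0xfa
```

With a 12 MHz crystal, no integer reload value gives exactly 4800 baud, so th1_for_baud raises ValueError in that case. At 4800 baud, each 8N1 frame (10 bits) carrying one command byte takes about 2.08 ms.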