ISSN No.
Volume 1, No. 1, July – August 2012
International Journal of Computing, Communications and Networking
Available Online at http://warse.org/pdfs/ijccn01112012.pdf
Jyotismita Talukdar et al., International Journal of Computing, Communications and Networking, 1(1), July – August, 1-9

Acoustic Representation of BODO and RABHA Phonemes

Jyotismita Talukdar (1), Nabankur Pathak (2)
(1) Asia Institute of Technology, Bangkok, Thailand, E-mail: jyotismita4@gmail.com
(2) Gauhati University, India, E-mail: phtassam@gmail.com
ABSTRACT
In this paper we study the spectral features of Bodo and Rabha phonemes using formant frequencies and cepstral coefficients. The analysis shows significant variation of the cepstral coefficients among the Bodo vowels. For male Bodo speakers, the cepstral variation is maximum for vowel /o/ and minimum for vowel /u/; for female Bodo speakers, it is maximum for /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients for male speakers is maximum for vowel /u/ and minimum for vowel /o/, while for female speakers it is maximum for /o/ and minimum for /e/. This observation may be helpful in sex determination for both Bodo and Rabha speakers. The range of variation of the cepstral coefficients for Bodo and Rabha male speakers lies within 3.8177 > C_Bodo > 1.1523 and 8.1329 > C_Rabha > 2.0579 respectively. For female speakers it lies within 1.9578 > C_Bodo > 0.9276 and 7.6546 > C_Rabha > 2.4127. That is, the variation of the cepstral features for Bodo vowels (Male: 2.6654; Female: 1.0302) is smaller than for Rabha vowels (Male: 6.0750; Female: 5.2419), so the former are stable compared to the latter. The investigation also shows that the range of formant frequency is maximum for isolated vowels; when the vowels are placed in the nucleus of a structure like CV, VC or CVC, the formant frequency decreases.

Keywords: Acoustic Representation, Phonemes, Cepstral Features

1. INTRODUCTION
The Bodos and the Rabhas are early ethnic and linguistic communities settled in the North-Eastern part of India. The Bodos belong to a larger ethnic group called the Bodo-Kachari. Racially, they belong to the Mongoloid stock of the Indo-Mongoloids or Indo-Tibetans. Mythologically, according to Dr. Suniti Kumar Chatterjee, a well-known historian, the Bodos are “the Offspring of son of Vishnu and mother Earth”, who were termed Kiratas during the epic period. They are recognized as a plains tribe in the 6th Schedule of the Constitution of India.

Historically, there are different views on the early migration of the Mongolians into the North-Eastern part of India. According to Grierson’s “The Linguistic Survey of India”, the Mongolians who settled in old Assam migrated from the banks of the Hoang-Ho and Yangtze rivers, and scattered and settled along different river banks of the state. The upper courses of the Yangtze and Hoang-Ho in North-West China were the original home of the Tibeto-Burman races. The hierarchy of the Bodo community is shown in the figure below.
Figure: Hierarchy of Bodo & Rabha Languages
@ 2012, IJCCN All Rights Reserved
Speech Data Collection for Acoustic Representation

Typically, spoken language data can be classified based on mode of speech, medium of recording, language, dialects and environment.

In the present study, speech data were collected from native speakers of the Bodo and Rabha languages who are fluent in speaking and writing the language. Male and female speakers between 15 and 30 years of age, possessing a pleasant and good voice quality, were chosen to record the data. The recording was done one speaker at a time. The speakers were instructed to read each word or sentence naturally, without emotion or expression, to speak clearly, and to keep their normal speaking rate and volume. To keep the recording consistent in both phonetic and prosodic terms (within the framework of symbolic prosody), an expert in acoustic phonetics supervised the recording. The average recording duration was about 4 hours (3 recording sessions) for each speaker (male and female).

The following data sets were recorded for the analysis of the cepstral coefficients of vowel phonemes and the formant frequencies of some selected Bodo and Rabha words: Bodo and Rabha vowel phonemes for cepstral analysis, and selected word sets of V, CV, VC and CVC structure in both languages for formant analysis.

The recording was done in the audio editing software Cool Edit Pro and the analysis was done in MATLAB 7.1. Each digitized utterance is blocked into 50 frames of 20 millisecond (ms) duration. Every frame contains 441 samples (which corresponds to a sampling rate of 22.05 kHz), and for each frame 20 cepstral coefficients were calculated. The spectral characteristics of six Bodo and Rabha vowels, for male and female speakers, were investigated. Approximately 12 samples were averaged to obtain one coefficient. First, the 10th frame of all utterances of the male and female speakers was considered for analysis. The variation of the cepstral coefficients for the Bodo and Rabha vowels for the selected speakers is shown in Table-(1) & Table-(2) and depicted in Figures-(3 & 4) and Figures-(6 & 7). However, continuous frame-wise analysis shows that frames 2, 4, 6 and 8 for the Bodo speakers (Figure-5) and frames 9, 14, 16 and 17 for the Rabha speakers (Figure-8) show distinct variation of the cepstral coefficients between male and female speakers.

2. LPC ANALYSIS

Linear prediction is a method of signal source modelling that is dominant in speech signal processing and has wide application in other areas. Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques. The glottis (the space between the vocal cords) produces the sound, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat, the mouth and the nasal cavity) forms the tube, which is characterized by its resonance frequencies, called formants.
The basic problem of the LPC system is to determine the formants from the speech signal. The solution is a difference equation that expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, hence the name Linear Predictive Coding. The coefficients of the difference equation (the prediction coefficients) characterize the formants, so the LPC system needs to estimate these coefficients. The estimation is made by minimizing the mean square error between the predicted signal and the actual signal.

The basic idea behind the LPC model is that a given speech sample at time n, $s(n)$, can be approximated as a linear combination of the past p speech samples (Rabiner & Juang, 1993), such that

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p) \qquad (1)$$

where the coefficients $a_1, a_2, \dots, a_p$ are assumed to be constant over the speech analysis frame. Equation (1) can be converted to an equality by including an excitation term $G\,u(n)$,

$$s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,u(n) \qquad (2)$$

where $u(n)$ is the normalized excitation and $G$ is the gain of the excitation. Expressing equation (2) in the z-domain we get the relation

$$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z) \qquad (3)$$

leading to the transfer function

$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} \qquad (4)$$

The excitation is chosen based on our knowledge that the actual excitation function for speech is essentially either a quasi-periodic pulse train (for voiced speech sounds) or a random noise source (for unvoiced sounds).

The relation between $s(n)$ and $u(n)$ is defined (based on the speech production model, Figure-1.1) as

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) \qquad (5)$$

We consider the linear combination of past speech samples as the estimate $\tilde{s}(n)$, defined as

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) \qquad (6)$$

The prediction error, $e(n)$, is defined as

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \qquad (7)$$

and the error transfer function is

$$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} \qquad (8)$$

The basic problem of linear prediction analysis is to determine the set of predictor coefficients, $\{a_k\}$, directly from the speech signal so that the spectral properties of the digital filter match those of the speech waveform within the analysis window. To set up the equations that must be solved to determine the predictor coefficients, we define the short-term speech and error segments at time n as

$$s_n(m) = s(n+m) \qquad (9)$$

$$e_n(m) = e(n+m) \qquad (10)$$

and seek to minimize the mean squared error signal at time n,

$$E_n = \sum_{m} e_n^2(m) \qquad (11)$$

Using equations (9) and (10) we can write

$$E_n = \sum_{m} \Big[ s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big]^2 \qquad (12)$$

To solve equation (12) for the predictor coefficients, we set

$$\frac{\partial E_n}{\partial a_k} = 0, \quad k = 1, 2, \dots, p \qquad (13)$$

giving

$$\sum_{m} s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{a}_k \sum_{m} s_n(m-i)\, s_n(m-k) \qquad (14)$$

The terms $\sum_{m} s_n(m-i)\, s_n(m-k)$ are the short-term covariance of $s_n(m)$, i.e.,

$$\phi_n(i,k) = \sum_{m} s_n(m-i)\, s_n(m-k) \qquad (15)$$

so that equation (14) can be expressed in compact notation as

$$\phi_n(i,0) = \sum_{k=1}^{p} \hat{a}_k\, \phi_n(i,k), \quad i = 1, 2, \dots, p \qquad (16)$$

which describes a set of p equations in p unknowns. It is readily shown that the minimum mean squared error, $\hat{E}_n$, can be expressed as

$$\hat{E}_n = \sum_{m} s_n^2(m) - \sum_{k=1}^{p} \hat{a}_k \sum_{m} s_n(m)\, s_n(m-k) = \phi_n(0,0) - \sum_{k=1}^{p} \hat{a}_k\, \phi_n(0,k) \qquad (17)$$

Thus the minimum mean squared error consists of a fixed term, $\phi_n(0,0)$, and terms that depend on the predictor coefficients.

To solve equation (16) for the optimum predictor coefficients we have to compute $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$, and then solve the resulting set of p simultaneous equations. A method to solve these equations and compute the coefficients is the autocorrelation method.

The LPC-Cepstral Coefficient

In the present study, LPC-based cepstral coefficients and phonetically important parameters are used as feature vectors. A cepstral weighted feature vector is obtained for each frame by block processing of the continuous speech signal. The analog speech waveform is sampled and quantized by an analog-to-digital converter. To spectrally flatten the signal, the speech signal is subjected to pre-emphasis through a first-order digital filter whose transfer function is given by

$$H(z) = 1 - \tilde{a}\, z^{-1} \qquad (19)$$

with $\tilde{a}$ the pre-emphasis coefficient.

Consecutive speech samples are then grouped into frames. To reduce the undesired effect of the Gibbs phenomenon, each frame is multiplied by a window function (the Hamming window), which is given by (Proakis & Manolakis, 2004; Talukdar, P.H., 2010)

$$w(n) = 0.54 - 0.46 \cos\left( \frac{2\pi n}{N-1} \right), \quad 0 \le n \le N-1$$

where N is the number of samples in a block. Each frame of the windowed signal, $\tilde{x}(n)$, is next autocorrelated to give

$$r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\, \tilde{x}(n+m), \quad m = 0, 1, \dots, p \qquad (20)$$
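The pre-emphasis, framing, windowing and autocorrelation steps described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' MATLAB code. The pre-emphasis coefficient (0.95), the LPC order (12), the synthetic 220 Hz test tone and all function names are assumptions made for this sketch. The 22,050 Hz sampling rate is inferred from the stated 441 samples per 20 ms frame.

```python
import numpy as np


def preemphasize(signal, coeff=0.95):
    """First-order pre-emphasis y(n) = x(n) - coeff * x(n-1).

    coeff=0.95 is an assumed value; the paper does not state it here.
    """
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


def split_into_frames(signal, frame_len=441):
    """Block the signal into consecutive, non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))


def hamming(n_samples):
    """Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1))."""
    n = np.arange(n_samples)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (n_samples - 1))


def autocorrelate(frame, order):
    """Autocorrelation r(m) = sum_n x(n) x(n+m) for m = 0..order."""
    n = len(frame)
    return np.array([np.dot(frame[: n - m], frame[m:]) for m in range(order + 1)])


# Hypothetical input: one second of a 220 Hz tone standing in for speech.
fs = 22050                      # inferred: 441 samples / 20 ms
t = np.arange(fs) / fs
speech_like = np.sin(2.0 * np.pi * 220.0 * t)

frames = split_into_frames(preemphasize(speech_like))   # 50 frames of 441 samples
windowed = frames * hamming(frames.shape[1])
r = autocorrelate(windowed[9], order=12)                 # 10th frame, assumed order 12
```

The autocorrelation values `r` are the input to the autocorrelation method for solving the normal equations.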
The highest autocorrelation index, p, is the order of the LPC analysis.

a. LPC Parameter Conversion to Cepstral Coefficients

The LPC cepstral coefficients are a set of values that have been found to be a more robust and reliable feature set for speech recognition than the LPC coefficients. These coefficients are obtained recursively as follows:

$$c_0 = \ln \sigma^2$$

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p \qquad (21)$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad m > p \qquad (22)$$

where $\sigma^2$ is the gain term in the LPC model. Equation (22) gives the cepstral coefficients $c_{p+1}, c_{p+2}, \dots$ beyond the LPC order. Generally, more cepstral coefficients than the LPC order p are taken for the cepstral representation.

Table 1: Range of variation of the cepstral coefficients corresponding to the male and female Bodo speakers

Vowel   Male                                      Female
        Max.      Min.      Range of variation    Max.      Min.      Range of variation
/o/     2.2237    -1.5940   3.8177                1.9492    -0.0086   1.9578
/a/     1.6260    -0.9615   2.5875                0.9492    -0.0641   1.0133
/i/     1.1528    -0.1253   1.2781                0.9059    -0.0217   0.9276
/e/     1.2355    -0.6532   1.8887                0.9847    -0.0578   1.0425
/u/     1.0922    -0.0601   1.1523                1.1385    0.0690    1.2075
/w/     1.1832    -0.1541   1.3373                1.1843    -0.1674   1.3517

Figure 1. Cepstral characteristics of Bodo vowels for male speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

Figure 2. Cepstral characteristics of Bodo vowels for female speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))
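The conversion from autocorrelation values to LPC coefficients and then to LPC cepstral coefficients, described in the section on LPC parameter conversion, can be sketched as below. This is an illustrative Python version under stated assumptions, not the authors' implementation. The Levinson-Durbin recursion is the usual solver for the autocorrelation method (the paper names the method but not the solver), the cepstral recursion is the standard one from Rabiner & Juang (1993), and the function names and toy autocorrelation sequence are hypothetical.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where a[k-1] = a_k in s(n) ~ sum_k a_k s(n-k)
    and err is the final prediction error energy.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        a_prev = a[:i].copy()
        a[i] = k
        a[:i] = a_prev - k * a_prev[::-1]
        err *= 1.0 - k * k
    return a, err


def lpc_to_cepstrum(a, gain_sq, n_ceps):
    """Recursive LPC-to-cepstrum conversion; returns c[0..n_ceps]."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    c[0] = np.log(gain_sq)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        # Sum runs over k = 1..m-1 for m <= p and k = m-p..m-1 for m > p.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c


# Toy autocorrelation of a first-order process (assumed data, not from the paper).
r_toy = [1.0, 0.9, 0.81]
a, err = levinson_durbin(r_toy, order=2)
cepstrum = lpc_to_cepstrum(a, err, n_ceps=20)   # 20 coefficients per frame, as in the paper
```

For this toy input the recursion recovers a single pole at 0.9, so the cepstral coefficients follow 0.9^m / m, which gives a quick sanity check of the recursion.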
Figure 3. Distinction between Bodo male and female speakers in frame no. 2, 4, 16 & 8 (panel titles: Frame no. 2, Frame no. 4, Frame no. 16, Frame no. 6; curves: Male, Female; x-axis: Cepstral Coefficients; y-axis: Log Magnitude (dB))
Table 2: Range of variation of the cepstral coefficients corresponding to the male and female Rabha speakers

Vowel   Male                                      Female
        Max.      Min.      Range of variation    Max.      Min.      Range of variation
/o/     1.0057    -1.0522   2.0579                3.9045    -3.7501   7.6546
/a/     1.4964    -1.8083   3.3047                2.0135    -2.0784   4.0919
/i/     1.4086    -1.8085   3.2171                1.9864    -1.9832   3.9696
/e/     2.1054    -2.2054   4.3108                0.9164    -1.4963   2.4127
/u/     3.4942    -4.6387   8.1329                1.0839    -1.6952   2.7791
/w/     2.4834    -1.0627   3.5461                1.7201    -0.8801   2.6002
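The "range of variation" column in Table 2 is the maximum cepstral coefficient minus the minimum. The short Python sketch below recomputes it from the tabulated maxima and minima and picks out the vowels with the widest and narrowest range for each sex. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# (max, min) cepstral coefficient values copied from Table 2.
TABLE2 = {
    "male": {
        "/o/": (1.0057, -1.0522), "/a/": (1.4964, -1.8083),
        "/i/": (1.4086, -1.8085), "/e/": (2.1054, -2.2054),
        "/u/": (3.4942, -4.6387), "/w/": (2.4834, -1.0627),
    },
    "female": {
        "/o/": (3.9045, -3.7501), "/a/": (2.0135, -2.0784),
        "/i/": (1.9864, -1.9832), "/e/": (0.9164, -1.4963),
        "/u/": (1.0839, -1.6952), "/w/": (1.7201, -0.8801),
    },
}


def range_of_variation(rows):
    """Range per vowel: maximum minus minimum, rounded to 4 decimals."""
    return {vowel: round(mx - mn, 4) for vowel, (mx, mn) in rows.items()}


def extremes(ranges):
    """Return (vowel with widest range, vowel with narrowest range)."""
    return max(ranges, key=ranges.get), min(ranges, key=ranges.get)


for sex, rows in TABLE2.items():
    ranges = range_of_variation(rows)
    widest, narrowest = extremes(ranges)
    print(sex, "widest:", widest, "narrowest:", narrowest)
```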
Figure 4. Distinction between the male and female Rabha speaker in frame no. 9, 14, 16 & 17

Figure 5. Cepstral characteristics of Rabha vowels for male speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

Figure 6. Cepstral characteristics of Rabha vowels for female speaker (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Cepstral Coefficient; y-axis: Amplitude (dB))

b. Formant Estimation of BODO and RABHA Phonemes

Formant frequencies are the distinguishing frequency components of human speech. They are the specific resonance frequencies of the vocal tract which have maximum energy concentration during the utterance of a vowel. A vowel can be qualitatively distinguished by its frequency components. Generally, three formant frequencies (F1, F2 and F3) are considered for perception and discrimination of vowels by a listener (Kewley, 1982, 1983). A variety of approaches, such as formant tracking, articulatory models and auditory models, have been used for the analysis and synthesis of speech. The formant tracking method, based on Linear Predictive Coding (LPC), has received considerable attention. In this digital technique, the entire frequency range is divided into a fixed number of segments, and each segment represents a formant frequency. A 2nd order resonator with specific boundaries is defined for each segment k. The predictor polynomial, defined as the Fourier transform of the corresponding 2nd order predictor, is given by (Welling, L. and Ney, H., 1998):

$$A_k(e^{j\omega}) = 1 - \alpha_k e^{-j\omega} - \beta_k e^{-2j\omega} \qquad (23)$$

where $\alpha_k$ and $\beta_k$ are real-valued predictor coefficients. Therefore, from equation (23) we get

$$\left| A_k(e^{j\omega}) \right|^2 = 1 + \alpha_k^2 + \beta_k^2 - 2\alpha_k (1 - \beta_k) \cos\omega - 2\beta_k \cos 2\omega \qquad (24)$$

The parameter $\beta_k$ determines the bandwidth of the resonator. The formant frequency $\varphi_k$ is given by

$$\varphi_k = \arccos\left( -\frac{\alpha_k (1 - \beta_k)}{4 \beta_k} \right) \qquad (26)$$

Table 3:
Vowel Female
Male
F1 F2 F3 F1 F2 F3
VC Female
Male
Male
CVC Female
Male
/a/ 380.3 1194.5 3650.4 343.8 1172.0 2494.5
F1 F2 F3 F1 F2 F3
326.4 1623.4 3023,8 539.1 2293.9 3242.6 /hw/(to give)
/ /(I) 326.1 1717.5 3006.2 714.0 2365.5 3199.6 /bu/(to swell)
293.3 2371.3 3455.9 299.3 2932.2 3189.1 /ru/(to boil)
F1 F2 F3 F1 F2 F3
320.7 1687.9 3120.24 494.4 2109.8 3216.3 /san/(the sun) 285.5 1800.6 3286.8 838.3 1494.4 3546.54
382.1 1661.1 3077.1 690.1 2545.8 3355.9 /swb/(smoke) 282.5 1966.4 3135.6 727.5 1421.3 3265.67
311.1 1623.5 3445.5 633.0 2386.2 3298.9 /bar/(wind) 298.5 2657.89 3024.78 892.2 1356.9 3198.00
CV Female
/o/ 319.1 833.3 3030.4 309.3 764.0 2748.8 /or/(fire)
Formant Frequencies Estimation of BODO Words Formant frequency /i/ /e/ 411.3 387.5 2409.8 2240.8 2911.8 3165.0 394.6 384.9 2341.6 2178.1 3002.4 3577.1 /ich/(pain) /un/(back side)
F1 F2 F3 F1 F2 F3
300.5 1424.7 3276.9 280.2 2240.0 2636.7 / /(to beat) 337.6 1853.7 2996.8 375.5 2536.2 2842.9 /lir/(to write) 304.7 2354.87 3254.67 745.3 1293.2 3354.52
/u/ 249.6 997.7 3044.3 244.7 837.5 3690.6 /ul/(confuse)
/w/ 292.7 1527.2 3165.3 206.4 1147.1 2486.9 /em/(bed)
347.2 2353.1 2853.5 442.8 2544.9 3350.7 /be/(this)
311.4 2452.7 2765.3 398.5 1265.7 2435.8 /gi/(to fear)
354.6 1699.5 3001.65 283.4 2250.1 3220.0 /dwn/(to keep) 352.6 1471.0 3163.2 300.7 1238.3 3648.01
334.8 1617.9 2947.7 393.0 2223.5 3287.7 /thar/ (sure) 276.1 2491.2 3155.5 415.6 1629.4 3674.98
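As a numerical illustration of the second-order resonator model used for formant estimation, the sketch below computes the resonance frequency of a single resonator and checks it against a brute-force search for the minimum of the squared magnitude of the predictor polynomial. It assumes the predictor polynomial has the form A(e^{jw}) = 1 - alpha e^{-jw} - beta e^{-2jw}. The 22,050 Hz sampling rate (inferred from 441 samples per 20 ms frame), the 700 Hz pole, the pole radius of 0.97 and the function names are all assumptions chosen for the example, not values from the paper.

```python
import numpy as np


def resonance_frequency(alpha, beta):
    """Angular frequency (rad/sample) that minimizes |A(e^{jw})|^2
    for A(z) = 1 - alpha z^-1 - beta z^-2 (assumed resonator form)."""
    cos_phi = -alpha * (1.0 - beta) / (4.0 * beta)
    return float(np.arccos(np.clip(cos_phi, -1.0, 1.0)))


def resonator_from_pole(freq_hz, radius, fs):
    """Predictor coefficients of a resonator with poles at radius * exp(+/- j*theta)."""
    theta = 2.0 * np.pi * freq_hz / fs
    alpha = 2.0 * radius * np.cos(theta)
    beta = -radius ** 2
    return alpha, beta


fs = 22050.0                                   # inferred sampling rate
alpha, beta = resonator_from_pole(700.0, 0.97, fs)
phi = resonance_frequency(alpha, beta)
formant_hz = phi * fs / (2.0 * np.pi)

# Brute-force check: locate the minimum of |A|^2 on a dense frequency grid.
w = np.linspace(0.0, np.pi, 200001)
A = 1.0 - alpha * np.exp(-1j * w) - beta * np.exp(-2j * w)
w_min = w[np.argmin(np.abs(A) ** 2)]
```

The closed-form frequency and the grid minimum agree, and both lie slightly below the pole angle because the pole radius is less than one.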
Formant Frequencies estimation of 6 Bodo vowels for female utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

Formant Frequencies estimation of 6 Bodo vowels for male utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

F1-F2 plot shows the vowel triangle for male and female speakers of the Bodo language (plot title: F1-F2 for male & female vowel triangle; Red: Male, Blue: Female; x-axis: F1 (Hz); y-axis: F2 (Hz))

Formant frequency curves show the distinction of formant variation for V, VC, CV & CVC word structures (plot title: Change of formant with V/VC/CV/CVC; x-axis: Frequency (Hz); y-axis: Gain (dB))

F1-F2 plot shows that the range of formant frequencies of the CV, VC or CVC word structures of the Bodo language mostly lies within the range of the formant frequencies of the vowels (plot title: Range of Formant Frequency; MV: Male, vowel; FV: Female, vowel; x-axis: F1 (Hz); y-axis: F2 (Hz))
Vowel Female
Male
Male
F1 F2 F3 F1 F2 F3
/ora /(you are) 543.7 1748.9 3823.5 643.3 2396.9 3242.6 /to/(hen)
F1 F2
465,9 1874.5
3428 2463.9
F1 F2 F3 F1 F2 F3
CV Female
Table 4: Formant frequency Formant frequency /a/ /i/ /e/ 283.5 280.4 1040.8 1480.3 2560.8 1384.4 3600.2 2200.6 3151.8 243.8 301.3 987.4 1654.4 2251.8 2657.9 2865,8 3985.8 3758.4 /intcek/(this much) /ek/(to jump) / /(I am) 375.3 275.7 765.2 1682.9 2769.4 1765.6 30165.5 3321.9 3546.9 987.5 276.9 321.9 2401.6 3001.8 2394.9 3099.0 3548.2 2987.3 /tsa/(to eat) /mi/(vegetable) /the/(fruit)
VC Female
/o/ 640.3 2560.4 3220.8 620.2 2154.7 2876.1
354.9 1987.7
698.8 1976.4
/u/ 480.9 2360.1 3211.4 504.5 2857.9 3415.8 /ut/(camal) 653.9 2015.9 2976.9 431.9 2656,7 3241.9 //tcu/(thorn) 387.03 1687.5
/w/ 340.5 1080.2 2720.4 253.9 2415.7 2965.8 /r /(length) 392.8 2438.3 2657.9 400.3 1834.9 2865.9 /a /(shout) 565.7 176.5 7
F3 F1 F2 F3
Male
2976.4 498.3 2183.5 3216.3 /tcok/(compound)
CVC Female
F1 F2 F3 F1 F2 F3
Male
2981.6 690.3 2574,8 3582.7
3768,9 541,7 3286.1 3321.8 /rin)(loan)
/na (You are) 265.8 20001.8 3875.4 698.4 1501.0 3315.8
276.4 1867.5 3341.7 845.2 1476.7 3498.6
3415.6 298.5 2261.3 3139.7 /tbau/(owl)
/ben /(where) 312.4 2323.3 3198.6 684.9 1354.8 3299.7
301.6 2782.5 3054.6 875.2 1401,9 3176.0
Formant Frequencies estimation of 6 Rabha vowels for female utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))
2885.6 367.2 2653.0 2976.2
2986.3 391.2 2695.3 3271.6 /tsara /(disease) 261.6 2434.1 3145.1 500.8 1687.8 3679.1
299.7 1976.4 2988.3 301.5 1222.6 3571.8
Formant Frequencies estimation of 6 Rabha vowels for male utterances (panels: /o/, /a/, /i/, /e/, /u/, /w/; x-axis: Frequency (Hz); y-axis: Amplitude (dB))

F1-F2 plot shows the vowel triangle for male and female speakers of the Rabha language (plot title: Vowel triangle for male & female speaker; Red: Male, Blue: Female; x-axis: F1 (Hz); y-axis: F2 (Hz))

F1-F2 plot shows that the range of formant frequencies of the CV, VC or CVC word structures of the Rabha language mostly lies within the range of the formant frequencies of the vowels (plot title: Range of Formant frequency; MV: Male, vowel; FV: Female, vowel; x-axis: F1 (Hz); y-axis: F2 (Hz))
3. RESULTS AND DISCUSSION

Based on the analysis of the cepstral features and formant frequencies of Bodo and Rabha phonemes and words, the following observations were made.

Significant variation of the cepstral coefficients is observed among the Bodo vowels, as shown in Table-1. For male speakers, the cepstral variation is maximum for vowel /o/ and minimum for vowel /u/. Similarly, for female Bodo speakers, the variation is maximum for vowel /o/ and minimum for /i/.

For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients (Table-2) for male speakers is maximum for vowel /u/ and minimum for vowel /o/. For female speakers, the variation is maximum for vowel /o/ and minimum for vowel /e/.

Significantly, the cepstral coefficients of Bodo vowels for frame nos. 2, 3, 6 & 8 show distinctive characteristics (Figure-4) for male and female speakers. The variation of the cepstral coefficients for male speakers is very irregular, in contrast to the stable variation of the female cepstral coefficients. The same phenomenon is observed for Rabha vowels, but for different frame numbers, i.e. frame nos. 9, 14, 16 and 17 (Figure-7). This observation may be helpful in sex determination for both Bodo and Rabha speakers.

The range of variation of the cepstral coefficients for Bodo and Rabha male speakers lies within 3.8177 > C_Bodo > 1.1523 and 8.1329 > C_Rabha > 2.0579 respectively. For female speakers it lies within 1.9578 > C_Bodo > 0.9276 and 7.6546 > C_Rabha > 2.4127. That is, the variation of the cepstral features for Bodo vowels (Male: 2.6654; Female: 1.0302) is smaller than for Rabha vowels (Male: 6.0750; Female: 5.2419), so the former are stable compared to the latter.

Figure 10 and Figure 15 represent the extremes of formant locations in the F1-F2 plane for Bodo and Rabha vowels. The formant locations of /u/ (low F1, low F2), /i/ (low F1, high F2) and /a/ (high F1, low F2) form the vertices of the vowel triangle, with the other vowels placed relative to these vertices. Figure 12 and Figure 16 show that the formant frequencies of the selected word sets for both Bodo and Rabha lie within the range of the formant frequencies of the isolated vowels. The investigation shows (Table-3 & 4) that the range of formant frequency is maximum for isolated vowels; when the vowels are placed in the nucleus of a structure like CV, VC or CVC, the formant frequency decreases.
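The stability comparison above rests on simple arithmetic: for each language and sex, the spread is the upper bound of the range of variation minus the lower bound. The Python sketch below reproduces the four quoted spreads from the quoted bounds. The dictionary keys and function name are illustrative only.

```python
# (upper bound, lower bound) of the range of variation, as quoted in the text.
BOUNDS = {
    ("Bodo", "male"): (3.8177, 1.1523),
    ("Bodo", "female"): (1.9578, 0.9276),
    ("Rabha", "male"): (8.1329, 2.0579),
    ("Rabha", "female"): (7.6546, 2.4127),
}


def spread(upper, lower):
    """Width of the interval spanned by the per-vowel ranges of variation."""
    return round(upper - lower, 4)


spreads = {key: spread(*bounds) for key, bounds in BOUNDS.items()}
for (language, sex), value in spreads.items():
    print(f"{language} {sex}: {value:.4f}")
```

For both sexes the Bodo spread is smaller than the Rabha spread, which is the sense in which the Bodo vowels are described as more stable.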
ACKNOWLEDGEMENT
We gratefully acknowledge the Ministry of Communication & Information Technology (MIT), New Delhi, Govt. of India, for providing us the relevant information while preparing the manuscript of this paper.

REFERENCES
1. Rabiner, L.R. and Juang, B.H. Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
2. Noll, A.M. Cepstrum Pitch Determination, Journal of the Acoustical Society of America, Vol. 41, pp. 293-309, Feb. 1967.
3. Porat, B. A Course in Digital Signal Processing, John Wiley & Sons, 1997.
4. Proakis, J.G. and Manolakis, D.G. Digital Signal Processing: Principles, Algorithms and Applications, Pearson Education, Third Indian reprint, 2004.
5. Kewley-Port, D. Measurement of formant transitions in naturally produced stop consonant-vowel syllables, Journal of the Acoustical Society of America, 72, pp. 379-389, 1982.
6. Kewley-Port, D. Time-varying features as correlates of place of articulation of stop consonants, Journal of the Acoustical Society of America, 73, pp. 322-335, 1983.
7. Welling, L. and Ney, H. Formant Estimation for Speech Recognition, IEEE Transactions on Speech and Audio Processing, Vol. 6, pp. 36-48, 1998.
8. Talukdar, P.H. Speech Production, Analysis and Coding, Lambert Publication, Germany, 2010.