Acoustic Representation of BODO and RABHA Phonemes


ISSN No. Volume 1, No. 1, July – August 2012
Jyotismita Talukdar et al., International Journal of Computing, Communications and Networking, 1(1), July – August 2012, 1-9

International Journal of Computing, Communications and Networking
Available Online at http://warse.org/pdfs/ijccn01112012.pdf


Jyotismita Talukdar¹, Nabankur Pathak²
¹Asian Institute of Technology, Bangkok, Thailand, E-mail: jyotismita4@gmail.com
²Gauhati University, India, E-mail: phtassam@gmail.com

ABSTRACT In this paper we study the spectral features of Bodo and Rabha phonemes using formant frequencies and cepstral coefficients. The analysis of the cepstral features and formant frequencies of Bodo and Rabha phonemes and words shows significant variation of the cepstral coefficients among the Bodo vowels. For male Bodo speakers, the cepstral variation is maximum for vowel /o/ and minimum for vowel /u/; for female Bodo speakers, it is maximum for vowel /o/ and minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients is maximum for vowel /u/ and minimum for vowel /o/ in male speakers, and maximum for vowel /o/ and minimum for vowel /e/ in female speakers. This observation may be helpful in sex determination for both Bodo and Rabha speakers. The range of variation of the cepstral coefficients for Bodo and Rabha male speakers lies within 3.8177 > C_Bodo > 1.1523 and 8.1329 > C_Rabha > 2.0579, respectively; for female speakers it lies within 1.9578 > C_Bodo > 0.9276 and 7.6546 > C_Rabha > 2.4127. That is, the variation of the cepstral features is smaller for the Bodo vowels (male: 2.6654; female: 1.0302) than for the Rabha vowels (male: 6.0750; female: 5.2419), so the former are more stable than the latter. The investigation has also shown that the range of formant frequency is maximum for isolated vowels, but when the vowels are placed in the nucleus of a structure such as CV, VC or CVC, the formant frequency decreases.

Keywords: Acoustic Representation, Phonemes, Cepstral Features

1. INTRODUCTION

The Bodos and the Rabhas are early ethnic and linguistic communities settled in the North-Eastern part of India. The Bodos belong to a larger ethnic group called the Bodo-Kachari. Racially, they belong to the Mongoloid stock of the Indo-Mongoloids or Indo-Tibetans. Mythologically, according to Dr. Suniti Kumar Chatterjee, a well-known historian, the Bodos are “the Offspring of son of Vishnu and mother Earth”, who were termed Kiratas during the epic period. They are recognized as a plains tribe in the 6th Schedule of the Constitution of India. Historically, there are different views on the early migration of the Mongolians into the North-Eastern part of India. According to Grierson’s “The Linguistic Survey of India”, the Mongolians who settled in old Assam migrated from the banks of the Hoang-Ho and Yangtze rivers, then scattered and dwelt along different river banks of the state. The upper courses of the Yangtze and Hoang-Ho in North-West China were the original home of the Tibeto-Burman races. The hierarchy of the Bodo community is shown in the figure below.

Figure: Hierarchy of Bodo & Rabha Languages

© 2012, IJCCN All Rights Reserved



Speech Data Collection for Acoustic Representation

Typically, spoken language data can be classified based on:
• Mode of speech
• Medium of recording
• Language
• Dialects
• Environment

In the present study, speech data were collected from native speakers of the Bodo and Rabha languages who are fluent in speaking and writing the language. Male and female speakers aged between 15 and 30 years, with a pleasant and good voice quality, were chosen to record the data. The recording was done one speaker at a time. The speakers were instructed to read each word or sentence naturally, without emotion or expression, and to speak clearly at their normal speaking rate and volume. To keep the recording consistent in both phonetic and prosodic (within the framework of symbolic prosody) terms, an expert in acoustic phonetics supervised the recording. The recording lasted about 4 hours on average (3 recording sessions) for each speaker (male and female). The following data sets were recorded for the analysis of the cepstral coefficients of vowel phonemes and the formant frequencies of selected Bodo and Rabha words:
• Bodo and Rabha vowel phonemes for cepstral analysis.
• Selected word sets of V, CV, VC and CVC structure in both languages for formant analysis.

The recording was done in the audio editing software Cool Edit Pro, and the analysis was done in MATLAB 7.1. Each digitized utterance is divided (blocked) into 50 frames of 20 millisecond (ms) duration. Every frame contains 441 samples, which corresponds to a sampling rate of 441 / 0.020 s = 22,050 Hz, and 20 cepstral coefficients have been calculated for each frame. The spectral characteristics of six Bodo and six Rabha vowels, for male and female speakers, were investigated. Approximately 12 samples were averaged to obtain one coefficient. First, the 10th frame of all utterances of the male and female speakers was considered for analysis. The variation of the cepstral coefficients of the Bodo and Rabha vowels for the selected speakers is shown in Table 1 and Table 2 and depicted in Figures 1 & 2 and Figures 5 & 6.

However, continuous frame-wise analysis shows that frames 2, 4, 6 and 8 for the Bodo speakers (Figure 3) and frames 9, 14, 16 and 17 for the Rabha speakers (Figure 4) exhibit a distinct variation of the cepstral coefficients between male and female speakers.

2. LPC ANALYSIS

Linear prediction is a method for signal source modelling that is dominant in speech signal processing and has wide application in other areas. Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques. The glottis (the space between the vocal cords) produces the sound, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat, the mouth and the nasal cavity) forms the tube, which is characterized by its resonance frequencies, called formants.
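The source-filter picture given at the start of Section 2 (glottis sets the pitch, vocal-tract tube sets the formants) can be made concrete with a toy synthesis. The minimal, standard-library Python sketch below is illustrative only and is not part of the authors' method: a pulse train stands in for the voiced glottal excitation and a single 2nd-order all-pole resonator stands in for the vocal tract. The 120 Hz pitch, 500 Hz formant and 80 Hz bandwidth are arbitrary assumptions; the 22,050 Hz rate is inferred from the stated 441 samples per 20 ms frame.

```python
import math


def pulse_train(f0_hz, fs_hz, n_samples):
    """Voiced-excitation stand-in: a unit impulse every round(fs/f0) samples."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(x, formant_hz, bandwidth_hz, fs_hz):
    """Vocal-tract stand-in: one 2nd-order all-pole resonance (a single formant).

    y(n) = x(n) + a1*y(n-1) + a2*y(n-2), with poles at radius r and angle theta.
    """
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius from bandwidth
    theta = 2.0 * math.pi * formant_hz / fs_hz      # pole angle from formant frequency
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


# Assumed toy parameters: 120 Hz pitch, one formant at 500 Hz with 80 Hz bandwidth.
fs = 22050  # inferred from 441 samples per 20 ms frame
source = pulse_train(f0_hz=120.0, fs_hz=fs, n_samples=441)
speech_like = resonator(source, formant_hz=500.0, bandwidth_hz=80.0, fs_hz=fs)
```

The resonator recursion has the same shape as the all-pole model s(n) = Σ a_i s(n-i) + G u(n) developed in this section, with p = 2 and G = 1, so LPC analysis of `speech_like` should recover a pole near 500 Hz.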

The basic problem of the LPC system is to determine the formants from the speech signal. The solution of this problem is a difference equation, which expresses each sample of the signal as a linear combination of previous samples. Such an equation is called a linear predictor, hence the name Linear Predictive Coding. The coefficients of the difference equation (the prediction coefficients) characterize the formants, so the LPC system needs to estimate these coefficients. The estimation is made by minimizing the mean-square error between the predicted signal and the actual signal. The basic idea behind the LPC model is that a given speech sample at time n, s(n), can be approximated as a linear combination of the past p speech samples (Rabiner & Juang, 1993), such that

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \qquad (1)$$

where the coefficients $a_1, a_2, \ldots, a_p$ are assumed to be constant over the speech analysis frame. Equation (1) can be converted to an equality by including an excitation term $G\,u(n)$:

$$s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,u(n) \qquad (2)$$

where $u(n)$ is the normalized excitation and $G$ is the gain of the excitation. Expressing Equation (2) in the z-domain gives the relation

$$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z) \qquad (3)$$

leading to the transfer function

$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} \qquad (4)$$

The excitation is modelled on our knowledge that the actual excitation function for speech is essentially either a quasi-periodic pulse train (for voiced speech sounds) or a random noise source (for unvoiced sounds). The relation between s(n) and u(n), based on this speech production model (Figure 1.1), is

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) \qquad (5)$$

We consider the linear combination of past speech samples as the estimate $\tilde{s}(n)$, defined as

$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) \qquad (6)$$

The prediction error, e(n), is defined as

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \qquad (7)$$

and the error transfer function is

$$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} \qquad (8)$$

The basic problem of linear prediction analysis is to determine the set of predictor coefficients $\{a_k\}$ directly from the speech signal so that the spectral properties of the digital filter match those of the speech waveform within the analysis window. To set up the equations that must be solved to determine the predictor coefficients, we define the short-term speech and error segments at time n as

$$s_n(m) = s(n+m) \qquad (9)$$

$$e_n(m) = e(n+m) \qquad (10)$$

and minimize the mean-squared error signal at time n,

$$E_n = \sum_m e_n^2(m) \qquad (11)$$

Using Equations (9) and (10), we can write

$$E_n = \sum_m \Big[ s_n(m) - \sum_{k=1}^{p} a_k\, s_n(m-k) \Big]^2 \qquad (12)$$

To solve Equation (12), we set

$$\frac{\partial E_n}{\partial a_k} = 0, \qquad k = 1, 2, \ldots, p \qquad (13)$$

giving

$$\sum_m s_n(m-i)\, s_n(m) = \sum_{k=1}^{p} \hat{a}_k \sum_m s_n(m-i)\, s_n(m-k), \qquad 1 \le i \le p \qquad (14)$$

The terms of the form $\sum_m s_n(m-i)\, s_n(m-k)$ are related to the short-term covariance of $s_n(m)$, i.e.,

$$\phi_n(i,k) = \sum_m s_n(m-i)\, s_n(m-k) \qquad (15)$$

so that Equation (14) can be expressed in compact notation as

$$\phi_n(i,0) = \sum_{k=1}^{p} \hat{a}_k\, \phi_n(i,k), \qquad i = 1, 2, \ldots, p \qquad (16)$$

which describes a set of p equations in p unknowns. It is readily shown that the minimum mean-squared error, $\hat{E}_n$, can be expressed as

$$\hat{E}_n = \sum_m s_n^2(m) - \sum_{k=1}^{p} \hat{a}_k \sum_m s_n(m)\, s_n(m-k) = \phi_n(0,0) - \sum_{k=1}^{p} \hat{a}_k\, \phi_n(0,k) \qquad (17)$$

Thus the minimum mean-squared error consists of a fixed term, $\phi_n(0,0)$, and terms that depend on the predictor coefficients. To solve Equation (16) for the optimum coefficients, we have to compute $\phi_n(i,k)$ for $1 \le i \le p$ and $0 \le k \le p$, and then solve the resulting set of p simultaneous equations. A method to solve these equations and compute the coefficients is the autocorrelation method.

The LPC-Cepstral Coefficient

In the present study, LPC-based cepstral coefficients and phonetically important parameters are used as feature vectors. A cepstrally weighted feature vector is obtained for each frame by block processing of the continuous speech signal. The analog speech waveform is first sampled and quantized by an analog-to-digital converter. To spectrally flatten the signal, the speech signal is subjected to pre-emphasis through a first-order digital filter whose transfer function is

$$H(z) = 1 - \tilde{a}\, z^{-1}, \qquad 0.9 \le \tilde{a} \le 1.0 \qquad (19)$$

Consecutive speech samples are blocked into frames of N samples. To reduce the undesired effect of the Gibbs phenomenon, each frame is multiplied by a window function (Hamming window), which is given by (Proakis & Manolakis, 2004; Talukdar, 2010)

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1$$

where N is the number of samples in a block. Each frame of the windowed signal, $\tilde{x}(n) = x(n)\,w(n)$, is next autocorrelated to give

$$r(m) = \sum_{n=0}^{N-1-m} \tilde{x}(n)\, \tilde{x}(n+m), \qquad m = 0, 1, 2, \ldots, p \qquad (20)$$

where the highest autocorrelation lag, p, is the order of the LPC analysis.
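The front end just described (pre-emphasis, blocking into frames, Hamming windowing, autocorrelation) followed by the autocorrelation method for the normal equations can be sketched as below. This is a minimal standard-library Python sketch, not the authors' MATLAB code. Assumptions: a pre-emphasis coefficient of 0.95 (a common choice), non-overlapping 441-sample frames, an LPC order of p = 12 (the paper does not state p), and the Levinson-Durbin recursion as the solver for the autocorrelation equations.

```python
import math


def pre_emphasis(x, a=0.95):
    """First-order pre-emphasis y(n) = x(n) - a*x(n-1), i.e. H(z) = 1 - a z^-1 (a = 0.95 assumed)."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def frames(x, frame_len=441):
    """Non-overlapping blocks (assumption); 441 samples = 20 ms at 22,050 Hz."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def autocorr(frame, p):
    """r(m) = sum_n x(n) x(n+m) for m = 0..p."""
    N = len(frame)
    return [sum(frame[n] * frame[n + m] for n in range(N - m)) for m in range(p + 1)]


def levinson_durbin(r, p):
    """Solve the autocorrelation normal equations for a_1..a_p.

    Predictor convention: s~(n) = sum_k a_k s(n-k); a[0] is unused.
    Returns (a, err) where err is the final prediction-error energy.
    """
    a = [0.0] * (p + 1)
    err = r[0]
    if err <= 0.0:                      # silent frame: no meaningful predictor
        return a, 0.0
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
        if err <= 0.0:                  # perfectly predictable frame: stop early
            break
    return a, err


def lpc_per_frame(signal, p=12, frame_len=441, a_pre=0.95):
    """Pre-emphasis -> framing -> Hamming window -> autocorrelation -> LPC (p = 12 assumed)."""
    w = hamming(frame_len)
    out = []
    for fr in frames(pre_emphasis(signal, a_pre), frame_len):
        windowed = [s * wn for s, wn in zip(fr, w)]
        out.append(levinson_durbin(autocorr(windowed, p), p))
    return out
```

One second of audio at 22,050 Hz yields exactly 50 frames of 441 samples, which matches the 50 frames of 20 ms per utterance stated in the data-collection description. Levinson-Durbin is used here because the autocorrelation matrix is Toeplitz, so the p equations can be solved in O(p²) operations instead of general Gaussian elimination.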

a. LPC Parameter Conversion to Cepstral Coefficients

The LPC cepstral coefficients are a set of values that have been found to be a more robust and reliable feature set for speech recognition than the LPC coefficients themselves. They are obtained recursively as follows:

$$c_0 = \ln \sigma^2$$

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p \qquad (21)$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p \qquad (22)$$

where $\sigma^2$ is the gain term in the LPC model. Equation (22) gives the cepstral coefficients $c_{p+1}, c_{p+2}, \ldots, c_Q$ beyond the order of the LPC analysis. Generally, $Q > p$, with $Q \approx \tfrac{3}{2}p$, is taken for the cepstral representation.

Table 1: Range of variation of the cepstral coefficients corresponding to the male and female Bodo speakers

Vowel   Male Max.   Male Min.   Male Range   Female Max.   Female Min.   Female Range
/o/     2.2237      -1.5940     3.8177       1.9492        -0.0086       1.9578
/a/     1.6260      -0.9615     2.5875       0.9492        -0.0641       1.0133
/i/     1.1528      -0.1253     1.2781       0.9059        -0.0217       0.9276
/e/     1.2355      -0.6532     1.8887       0.9847        -0.0578       1.0425
/u/     1.0922      -0.0601     1.1523       1.1385        0.0690        1.2075
/w/     1.1832      -0.1541     1.3373       1.1843        -0.1674       1.3517

Figure 1. Cepstral characteristics of Bodo vowels for male speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus cepstral coefficient)

Figure 2. Cepstral characteristics of Bodo vowels for female speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus cepstral coefficient)
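The LPC-to-cepstrum recursion of Equations (21) and (22) maps directly onto a loop over m. The minimal Python sketch below assumes the LPC coefficients follow the predictor sign convention s~(n) = Σ a_k s(n-k) with a[0] unused, and that the caller supplies σ² (for example, the final prediction-error energy from a Levinson-Durbin solver); Q = 20 would match the 20 cepstral coefficients per frame used in this study.

```python
import math


def lpc_to_cepstrum(a, gain_sq, Q):
    """LPC -> cepstrum recursion c_0 .. c_Q.

    a: LPC coefficients with a[0] unused (predictor sign convention).
    gain_sq: sigma^2, the LPC gain term (assumed supplied by the caller).
    Q: number of cepstral coefficients after c_0.
    """
    p = len(a) - 1
    c = [0.0] * (Q + 1)
    c[0] = math.log(gain_sq)
    for m in range(1, Q + 1):
        acc = a[m] if m <= p else 0.0           # the a_m term exists only for m <= p
        for k in range(max(1, m - p), m):       # k = 1..m-1 (m <= p) or m-p..m-1 (m > p)
            acc += (k / m) * c[k] * a[m - k]
        c[m] = acc
    return c
```

For a single pole at 0.5 (a = [0, 0.5], σ² = 1) the recursion reproduces the closed form c_m = 0.5^m / m, which is a quick way to confirm that the sign convention matches A(z) = 1 - Σ a_k z^-k.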



Figure 3. Distinction between Bodo male & female speaker in frame no. 2, 4, 16 & 8 (panels: Frame no. 2, 4, 16 and 6; log magnitude (dB) versus cepstral coefficients 0 to 20; male and female curves)
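The paper reports that some frames separate male and female speakers more clearly than others, but it does not state the criterion used to pick them. One hypothetical way to rank frames is sketched below in Python: compute a Euclidean distance between the male and female cepstral vectors of each frame and list the frame numbers with the largest distances. The distance measure, the exclusion of c_0 and the 1-based frame numbering are all assumptions, not the authors' procedure.

```python
import math


def cepstral_distance(c1, c2):
    """Euclidean distance between cepstral vectors, skipping c_0 (gain only); an assumed choice."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1[1:], c2[1:])))


def most_distinct_frames(male_frames, female_frames, top=4):
    """Return the `top` frame numbers (1-based, assumed) with the largest male/female distance."""
    scored = [
        (cepstral_distance(m, f), i + 1)
        for i, (m, f) in enumerate(zip(male_frames, female_frames))
    ]
    scored.sort(reverse=True)
    return [frame_no for _, frame_no in scored[:top]]
```

Each argument is a list of per-frame cepstral vectors (for example, 50 frames of 21 values c_0..c_20), so the function can be applied unchanged to the Bodo and the Rabha recordings.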

Table 2: Range of variation of the cepstral coefficients corresponding to the male and female Rabha speakers

Vowel   Male Max.   Male Min.   Male Range   Female Max.   Female Min.   Female Range
/o/     1.0057      -1.0522     2.0579       3.9045        -3.7501       7.6546
/a/     1.4964      -1.8083     3.3047       2.0135        -2.0784       4.0919
/i/     1.4086      -1.8085     3.2171       1.9864        -1.9832       3.9696
/e/     2.1054      -2.2054     4.3108       0.9164        -1.4963       2.4127
/u/     3.4942      -4.6387     8.1329       1.0839        -1.6952       2.7791
/w/     2.4834      -1.0627     3.5461       1.7201        -0.8801       2.6002
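The range of variation in Tables 1 and 2 is the maximum minus the minimum cepstral value for each vowel, and the spreads quoted in the abstract are the differences between the largest and smallest of these ranges. The Python sketch below recomputes both from the Table 2 (Rabha) entries as a consistency check; the dictionary layout and function names are illustrative.

```python
# (max, min) cepstral values per Rabha vowel, copied from Table 2.
RABHA_MALE = {
    "/o/": (1.0057, -1.0522), "/a/": (1.4964, -1.8083), "/i/": (1.4086, -1.8085),
    "/e/": (2.1054, -2.2054), "/u/": (3.4942, -4.6387), "/w/": (2.4834, -1.0627),
}
RABHA_FEMALE = {
    "/o/": (3.9045, -3.7501), "/a/": (2.0135, -2.0784), "/i/": (1.9864, -1.9832),
    "/e/": (0.9164, -1.4963), "/u/": (1.0839, -1.6952), "/w/": (1.7201, -0.8801),
}


def ranges(table):
    """Range of variation per vowel: max - min."""
    return {vowel: hi - lo for vowel, (hi, lo) in table.items()}


def summarize(table):
    """(vowel with widest range, vowel with narrowest range, difference of the two ranges)."""
    r = ranges(table)
    widest = max(r, key=r.get)
    narrowest = min(r, key=r.get)
    return widest, narrowest, r[widest] - r[narrowest]
```

`summarize(RABHA_MALE)` gives /u/ and /o/ with a spread of about 6.0750, and `summarize(RABHA_FEMALE)` gives /o/ and /e/ with a spread of about 5.2419, matching the values reported for the Rabha vowels.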

Figure 4. Distinction between the male and female Rabha speaker in frame no. 9, 14, 16 & 17

Figure 5. Cepstral characteristics of Rabha vowels for male speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus cepstral coefficient)

Figure 6. Cepstral characteristics of Rabha vowels for female speaker (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus cepstral coefficient)

b. Formant Estimation of BODO and RABHA Phonemes

Formant frequency is the distinguishing frequency component of human speech. It refers to the specific resonance frequencies of the vocal tract, which have maximum energy concentration during the utterance of vowels. Vowels can be qualitatively distinguished by their formant frequency components. Generally, three formant frequencies (F1, F2 and F3) are considered for the perception and discrimination of vowels by a listener (Kewley-Port, 1982, 1983). A variety of approaches, such as formant tracking, articulatory models and auditory models, have been used for the analysis and synthesis of speech. The formant tracking method based on Linear Predictive Coding (LPC) has received considerable attention. In this technique, the entire frequency range is divided into a fixed number of segments, and each segment represents one formant frequency. A 2nd-order resonator is defined for each segment k with specific boundaries. The predictor polynomial, defined as the Fourier transform of the corresponding 2nd-order predictor, is given by (Welling and Ney, 1998):

$$A_k(e^{j\omega}) = 1 - \alpha_k e^{-j\omega} - \beta_k e^{-j2\omega} \qquad (23)$$

where $\alpha_k$ and $\beta_k$ are real-valued predictor coefficients. Writing the two complex-conjugate roots of Equation (23) as $r_k e^{\pm j\phi_k}$, we get

$$\alpha_k = 2 r_k \cos\phi_k, \qquad \beta_k = -r_k^2 \qquad (24)$$

The parameter $r_k$ determines the bandwidth of the resonator, defined as

$$B_k = -\frac{f_s}{\pi} \ln r_k \qquad (25)$$

where $f_s$ is the sampling frequency. The formant frequency is given by

$$F_k = \frac{f_s}{2\pi}\,\phi_k = \frac{f_s}{2\pi} \arccos\!\left(\frac{\alpha_k}{2\sqrt{-\beta_k}}\right) \qquad (26)$$

Table 3: Vowel Female

Male

F1 F2 F3 F1 F2 F3

VC Female

Male

Male

CVC Female

Male

/a/ 380.3 1194.5 3650.4 343.8 1172.0 2494.5

F1 F2 F3 F1 F2 F3

326.4 1623.4 3023.8 539.1 2293.9 3242.6 /hw/(to give)

/ /(I) 326.1 1717.5 3006.2 714.0 2365.5 3199.6 /bu/(to swell)

293.3 2371.3 3455.9 299.3 2932.2 3189.1 /ru/(to boil)

F1 F2 F3 F1 F2 F3

320.7 1687.9 3120.24 494.4 2109.8 3216.3 /san/(the sun) 285.5 1800.6 3286.8 838.3 1494.4 3546.54

382.1 1661.1 3077.1 690.1 2545.8 3355.9 /swb/(smoke) 282.5 1966.4 3135.6 727.5 1421.3 3265.67

311.1 1623.5 3445.5 633.0 2386.2 3298.9 /bar/(wind) 298.5 2657.89 3024.78 892.2 1356.9 3198.00

CV Female

/o/ 319.1 833.3 3030.4 309.3 764.0 2748.8 /or/(fire)

Formant Frequencies Estimation of BODO Words Formant frequency /i/ /e/ 411.3 387.5 2409.8 2240.8 2911.8 3165.0 394.6 384.9 2341.6 2178.1 3002.4 3577.1 /ich/(pain) /un/(back side)

F1 F2 F3 F1 F2 F3

300.5 1424.7 3276.9 280.2 2240.0 2636.7 / /(to beat) 337.6 1853.7 2996.8 375.5 2536.2 2842.9 /lir/(to write) 304.7 2354.87 3254.67 745.3 1293.2 3354.52

/u/ 249.6 997.7 3044.3 244.7 837.5 3690.6 /ul/(confuse)

/w/ 292.7 1527.2 3165.3 206.4 1147.1 2486.9 /em/(bed)

347.2 2353.1 2853.5 442.8 2544.9 3350.7 /be/(this)

311.4 2452.7 2765.3 398.5 1265.7 2435.8 /gi/(to fear)

354.6 1699.5 3001.65 283.4 2250.1 3220.0 /dwn/(to keep) 352.6 1471.0 3163.2 300.7 1238.3 3648.01

334.8 1617.9 2947.7 393.0 2223.5 3287.7 /thar/ (sure) 276.1 2491.2 3155.5 415.6 1629.4 3674.98
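For a single segment, the 2nd-order resonator reduces to a 2x2 linear system followed by a conversion from pole radius and angle to bandwidth and frequency. The Python sketch below writes the segment polynomial as 1 - α e^{-jω} - β e^{-j2ω} with roots r e^{±jφ}, so that α = 2r cos φ and β = -r². It assumes the segment autocorrelation values r(0), r(1), r(2) are already available (how segment boundaries and their autocorrelations are obtained is not covered), that the segment is resonant (β < 0), and that the formant is read from the pole angle; function names are illustrative.

```python
import math


def segment_predictor(r0, r1, r2):
    """Optimal alpha, beta of A(w) = 1 - alpha e^{-jw} - beta e^{-j2w} for one segment.

    Solves the 2x2 normal equations  alpha*r0 + beta*r1 = r1,  alpha*r1 + beta*r0 = r2
    by Cramer's rule; r0, r1, r2 are the segment's autocorrelation values (assumed given).
    """
    det = r0 * r0 - r1 * r1
    alpha = r1 * (r0 - r2) / det
    beta = (r0 * r2 - r1 * r1) / det
    return alpha, beta


def resonance(alpha, beta, fs_hz):
    """Formant frequency and bandwidth (Hz) from the pole pair r e^{+-j phi}.

    Assumes a resonant segment: beta = -r^2 < 0 and |alpha| <= 2r.
    """
    r = math.sqrt(-beta)                             # beta = -r^2
    phi = math.acos(alpha / (2.0 * r))               # alpha = 2 r cos(phi)
    formant_hz = phi * fs_hz / (2.0 * math.pi)       # pole angle -> frequency
    bandwidth_hz = -fs_hz / math.pi * math.log(r)    # pole radius -> bandwidth
    return formant_hz, bandwidth_hz
```

Because the resonator has only two coefficients, the normal equations can be solved in closed form, which keeps the per-segment cost constant regardless of how many segments the frequency range is split into.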



Figure: Formant frequencies estimation of 6 Bodo vowels for female utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus frequency (Hz), 0 to 4000 Hz)

Figure: Formant frequencies estimation of 6 Bodo vowels for male utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus frequency (Hz), 0 to 4000 Hz)

Figure: F1-F2 plot showing the vowel triangle for male (red) and female (blue) speakers of the Bodo language (F2 (Hz) versus F1 (Hz))

Figure: Formant frequency curves showing the change of formants across the V, VC, CV and CVC word structures (gain (dB) versus frequency (Hz))

Figure: Range of formant frequency. The F1-F2 plot shows that the formant frequencies of the CV, VC and CVC word structures of the Bodo language mostly lie within the range of the formant frequencies of the vowels (MV: male vowel; FV: female vowel)
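Formant values like those plotted above are also commonly obtained by a different LPC-based route than the segment resonator: factor the full predictor polynomial A(z) = 1 - Σ a_k z^-k and read each complex root's angle as a formant frequency and its radius as a bandwidth. The Python sketch below shows that swapped-in technique, with a Durand-Kerner iteration standing in for a library root finder; the 400 Hz bandwidth cut-off used to discard non-formant roots is an assumption.

```python
import cmath
import math


def poly_roots(coeffs, iters=500):
    """Durand-Kerner roots of the monic polynomial z^p + c1 z^(p-1) + ... + cp."""
    p = len(coeffs)

    def f(z):                                      # Horner evaluation
        v = 1.0 + 0j
        for c in coeffs:
            v = v * z + c
        return v

    roots = [(0.4 + 0.9j) ** k for k in range(p)]  # standard distinct starting points
    for _ in range(iters):
        new = []
        for i, z in enumerate(roots):
            denom = 1.0 + 0j
            for j, w in enumerate(roots):
                if i != j:
                    denom *= z - w
            new.append(z - f(z) / denom)
        roots = new
    return roots


def lpc_formants(a, fs_hz, max_bw_hz=400.0):
    """(frequency, bandwidth) pairs in Hz from the roots of A(z) = 1 - sum_k a_k z^-k.

    a[0] is unused; max_bw_hz is an assumed cut-off for discarding broad, non-formant roots.
    """
    # z^p A(z) = z^p - a_1 z^(p-1) - ... - a_p, so the monic coefficients are -a_k.
    roots = poly_roots([-ak for ak in a[1:]])
    found = []
    for z in roots:
        if z.imag <= 0.0:                          # keep one root of each conjugate pair
            continue
        freq = cmath.phase(z) * fs_hz / (2.0 * math.pi)
        bw = -fs_hz / math.pi * math.log(abs(z))
        if bw < max_bw_hz:
            found.append((freq, bw))
    return sorted(found)
```

Sorting the surviving roots by frequency gives F1, F2, F3 in order; in practice one would feed in the per-frame LPC coefficients produced by the autocorrelation method.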

Vowel Female

Male

Male


F1 F2 F3 F1 F2 F3

/ora /(you are) 543.7 1748.9 3823.5 643.3 2396.9 3242.6 /to/(hen)

F1 F2

465.9 1874.5

3428 2463.9

F1 F2 F3 F1 F2 F3

CV Female


Table 4: Formant frequency Formant frequency /a/ /i/ /e/ 283.5 280.4 1040.8 1480.3 2560.8 1384.4 3600.2 2200.6 3151.8 243.8 301.3 987.4 1654.4 2251.8 2657.9 2865.8 3985.8 3758.4 /intcek/(this much) /ek/(to jump) / /(I am) 375.3 275.7 765.2 1682.9 2769.4 1765.6 30165.5 3321.9 3546.9 987.5 276.9 321.9 2401.6 3001.8 2394.9 3099.0 3548.2 2987.3 /tsa/(to eat) /mi/(vegetable) /the/(fruit)

VC Female


/o/ 640.3 2560.4 3220.8 620.2 2154.7 2876.1

354.9 1987.7

698.8 1976.4


/u/ 480.9 2360.1 3211.4 504.5 2857.9 3415.8 /ut/(camel) 653.9 2015.9 2976.9 431.9 2656.7 3241.9 /tcu/(thorn) 387.03 1687.5

/w/ 340.5 1080.2 2720.4 253.9 2415.7 2965.8 /r /(length) 392.8 2438.3 2657.9 400.3 1834.9 2865.9 /a /(shout) 565.7 176.5

F3 F1 F2 F3

Male

2976.4 498.3 2183.5 3216.3 /tcok/(compound)

CVC Female

F1 F2 F3 F1 F2 F3

Male

2981.6 690.3 2574.8 3582.7

3768.9 541.7 3286.1 3321.8 /rin/(loan)

/na (You are) 265.8 20001.8 3875.4 698.4 1501.0 3315.8

276.4 1867.5 3341.7 845.2 1476.7 3498.6

3415.6 298.5 2261.3 3139.7 /tbau/(owl)

/ben /(where) 312.4 2323.3 3198.6 684.9 1354.8 3299.7

301.6 2782.5 3054.6 875.2 1401.9 3176.0

2885.6 367.2 2653.0 2976.2

2986.3 391.2 2695.3 3271.6 /tsara /(disease) 261.6 2434.1 3145.1 500.8 1687.8 3679.1

299.7 1976.4 2988.3 301.5 1222.6 3571.8

Figure: Formant frequencies estimation of 6 Rabha vowels for female utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus frequency (Hz), 0 to 4000 Hz)

Figure: Formant frequencies estimation of 6 Rabha vowels for male utterances (panels /o/, /a/, /i/, /e/, /u/, /w/; amplitude (dB) versus frequency (Hz), 0 to 4000 Hz)

Figure: F1-F2 plot showing the vowel triangle for male (red) and female (blue) speakers of the Rabha language (F2 (Hz) versus F1 (Hz))

Figure: Range of formant frequency. The F1-F2 plot shows that the formant frequencies of the CV, VC and CVC word structures of the Rabha language mostly lie within the range of the formant frequencies of the vowels (MV: male vowel; FV: female vowel)
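The vowel-triangle reading of the F1-F2 plots (vertices at /i/, /a/ and /u/, with the other vowels and word nuclei placed relative to them) can be checked numerically with a point-in-triangle test. The Python sketch below is a minimal version of that check; the vertex coordinates are hypothetical placeholders for illustration, not values from Tables 3 or 4.

```python
def _orient(p, a, b):
    """Signed-area test: which side of the line through a and b the point p lies on."""
    return (p[0] - b[0]) * (a[1] - b[1]) - (a[0] - b[0]) * (p[1] - b[1])


def inside_vowel_triangle(point, v_i, v_a, v_u):
    """True if an (F1, F2) point lies inside or on the /i/-/a/-/u/ triangle."""
    d1 = _orient(point, v_i, v_a)
    d2 = _orient(point, v_a, v_u)
    d3 = _orient(point, v_u, v_i)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)    # inside iff all signs agree (zeros = on an edge)


# Hypothetical (F1, F2) vertices in Hz, for illustration only.
V_I, V_A, V_U = (300.0, 2300.0), (700.0, 1200.0), (300.0, 800.0)
```

Replacing the placeholder vertices with measured vowel formants and testing each word's (F1, F2) pair would turn the visual "mostly lies within the range" observation into a count of points inside versus outside the triangle.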

3. RESULTS AND DISCUSSION

Based on the analysis of the cepstral features and formant frequencies of Bodo and Rabha phonemes and words, the following observations were made. Significant variation of the cepstral coefficients is observed among the Bodo vowels, as shown in Table 1. For male speakers, the cepstral variation is maximum for vowel /o/ and minimum for vowel /u/. Similarly, for female Bodo speakers, the maximum variation of the cepstral measure is found for vowel /o/ and the minimum for /i/. For the Rabha vowels /o/, /a/, /i/, /e/, /u/ and /w/, the range of variation of the cepstral coefficients (Table 2) is maximum for vowel /u/ and minimum for vowel /o/ in male speakers; in female speakers, the maximum variation is found for vowel /o/ and the minimum for vowel /e/. Significantly, the cepstral coefficients of the Bodo vowels for frame nos. 2, 3, 6 & 8 show a distinctive characteristic (Figure 3) for male and female speakers. The variation of the cepstral coefficients for male speakers is very irregular, in contrast to the stable variation of the female cepstral coefficients. The same phenomenon is also observed for the Rabha vowels, but at different frames, i.e. frame nos. 9, 14, 16 and 17 (Figure 4). This observation may be helpful in sex determination for both Bodo and Rabha speakers. The range of variation of the cepstral coefficients for Bodo and Rabha male speakers lies within 3.8177 > C_Bodo > 1.1523 and 8.1329 > C_Rabha > 2.0579, respectively. For female speakers it lies within 1.9578 > C_Bodo > 0.9276 and 7.6546 > C_Rabha > 2.4127. That is, the variation of the cepstral features is smaller for the Bodo vowels (male: 2.6654; female: 1.0302) than for the Rabha vowels (male: 6.0750; female: 5.2419), i.e., the former are more stable than the latter.

Figure 10 and Figure 15 represent the extremes of formant locations in the F1-F2 plane for the Bodo and Rabha vowels. The formant locations for /u/ (low F1, low F2), /i/ (low F1, high F2) and /a/ (high F1, low F2) form the vertices of a triangle, and the other vowels are placed with respect to these vertices. Figure 12 and Figure 16 show that the formant frequencies of the selected word sets for both Bodo and Rabha lie within the range of the formant frequencies of the isolated vowels. The investigation has shown (Tables 3 and 4) that the range of formant frequency is maximum for isolated vowels, but when the vowels are placed in the nucleus of a structure such as CV, VC or CVC, the formant frequency decreases.

ACKNOWLEDGEMENT

We gratefully acknowledge the Ministry of Communication & Information Technology (MIT), New Delhi, Govt. of India, for providing us with relevant information while preparing the manuscript of this paper.

REFERENCES

1. Rabiner, L. R. and Juang, B. H. Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.

2. Noll, A. M. Cepstrum Pitch Determination, Journal of the Acoustical Society of America, Vol. 41, pp. 293-309, Feb. 1967.

3. Porat, B. A Course in Digital Signal Processing, John Wiley & Sons, 1997.

4. Proakis, J. G. and Manolakis, D. G. Digital Signal Processing: Principles, Algorithms and Applications, Pearson Education, Third Indian reprint, 2004.

5. Kewley-Port, D. Measurement of formant transitions in naturally produced stop consonant-vowel syllables, Journal of the Acoustical Society of America, Vol. 72, pp. 379-389, 1982.

6. Kewley-Port, D. Time-varying features as correlates of place of articulation in stop consonants, Journal of the Acoustical Society of America, Vol. 73, pp. 322-335, 1983.

7. Welling, L. and Ney, H. Formant Estimation for Speech Recognition, IEEE Transactions on Speech and Audio Processing, Vol. 6, pp. 36-48, 1998.

8. Talukdar, P. H. Speech Production, Analysis and Coding, Lambert Publication, Germany, 2010.
