www.ijape.org International Journal of Automation and Power Engineering (IJAPE) Volume 4, 2015 doi: 10.14355/ijape.2015.04.003
Robust Speaker Verification Using Improved PNCC Based on GMM‐UBM
Xinxing Jing1, Bingwei Xiang2, Haiyan Yang3, Ping Zhou4
School of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi, China
likeblue2000@163.com
Abstract
The robustness of the traditional Mel Frequency Cepstral Coefficient (MFCC) feature degrades drastically in speaker verification under noisy conditions. To address this problem, a feature extraction method suited to low‐SNR environments is proposed, based on the Gaussian Mixture Model‐Universal Background Model (GMM‐UBM) and an improved Power Normalized Cepstral Coefficient (PNCC). First, the PNCC feature is extracted after Voice Activity Detection (VAD), which uses long‐term analysis to remove the effect of background noise. Then, Cepstral Mean Variance Normalization (CMVN), Feature Warping and other methods are used to improve PNCC. Finally, GMM‐UBM‐MAP is set as the baseline system for speaker verification tests on the TIMIT speech database, and the robustness of four different features (MFCC, GFCC, PNCC and improved PNCC) is analyzed and compared under different noise conditions. The experimental results indicate that MFCC achieves the highest recognition rate on clean speech. When the test speech is mixed with sine noises, the improved PNCC is more robust against different low‐SNR noises than the original features, and its Equal Error Rate (EER) is reduced significantly in low‐SNR noise environments.
Keywords
Speaker verification; MFCC; PNCC; GMM‐UBM
Introduction
Speaker verification is a technology that verifies the identity of a speaker from individual characteristics extracted from the speaker's speech. Several feature parameters are currently in wide use, including the Mel Frequency Cepstral Coefficient (MFCC), the Linear Predictive Cepstrum Coefficient (LPCC) and Perceptual Linear Predictive (PLP) parameters. These features perform well in clean environments. In real applications, however, various background noises and channel noises corrupt the basic features and thus reduce the robustness of the system. Recently, a new feature based on the Gammatone filter, GFCC (Gammatone Frequency Cepstral Coefficients), was proposed in [1]. The authors show that it is more robust than traditional MFCC in noisy conditions. Building on this, two novel algorithms, power‐law nonlinearity and power‐bias subtraction, are proposed in [2]; together they form a new feature called PNCC (Power Normalized Cepstral Coefficient), which is more robust still. To compensate for channel variations, Cepstral Mean Subtraction (CMS) is the primary method. However, its performance degrades significantly in the presence of additive noise. An extension of this approach, Cepstral Mean Variance Normalization (CMVN), is used in this paper. Because it normalizes the mean and the variance at the same time, it not only compensates for channel distortion but also suppresses additive noise. A novel feature mapping approach called Feature Warping, which is robust to channel mismatch and additive noise, is proposed in [3]. This approach is introduced here to further enhance the robustness of PNCC. In terms of the speaker verification model, the Gaussian Mixture Model‐Universal Background Model (GMM‐UBM) is the standard model for speaker verification and performs better than the traditional GMM [4].
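The two normalization steps described above, CMVN and Feature Warping, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cmvn` and `feature_warp` are hypothetical, statistics are computed per utterance, the warping target is a standard normal distribution, and the default window of 301 frames (roughly 3 s at an assumed 10 ms frame shift) is an assumption that the text does not specify. Only the Python standard library is used.

```python
from statistics import NormalDist, fmean, pstdev


def cmvn(frames):
    """Per-utterance CMVN: zero mean and unit variance for each coefficient.

    `frames` is a non-empty list of equal-length feature vectors.
    """
    dims = list(zip(*frames))                 # one tuple per coefficient
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]   # guard against a constant coefficient
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)]
            for frame in frames]


def feature_warp(frames, window=301):
    """Map each coefficient to a standard normal target over a sliding window.

    The window length is an assumed value; it is truncated at utterance edges.
    """
    std_normal = NormalDist()                 # target distribution N(0, 1)
    n_frames = len(frames)
    half = window // 2
    warped = [list(f) for f in frames]
    for d in range(len(frames[0])):
        column = [f[d] for f in frames]
        for t in range(n_frames):
            lo, hi = max(0, t - half), min(n_frames, t + half + 1)
            win = column[lo:hi]
            # 0-based ascending rank of the current value inside the window
            rank = sum(v < column[t] for v in win)
            # (rank + 0.5) / N lies strictly between 0 and 1, so inv_cdf is defined
            warped[t][d] = std_normal.inv_cdf((rank + 0.5) / len(win))
    return warped
```

In this sketch CMVN uses global utterance statistics, while Feature Warping depends only on the rank of each value within its local window, which is why it is insensitive to slowly varying shifts and scalings of the features.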
In this paper, building on the work mentioned above, an improved PNCC feature is extracted, GMM‐UBM‐MAP is set as the baseline system, and the robustness of different features for speaker verification is analyzed on the TIMIT database.
Speaker Verification Framework
A speaker verification system is shown in Figure 1. It includes feature extraction, feature normalization, speaker modeling, classification and a decision process.
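The modeling, classification and decision stages of such a framework can be illustrated with a small GMM‐UBM sketch. It is an illustration under assumptions rather than the system used in this paper: the GMM is represented as a hypothetical `(weights, means, variances)` tuple with diagonal covariances, only the means are MAP‐adapted, the relevance factor of 16 is an assumed value, and the decision threshold is a free parameter. The UBM itself is assumed to be already trained.

```python
import math


def component_log_probs(x, gmm):
    """Log of weight times diagonal Gaussian density, one value per component."""
    weights, means, variances = gmm
    out = []
    for w, mean, var in zip(weights, means, variances):
        log_density = -0.5 * sum(
            math.log(2 * math.pi * v) + (xi - m) ** 2 / v
            for xi, m, v in zip(x, mean, var)
        )
        out.append(math.log(w) + log_density)
    return out


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under the GMM."""
    return sum(logsumexp(component_log_probs(x, gmm)) for x in frames) / len(frames)


def map_adapt_means(frames, ubm, relevance=16.0):
    """MAP-adapt the UBM means to enrollment frames (weights, variances kept)."""
    weights, means, variances = ubm
    n_comp, dim = len(weights), len(means[0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        logs = component_log_probs(x, ubm)
        norm = logsumexp(logs)
        for c in range(n_comp):
            post = math.exp(logs[c] - norm)   # posterior of component c given x
            counts[c] += post
            for d in range(dim):
                sums[c][d] += post * x[d]
    # Equivalent to alpha * E[x] + (1 - alpha) * mu with alpha = n / (n + relevance)
    new_means = [
        [(sums[c][d] + relevance * means[c][d]) / (counts[c] + relevance)
         for d in range(dim)]
        for c in range(n_comp)
    ]
    return (weights, new_means, variances)


def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Return the log-likelihood-ratio score and the accept/reject decision."""
    score = avg_log_likelihood(frames, speaker_gmm) - avg_log_likelihood(frames, ubm)
    return score, score > threshold
```

A typical use would call `map_adapt_means` once per enrolled speaker and then `verify` on each test utterance. Sweeping `threshold` over the resulting scores is what yields an operating point such as the EER.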