

International Journal of Mathematics and Computer Applications Research (IJMCAR), ISSN 2249-6955, Vol. 2, Issue 4, Dec 2012, 11-24, © TJPRC Pvt. Ltd.

SPARSE NONNEGATIVE MATRIX BASED ON -DIVERGENCE FOR SINGLE CHANNEL SEPARATION IN COCHLEAGRAM

M. E. ABD EL AZIZ & WAEL KIDER

Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt

ABSTRACT

In this paper, a novel family of -divergence based two-dimensional nonnegative matrix factorization methods is proposed to solve single channel blind source separation (SCBSS). The cochleagram separation system and the family of divergence based factorization algorithms have been developed in a principled manner, coupled with theoretical support for audio signal separability. The proposed method enjoys at least two significant advantages. Firstly, the cochleagram rendered by the gammatone filterbank has non-uniform time-frequency resolution, which makes the mixed signal more separable and improves the efficiency of source tracking. Secondly, the divergence holds the desirable property of scale invariance, which enables low energy components in the cochleagram to bear the same relative importance as the high energy ones. We compare our system to the Factorial SC and SNMF2D models; the proposed algorithm shows superior performance in terms of signal-to-interference ratio. Finally, the low computational requirements of the algorithm allow close to real-time applications.

KEYWORDS: Blind Signal Separation (BSS), Nonnegative Matrix Factorization (NMF), Divergence, -NMF, Single Channel Source Separation (SCSS)

INTRODUCTION

Single channel source separation (SCSS) aims to extract several source signals from a single mixture recording. Because at least two sources interfere and sound sources may overlap in time, standard source separation methods such as ICA (Hyvarinen et al 2001) cannot be applied. The standard NMF or SNMF models (Schmidt et al 2006) are satisfactory for source separation only provided that the spectral frequencies do not change over time. The recent SNMF2D model (Gao et al 2011) addresses this problem of SNMF by optimizing the spectral dictionary and temporal code with the Kullback-Leibler divergence, on the premise that the sources rarely interfere in a time-frequency representation. This fact has been used in computational auditory scene analysis (Wang et al 2006, Brown 1994), which is inspired by the human ability to organize the perceived time-frequency representation according to likely sources. However, SNMF2D has some drawbacks that originate from its lack of a generalized criterion for controlling the sparsity.

Roweis (Roweis 2003) introduced the refiltering framework, which uses so-called spectrogram masks to attenuate the spectrogram parts that do not belong to the desired sources. To estimate these mask signals, he proposed the factorial-max vector quantizer (VQ) model, which assumes that the log-magnitude source spectrograms are generated by vector quantizers plus a noise term. To train speaker specific code-books and to estimate the noise variances, he applied k-means to source specific spectrograms. Hence, max-VQ explicitly models the sources in a training stage. The factorial-max VQ model can be extended by replacing the vector quantizers with sparse coders (Peharz 2010). A sparse coder can be seen as a generalization of a vector quantizer, since it represents data with a linear combination of up to so-called atoms ( being a parameter to choose), while a vector quantizer uses a single, non-scalable code-word. Consequently, in order to train speaker specific dictionaries,

