Deep Convolutional Neural Networks for Predominant Instrument Recognition in Polyphonic Music
Abstract: Identifying musical instruments in polyphonic music recordings is a challenging but important problem in the field of music information retrieval. It enables music search by instrument, helps recognize musical genres, and can make music transcription easier and more accurate. In this paper, we present a convolutional neural network framework for predominant instrument recognition in real-world polyphonic music. We train our network on fixed-length music excerpts with a single-labeled predominant instrument and estimate an arbitrary number of predominant instruments from an audio signal of variable length. To obtain the audio-excerpt-wise result, we aggregate multiple outputs from sliding windows over the test audio. In doing so, we investigated two different aggregation methods: one takes the class-wise average followed by normalization, and the other performs temporally local class-wise max-pooling on the output probabilities prior to the averaging and normalization steps, to minimize the effect of the averaging process suppressing the activation of sporadically appearing instruments. In addition, we conducted extensive experiments on several important factors that affect the performance, including analysis window size, identification threshold, and activation functions for neural networks, to find the optimal set of parameters. Our analysis of the instrument-wise performance found that the onset type is a critical factor for recall and precision of each