KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

Vol. 15, No. 2, February 28, 2021
DOI: 10.3837/tiis.2021.02.019

Abstract

Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of a feature extraction step and a classification step. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model still performs better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from the 91.8% reported in the existing literature to 94.1%. Because the baseline accuracy on this dataset already exceeds 90%, an improvement of 2.3 percentage points is difficult to achieve and therefore valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote among seven instances of the proposed model. Doing so increases the accuracy on the Jamendo dataset by about a further 1.1%.
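The majority-vote ensemble mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each of the seven models emits one binary label per audio frame (1 = vocal, 0 = non-vocal), and the function name `majority_vote` and the sample labels are hypothetical.

```python
from typing import List, Sequence


def majority_vote(frame_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-frame binary labels from several models by majority vote.

    frame_predictions[m][t] is the label (1 = vocal, 0 = non-vocal) that
    model m assigns to frame t. This layout is an assumption made for
    illustration; the abstract does not specify it.
    """
    if not frame_predictions:
        raise ValueError("at least one model is required")
    n_frames = len(frame_predictions[0])
    if any(len(p) != n_frames for p in frame_predictions):
        raise ValueError("all models must label the same number of frames")
    n_models = len(frame_predictions)
    # A frame is labelled vocal when more than half of the models say so.
    # With an odd number of models (such as seven) a tie cannot occur.
    return [
        1 if 2 * sum(p[t] for p in frame_predictions) > n_models else 0
        for t in range(n_frames)
    ]


# Hypothetical labels from seven models for four frames.
example_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
ensemble_labels = majority_vote(example_predictions)
```

An odd ensemble size, such as seven, guarantees that every frame receives an unambiguous label under binary voting.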


Cite this article

[IEEE Style]
S. D. You, C. Liu and J. Lin, "Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks," KSII Transactions on Internet and Information Systems, vol. 15, no. 2, pp. 729-748, 2021. DOI: 10.3837/tiis.2021.02.019.

[ACM Style]
Shingchern D. You, Chien-Hung Liu, and Jia-Wei Lin. 2021. Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks. KSII Transactions on Internet and Information Systems, 15, 2, (2021), 729-748. DOI: 10.3837/tiis.2021.02.019.