KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Convolutional Neural Network based Audio Event Classification


Abstract

This paper proposes an audio event classification method based on convolutional neural networks (CNNs). CNNs are well suited to distinguishing complex shapes in images, so the proposed system presents audio features to the CNN as an input image. Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames; the concatenated frames are treated as one input image. The output layer of the CNN generates a probability for each audio event (e.g., dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is taken as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
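The two data-handling steps described in the abstract, grouping 40 consecutive frames into one input image and accumulating per-image probabilities over a segment, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `stack_frames` and `classify_segment`, the `hop` parameter (the abstract does not say whether images overlap), and the use of a plain sum as the accumulation are all assumptions. The CNN itself is omitted; `image_probs` stands in for its output layer.

```python
import numpy as np


def stack_frames(features, context=40, hop=40):
    """Group per-frame features into fixed-size "images".

    features: array of shape (num_frames, num_bands), e.g. mel filter bank
              energies per frame.
    context:  frames per image (40 in the abstract).
    hop:      frames between image starts (assumed; not given in the abstract).
    Returns an array of shape (num_images, context, num_bands).
    """
    num_frames, num_bands = features.shape
    if num_frames < context:
        # Too short to form a single image.
        return np.empty((0, context, num_bands), dtype=features.dtype)
    starts = range(0, num_frames - context + 1, hop)
    return np.stack([features[s:s + context] for s in starts])


def classify_segment(image_probs, labels):
    """Pick the event with the highest accumulated probability.

    image_probs: array of shape (num_images, num_events), one row of class
                 probabilities per image (stand-in for the CNN output).
    labels:      sequence of event names, length num_events.
    Accumulation is assumed to be a simple sum over images.
    """
    accumulated = np.asarray(image_probs).sum(axis=0)
    return labels[int(np.argmax(accumulated))], accumulated
```

With 100 frames and the default non-overlapping hop, `stack_frames` yields two images and discards the trailing 20 frames; a smaller `hop` produces overlapping images instead. Summing probabilities lets an event that is moderately likely in every image outrank one that dominates only a single image.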


Cite this article

[IEEE Style]
Minkyu Lim, Donghyun Lee, Hosung Park, Yoseb Kang, Junseok Oh, Jeong-Sik Park, Gil-Jin Jang and Ji-Hwan Kim, "Convolutional Neural Network based Audio Event Classification," KSII Transactions on Internet and Information Systems, vol. 12, no. 6, pp. 2748-2760, 2018. DOI: 10.3837/tiis.2018.06.017

[ACM Style]
Lim, M., Lee, D., Park, H., Kang, Y., Oh, J., Park, J., Jang, G., and Kim, J. 2018. Convolutional Neural Network based Audio Event Classification. KSII Transactions on Internet and Information Systems, 12, 6, (2018), 2748-2760. DOI: 10.3837/tiis.2018.06.017