KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing

Vol. 14, No. 3, March 31, 2020
DOI: 10.3837/tiis.2020.03.012

Abstract

We propose a robust visual object tracking algorithm for mixing immersive audio. It fuses a convolutional neural network (CNN) tracker, trained offline on a large number of video repositories, with a color histogram based tracker. Our algorithm addresses the weakness of the CNN-based GOTURN generic object tracker under occlusion and large target movements. The key idea is to train a binary classifier offline on the color histogram similarity values estimated by both trackers. The classifier selects the appropriate tracker for the target, and both trackers are then updated with the predicted bounding box position of the target so that tracking can continue. Furthermore, a histogram similarity constraint is applied before the trackers are updated to maximize tracking accuracy. Finally, we compute the depth (z) of the target object with one of the prominent unsupervised monocular depth estimation algorithms to obtain the 3D position of the tracked object that is needed to mix the immersive audio into that object. Our proposed algorithm achieves about 2% higher accuracy than the high-performing GOTURN algorithm on the VOT2014 tracking benchmark. Additionally, our tracker can track multiple objects using the single-object tracking approach, although this has not been demonstrated on any MOT benchmark.
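The tracker-selection and update step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions: the color histogram is a joint RGB histogram with 8 bins per channel, similarity is measured with the Bhattacharyya coefficient, and the offline-trained binary classifier is replaced by a simple stand-in rule that picks whichever tracker's box is more similar to the target template. The names `color_histogram`, `crop`, `fuse_step`, and the threshold `min_similarity` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each channel in 0..255
Box = Tuple[int, int, int, int]   # (x, y, width, height) in pixels
Frame = List[List[Pixel]]         # frame[row][col]


def color_histogram(pixels: Sequence[Pixel], bins: int = 8) -> List[float]:
    """Normalized joint RGB histogram with `bins` bins per channel (assumed layout)."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def crop(frame: Frame, box: Box) -> List[Pixel]:
    """Pixels inside `box`, clipped to the frame borders."""
    x, y, w, h = box
    rows, cols = len(frame), len(frame[0])
    return [frame[r][c]
            for r in range(max(y, 0), min(y + h, rows))
            for c in range(max(x, 0), min(x + w, cols))]


def fuse_step(frame: Frame, template_hist: Sequence[float],
              box_cnn: Box, box_hist: Box,
              min_similarity: float = 0.5) -> Tuple[Box, float, bool]:
    """Choose one tracker's box and decide whether to update both trackers with it."""
    s_cnn = bhattacharyya(template_hist, color_histogram(crop(frame, box_cnn)))
    s_hist = bhattacharyya(template_hist, color_histogram(crop(frame, box_hist)))
    # Stand-in for the offline-trained binary classifier: prefer the more similar box.
    chosen = box_cnn if s_cnn >= s_hist else box_hist
    best = max(s_cnn, s_hist)
    # Hypothetical histogram similarity constraint: update only if the match is good enough.
    return chosen, best, best >= min_similarity


if __name__ == "__main__":
    # Synthetic 20x20 black frame with a 5x5 red target at (10, 10).
    frame = [[(0, 0, 0)] * 20 for _ in range(20)]
    for r in range(10, 15):
        for c in range(10, 15):
            frame[r][c] = (255, 0, 0)
    template = color_histogram([(255, 0, 0)] * 25)
    # The CNN box has drifted onto the background; the histogram box is on target.
    print(fuse_step(frame, template, (0, 0, 5, 5), (10, 10, 5, 5)))
    # prints ((10, 10, 5, 5), 1.0, True)
```

In a full pipeline, the chosen box would be used to re-initialize both the CNN tracker and the histogram tracker when the returned flag is true. The abstract does not specify what happens when the constraint fails, so that branch is left to the caller here.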


Cite this article

[IEEE Style]
S. Park, M. M. Islam and J. Baek, "Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing," KSII Transactions on Internet and Information Systems, vol. 14, no. 3, pp. 1121-1141, 2020. DOI: 10.3837/tiis.2020.03.012.

[ACM Style]
Sung-Jun Park, Md. Mahbubul Islam, and Joong-Hwan Baek. 2020. Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing. KSII Transactions on Internet and Information Systems, 14, 3, (2020), 1121-1141. DOI: 10.3837/tiis.2020.03.012.

[BibTeX Style]
@article{tiis:23391, title="Visual Object Tracking Fusing CNN and Color Histogram based Tracker and Depth Estimation for Automatic Immersive Audio Mixing", author="Sung-Jun Park and Md. Mahbubul Islam and Joong-Hwan Baek", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2020.03.012}, volume={14}, number={3}, year="2020", month={March}, pages={1121-1141}}