• KSII Transactions on Internet and Information Systems
    Monthly Online Journal (eISSN: 1976-7277)

Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer

Vol. 19, No. 1, January 31, 2025
DOI: 10.3837/tiis.2025.01.007

Abstract

Stereo vision recovers the depth of a scene from left and right images captured by a binocular camera. Traditional depth recovery approaches are built on disparity matching, which requires intricate registration algorithms and a significant amount of computation. Moreover, effective disparity matching is hard to establish for images with weak or repeated textures. We therefore propose a backbone network for binocular depth recovery based on a multi-scale U-shaped Swin Transformer structure. It has a larger receptive field, allowing it to extract both local and global features more effectively. We also propose and validate a new loss function that takes into account both SSIM and the L1 loss, which allows for more accurate depth restoration. By combining the correlation disparity cost volume with our new loss function, depth recovery can be accomplished efficiently. Tests on datasets such as Middlebury, ETH3D and Cityscapes achieved excellent results, demonstrating the advantages of the proposed approach. Its PSNR/SSIM/L1 performance improved significantly, most notably on the Bonn dataset, where it improved over the second-place results by 5.06%/3.00%/19.50%, respectively.
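
The abstract does not give the exact form of the combined loss, so the following is only a minimal sketch of how an SSIM term and an L1 term might be blended for depth maps. The function names, the weight `alpha = 0.85`, and the use of a single global SSIM window (rather than local sliding windows) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """SSIM computed from whole-image statistics (one global window).

    Simplification: practical SSIM averages over local windows.
    """
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def ssim_l1_loss(pred, target, alpha=0.85, data_range=1.0):
    """Weighted blend of an SSIM dissimilarity term and the mean L1 error.

    alpha is a hypothetical weight, not a value reported in the paper.
    """
    l1 = np.abs(pred - target).mean()
    ssim_term = (1.0 - ssim_global(pred, target, data_range)) / 2.0
    return alpha * ssim_term + (1.0 - alpha) * l1

# Usage: compare a predicted depth map with a reference map in [0, 1].
rng = np.random.default_rng(0)
depth_ref = rng.random((8, 8))
depth_pred = np.clip(depth_ref + 0.05 * rng.standard_normal((8, 8)), 0.0, 1.0)
loss = ssim_l1_loss(depth_pred, depth_ref)
```

In a blend of this kind the SSIM term rewards structural agreement between the predicted and reference depth maps, while the L1 term penalizes per-pixel error. The loss is approximately zero when the two maps are identical.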



Cite this article

[IEEE Style]
C. Zou, "Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer," KSII Transactions on Internet and Information Systems, vol. 19, no. 1, pp. 149-166, 2025. DOI: 10.3837/tiis.2025.01.007.

[ACM Style]
Changjun Zou. 2025. Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer. KSII Transactions on Internet and Information Systems, 19, 1, (2025), 149-166. DOI: 10.3837/tiis.2025.01.007.

[BibTeX Style]
@article{tiis:101913, title="Multi-scale feature fusion attention of stereo vision depth recovery network based on Swin Transformer", author="Changjun Zou", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.01.007}, volume={19}, number={1}, year="2025", month={January}, pages={149-166}}