KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

Vol. 16, No. 7, July 31, 2022
DOI: 10.3837/tiis.2022.07.016

Abstract

Transformers have recently made great progress in video retrieval tasks thanks to their strong representation capability. In a Transformer, the cascaded self-attention modules capture long-distance feature dependencies, but local feature details tend to deteriorate. In addition, increasing the depth of the structure is likely to introduce learning bias in the learned features. This paper proposes an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut). A Multi-head Conv-Self-Attention module is introduced to model local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate the learning bias. The proposed model is tested on the MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, it gains about 2.3% in MdR and 6.1% in MnR over recently proposed multimodal-based methods.
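For readers unfamiliar with the metrics quoted above: in video-text retrieval, MdR and MnR conventionally denote the median and mean rank of the ground-truth item, where lower is better. The sketch below is only an illustration of that convention, not the authors' evaluation code. The function names, the toy similarity matrix, and the assumption that query i matches video i are all hypothetical.

```python
from statistics import mean, median


def retrieval_ranks(similarity):
    """Return the 1-based rank of the matching video for each text query.

    similarity[i][j] is the score between text query i and video j.
    Assumption (illustrative): the correct match for query i is video i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the true match.
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def median_and_mean_rank(similarity):
    """Compute (MdR, MnR) from a square query-by-video similarity matrix."""
    ranks = retrieval_ranks(similarity)
    return median(ranks), mean(ranks)


# Toy 3x3 similarity matrix (made-up values, not results from the paper).
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.5, 0.6],
    [0.2, 0.4, 0.7],
]
mdr, mnr = median_and_mean_rank(sim)
print(f"MdR={mdr}, MnR={mnr:.2f}")
```

Under this convention, an improvement in MdR or MnR means the correct video appears closer to the top of the ranked list.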



Cite this article

[IEEE Style]
Z. Liu, J. Cai, M. Zhang, "Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval," KSII Transactions on Internet and Information Systems, vol. 16, no. 7, pp. 2407-2424, 2022. DOI: 10.3837/tiis.2022.07.016.

[ACM Style]
Zhi Liu, Jincen Cai, and Mengmeng Zhang. 2022. Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval. KSII Transactions on Internet and Information Systems, 16, 7, (2022), 2407-2424. DOI: 10.3837/tiis.2022.07.016.

[BibTeX Style]
@article{tiis:25849, title="Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval", author="Zhi Liu and Jincen Cai and Mengmeng Zhang", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2022.07.016}, volume={16}, number={7}, year="2022", month={July}, pages={2407-2424}}