KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems

Vol. 19, No. 5, May 31, 2025
DOI: 10.3837/tiis.2025.05.002

Abstract

Recent advances in large language models (LLMs) such as ChatGPT have contributed to the development of chatbot systems. In particular, speech has been recognized as the optimal medium for interactive dialogue, which has increased interest in voice chatbots. Voice chatbots offer information and services through voice interaction, enhancing the user experience and improving service accessibility. This survey introduces the latest developments in the core technologies of voice chatbot systems: deep learning-based automatic speech recognition, speech synthesis, and speech emotion recognition. It focuses on research that improves speed and performance, both of which are crucial for applying speech technologies in voice chatbots. For automatic speech recognition, we introduce methodologies such as Connectionist Temporal Classification (CTC), Attention-based Encoder-Decoder (AED), and Recurrent Neural Network Transducer (RNN-T), along with studies that optimize Transformer-based models for real-time recognition. For speech emotion recognition, we explore the use of pre-trained models and the latest techniques for accurate emotion prediction. For speech synthesis, we cover two-stage and End-to-End (E2E) approaches, along with research on integrating emotional information to generate natural speech.


Cite this article

[IEEE Style]
S. Ma, J. Oh, M. Kim, and J. Kim, "Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems," KSII Transactions on Internet and Information Systems, vol. 19, no. 5, pp. 1406-1440, 2025. DOI: 10.3837/tiis.2025.05.002.

[ACM Style]
Seunghee Ma, Junseok Oh, Minseo Kim, and Ji-Hwan Kim. 2025. Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems. KSII Transactions on Internet and Information Systems, 19, 5, (2025), 1406-1440. DOI: 10.3837/tiis.2025.05.002.

[BibTeX Style]
@article{tiis:102584,
  title="Survey on Deep Learning-based Speech Technologies in Voice Chatbot Systems",
  author="Seunghee Ma and Junseok Oh and Minseo Kim and Ji-Hwan Kim",
  journal="KSII Transactions on Internet and Information Systems",
  DOI={10.3837/tiis.2025.05.002},
  volume={19},
  number={5},
  year="2025",
  month={May},
  pages={1406-1440}
}