Vol. 18, No. 6, June 30, 2024
DOI: 10.3837/tiis.2024.06.015
Abstract
In this paper, we present a method that integrates a Grammar Transducer as an external language model to improve the accuracy of a pre-trained Korean end-to-end (E2E) automatic speech recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, CTC has an inherent limitation: it does not directly capture language information from the transcript data. To overcome this limitation, we propose a fusion approach that combines the E2E ASR model with a clause-level n-gram language model converted into a Weighted Finite-State Transducer (WFST). This approach improves the model's accuracy and allows domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language for which both speech data and available ASR models are limited. First, we validate the efficacy of training the n-gram model at the clause level by comparing the inference accuracy of the E2E ASR model combined with the clause-level model against that of the same model combined with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, an existing method for combining an external language model with an E2E ASR model without additional training.
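To make the idea of combining an external n-gram language model with ASR scores concrete, the following is a minimal sketch of the Shallow Fusion style baseline, simplified to n-best rescoring: each hypothesis is scored as its ASR log-probability plus a weighted language-model log-probability. It is not the WFST-based method of the paper. The function names, the add-one smoothed bigram model, the `lm_weight` value, the toy English sentences, and the hand-picked ASR scores are all assumptions made for illustration; whitespace-separated units stand in for the paper's clause-level units.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories over whitespace-separated units."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])               # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)

def lm_logprob(sentence, model):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        logp += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size))
    return logp

def rescore(nbest, model, lm_weight=0.5):
    """Pick the hypothesis maximizing asr_logprob + lm_weight * lm_logprob.

    nbest is a list of (hypothesis, asr_logprob) pairs (hypothetical scores).
    """
    scored = [(hyp, asr + lm_weight * lm_logprob(hyp, model)) for hyp, asr in nbest]
    return max(scored, key=lambda pair: pair[1])[0]

# Toy in-domain text: only text is needed to adapt the language model.
model = train_bigram(["turn on the light", "turn off the light"])

# Hypothetical ASR output: the acoustically preferred hypothesis is misspelled.
nbest = [("turn on the lite", -1.0), ("turn on the light", -1.2)]

best_with_lm = rescore(nbest, model, lm_weight=0.5)
best_without_lm = rescore(nbest, model, lm_weight=0.0)
```

With the language model switched off (`lm_weight=0.0`) the ASR score alone decides, whereas a positive weight lets in-domain text steer the choice without retraining the ASR model. A full Shallow Fusion decoder would typically apply the same interpolation at each step of beam search rather than on a finished n-best list.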
Cite this article
[IEEE Style]
J. Oh, E. Cho, and J. Kim, "Integration of WFST Language Model in Pre-trained Korean E2E ASR Model," KSII Transactions on Internet and Information Systems, vol. 18, no. 6, pp. 1692-1705, 2024. DOI: 10.3837/tiis.2024.06.015.
[ACM Style]
Junseok Oh, Eunsoo Cho, and Ji-Hwan Kim. 2024. Integration of WFST Language Model in Pre-trained Korean E2E ASR Model. KSII Transactions on Internet and Information Systems, 18, 6, (2024), 1692-1705. DOI: 10.3837/tiis.2024.06.015.
[BibTeX Style]
@article{tiis:99358, title="Integration of WFST Language Model in Pre-trained Korean E2E ASR Model", author="Junseok Oh and Eunsoo Cho and Ji-Hwan Kim", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2024.06.015}, volume={18}, number={6}, year="2024", month={June}, pages={1692-1705}}