Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

Junseok Oh; Eunsoo Cho; Ji-Hwan Kim

Vol. 18, No. 6, June 30, 2024

DOI: 10.3837/tiis.2024.06.015

Abstract

In this paper, we present a method that integrates a Grammar Transducer as an external language model to improve the accuracy of a pre-trained Korean End-to-End (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model uses the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. A limitation inherent in the CTC approach is that it fails to capture language information directly from transcript data. To overcome this limitation, we propose a fusion approach that combines the E2E ASR model with a clause-level n-gram language model converted into a Weighted Finite-State Transducer (WFST). This approach improves the model's accuracy and allows domain adaptation using only additional text data, avoiding further intensive training of the large pre-trained ASR model. This is particularly advantageous for Korean, a low-resource language for which speech data and available ASR models are limited. We first validate the efficacy of training the n-gram model at the clause level by comparing the inference accuracy of the E2E ASR model merged with this language model against that of the same model merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves higher domain adaptation accuracy than Shallow Fusion, an existing method for merging an external language model with an E2E ASR model without additional training.
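For readers unfamiliar with the Shallow Fusion baseline mentioned above, the following is a minimal sketch of the general idea, not the authors' implementation. It rescores an N-best list by adding a weighted language-model log-probability to each ASR log-score. The toy bigram model, the function names, the English example sentences, the ASR scores, and the weight of 0.5 are all hypothetical. The whitespace-separated tokens merely stand in for whatever lexical unit the language model is trained on.

```python
import math
from collections import Counter


def train_bigram_lm(sentences, alpha=1.0):
    """Train a toy add-alpha smoothed bigram LM; returns a log-prob function.

    Illustrative only: unseen tokens are scored with the same smoothing
    formula, so the distribution is not strictly normalized.
    """
    unigrams, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        vocab.update(toks[1:])
        for prev, cur in zip(toks, toks[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    v = len(vocab)

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + alpha) / (unigrams[p] + alpha * v))
            for p, c in zip(toks, toks[1:])
        )

    return logprob


def shallow_fusion_rescore(nbest, lm_logprob, lm_weight=0.5):
    """Return the hypothesis maximizing asr_logprob + lm_weight * lm_logprob."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))[0]


# Hypothetical in-domain text used only to train the external LM.
domain_text = ["turn on the light", "turn off the light", "turn on the fan"]
lm = train_bigram_lm(domain_text)

# Hypothetical N-best list: (hypothesis, ASR log-score).
nbest = [("turn on the lied", -4.0), ("turn on the light", -4.2)]

best = shallow_fusion_rescore(nbest, lm, lm_weight=0.5)
print(best)
```

With the language-model weight set to zero, the hypothesis with the higher ASR score is selected. With a positive weight, the in-domain text can shift the choice toward the hypothesis the language model prefers, which is why text-only domain adaptation is possible without retraining the ASR model.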

Cite this article

[IEEE Style]

J. Oh, E. Cho, J. Kim, "Integration of WFST Language Model in Pre-trained Korean E2E ASR Model," KSII Transactions on Internet and Information Systems, vol. 18, no. 6, pp. 1692-1705, 2024. DOI: 10.3837/tiis.2024.06.015.

[ACM Style]

Junseok Oh, Eunsoo Cho, and Ji-Hwan Kim. 2024. Integration of WFST Language Model in Pre-trained Korean E2E ASR Model. KSII Transactions on Internet and Information Systems, 18, 6, (2024), 1692-1705. DOI: 10.3837/tiis.2024.06.015.

[BibTeX Style]

@article{tiis:99358, title="Integration of WFST Language Model in Pre-trained Korean E2E ASR Model", author="Junseok Oh and Eunsoo Cho and Ji-Hwan Kim", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2024.06.015}, volume={18}, number={6}, year="2024", month={June}, pages={1692-1705}}
