CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation

JunHwi Moon; OkJoo Choi; Sarah Choi; Keon Chul Park; JeongRok Lee; Wonsun Shin

Vol. 19, No. 11, November 30, 2025

DOI: 10.3837/tiis.2025.11.019

Abstract

The lack of child speech data significantly degrades the performance of commercial automatic speech recognition (ASR) systems on children's speech. Acquiring sufficient child speech data remains challenging, however, because recording conditions are difficult to control and children cooperate only to a limited extent during data collection. To address these limitations, this study proposes a data augmentation method that enhances speech resources for developing ASR systems targeted at children aged 4 to 7. Several acoustic modulations, including pitch, tempo, fundamental frequency (F0), and vocal tract length normalization (VTLN), are applied to adult speech to bring its characteristics closer to those of child speech. The results show that ASR models trained on modulated adult speech achieved lower word error rates (WER) than those trained on unmodulated adult speech. Under the best augmentation settings, WER(N) decreased by up to 0.31 percentage points for Whisper and 0.41 percentage points for Wav2Vec2-Korean (W2V-K), while the character error rate, CER(N), decreased by 0.51 and 0.14 percentage points, respectively. In comparative experiments, applying the augmentation conditions that were most effective for Whisper to the W2V-K model also improved performance.
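For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how word and character error rates are conventionally computed from an edit distance. It assumes the standard Levenshtein-based definitions; the function names are illustrative, and the normalization behind the paper's WER(N) and CER(N) variants is not described on this page, so it is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (whitespace tokenization assumed)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (spaces ignored, an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {wer(reference, hypothesis):.2f}%")
    print(f"CER: {cer(reference, hypothesis):.2f}%")
```

A reduction "in percentage points" is then simply the baseline rate minus the rate after augmentation, both expressed in percent.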

Cite this article

[IEEE Style]

J. Moon, O. Choi, S. Choi, K. C. Park, J. Lee, W. Shin, "CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation," KSII Transactions on Internet and Information Systems, vol. 19, no. 11, pp. 4119-4137, 2025. DOI: 10.3837/tiis.2025.11.019.

[ACM Style]

JunHwi Moon, OkJoo Choi, Sarah Choi, Keon Chul Park, JeongRok Lee, and Wonsun Shin. 2025. CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation. KSII Transactions on Internet and Information Systems, 19, 11, (2025), 4119-4137. DOI: 10.3837/tiis.2025.11.019.

[BibTeX Style]

@article{tiis:105182, title="CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation", author="JunHwi Moon and OkJoo Choi and Sarah Choi and Keon Chul Park and JeongRok Lee and Wonsun Shin", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2025.11.019}, volume={19}, number={11}, year="2025", month={November}, pages={4119-4137}}
