Vol. 19, No. 11, November 30, 2025
DOI: 10.3837/tiis.2025.11.019
Abstract
The scarcity of child speech data significantly degrades the performance of commercial automatic speech recognition (ASR) systems on children's speech. However, sufficient child speech data remain difficult to acquire because recording conditions are hard to control and children's cooperation during data collection is limited. To address these limitations, this study proposes a data augmentation method that expands speech resources for developing ASR systems targeted at children aged 4 to 7. Several acoustic modulation techniques, including pitch, tempo, and fundamental frequency (F0) modulation and vocal tract length normalization (VTLN), are applied to make the characteristics of adult speech more similar to those of child speech. ASR models trained on modulated adult speech achieved lower word error rates (WER) than those trained on unmodulated adult speech. Under the best augmentation settings, WER(N) decreased by up to 0.31 percentage points for Whisper and 0.41 percentage points for Wav2Vec2-Korean (W2V-K), a Korean Wav2Vec 2.0 model, while the character error rate CER(N) decreased by 0.51 and 0.14 percentage points, respectively. In comparative experiments, applying the most effective augmentation conditions found for Whisper to the W2V-K model also improved its performance.
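To make the VTLN step more concrete, here is a minimal sketch, assuming a common piecewise-linear warping: frequencies below a cutoff are scaled by a factor alpha, and the band above the cutoff is stretched linearly so that Nyquist maps to itself. With alpha > 1, spectral structure such as formants shifts upward, which roughly mimics a shorter, child-like vocal tract. The functions `warp_frequency`, `unwarp_frequency`, and `warp_spectrum`, and the cutoff and alpha values, are hypothetical illustrations; the abstract does not state which warping function or parameters CSAF uses.

```python
# Hypothetical sketch of a piecewise-linear VTLN-style frequency warp.
# The functional form, cutoff, and alpha values are assumptions for
# illustration; the abstract does not specify how CSAF applies VTLN.


def _check(alpha: float, cutoff: float, nyquist: float) -> None:
    # The warp is only monotonic (invertible) if the knee stays below Nyquist.
    if not (alpha > 0 and 0 < cutoff < nyquist and alpha * cutoff < nyquist):
        raise ValueError("require alpha > 0, 0 < cutoff < nyquist, alpha * cutoff < nyquist")


def warp_frequency(freq: float, alpha: float, cutoff: float, nyquist: float) -> float:
    """Scale by alpha up to `cutoff`, then join linearly so Nyquist maps to itself."""
    _check(alpha, cutoff, nyquist)
    knee = alpha * cutoff
    if freq <= cutoff:
        return alpha * freq
    slope = (nyquist - knee) / (nyquist - cutoff)
    return knee + slope * (freq - cutoff)


def unwarp_frequency(freq: float, alpha: float, cutoff: float, nyquist: float) -> float:
    """Inverse of warp_frequency: which input frequency lands on `freq`."""
    _check(alpha, cutoff, nyquist)
    knee = alpha * cutoff
    if freq <= knee:
        return freq / alpha
    slope = (nyquist - knee) / (nyquist - cutoff)
    return cutoff + (freq - knee) / slope


def warp_spectrum(magnitudes: list[float], alpha: float, cutoff: float, nyquist: float) -> list[float]:
    """Resample a magnitude spectrum whose bins are evenly spaced from 0 Hz to Nyquist."""
    n = len(magnitudes)
    if n < 2:
        raise ValueError("need at least two frequency bins")
    step = nyquist / (n - 1)  # Hz per bin
    warped = []
    for k in range(n):
        # For each output bin, look up the source frequency that maps onto it.
        pos = unwarp_frequency(k * step, alpha, cutoff, nyquist) / step
        i = int(pos)
        if i >= n - 1:
            warped.append(magnitudes[-1])
            continue
        frac = pos - i
        # Linear interpolation between the two neighbouring source bins.
        warped.append((1.0 - frac) * magnitudes[i] + frac * magnitudes[i + 1])
    return warped


# Usage: a single spectral peak at 800 Hz moves to about 960 Hz with alpha = 1.2.
spectrum = [0.0] * 101  # 101 bins, 80 Hz apart, for an 8 kHz Nyquist frequency
spectrum[10] = 1.0      # peak at bin 10 = 800 Hz
child_like = warp_spectrum(spectrum, alpha=1.2, cutoff=4000.0, nyquist=8000.0)
peak_bin = max(range(len(child_like)), key=child_like.__getitem__)
print(peak_bin * 80.0, "Hz")
```

The spectrum is resampled by inverse mapping (each output bin looks up its source frequency) rather than by pushing source bins forward, so every output bin receives a value and no gaps appear. Pitch, tempo, and F0 modulation would be separate transforms and are not shown here.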
Cite this article
[IEEE Style]
J. Moon, O. Choi, S. Choi, K. C. Park, J. Lee, W. Shin, "CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation," KSII Transactions on Internet and Information Systems, vol. 19, no. 11, pp. 4119-4137, 2025. DOI: 10.3837/tiis.2025.11.019.
[ACM Style]
JunHwi Moon, OkJoo Choi, Sarah Choi, Keon Chul Park, JeongRok Lee, and Wonsun Shin. 2025. CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation. KSII Transactions on Internet and Information Systems, 19, 11, (2025), 4119-4137. DOI: 10.3837/tiis.2025.11.019.
[BibTeX Style]
@article{tiis:105182,
  title   = {CSAF: Child-like Speech Augmentation Framework through Adult Speech Data Modulation},
  author  = {JunHwi Moon and OkJoo Choi and Sarah Choi and Keon Chul Park and JeongRok Lee and Wonsun Shin},
  journal = {KSII Transactions on Internet and Information Systems},
  volume  = {19},
  number  = {11},
  pages   = {4119--4137},
  month   = nov,
  year    = {2025},
  doi     = {10.3837/tiis.2025.11.019}
}