KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation


Abstract

Artificial intelligence systems generally show no clear change in emotion, which makes it hard for them to demonstrate empathy when communicating with humans. Modifying the frequency of neutral speech, or adding the frequency characteristics of a different emotion to it, makes it possible to develop artificial intelligence that expresses emotion. This study proposes emotion conversion using voice frequency synthesis based on a Generative Adversarial Network (GAN). The proposed method extracts frequencies from the speech data of twenty-four actors and actresses; that is, it extracts the voice features of their different emotions, preserves the linguistic features, and converts only the emotion. It then generates a frequency with a variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to model prosody while preserving linguistic information, which makes it possible to learn speech features in parallel. Finally, it corrects the frequency by applying Amplitude Scaling. Using spectral conversion on a logarithmic scale, the result is converted into a frequency that takes the characteristics of human hearing into account. The proposed technique thus converts the emotion of speech so that artificially generated voices can express emotion.
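As a rough illustration of the last two steps named in the abstract (amplitude scaling and a logarithmic frequency scale that reflects human hearing), the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not specify the formulas used, so the mel-scale mapping, the peak-based gain, and the function names `hz_to_mel`, `mel_to_hz`, and `scale_amplitude` are all assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale.

    Assumption: the widely used formula m = 2595 * log10(1 + f / 700)
    stands in for the paper's unspecified logarithmic conversion.
    """
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def scale_amplitude(samples, target_peak=1.0):
    """Rescale samples so the largest absolute value equals target_peak.

    Assumption: simple peak-based gain, used here only to illustrate
    the general idea of amplitude scaling.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: nothing to scale.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    # A 440 Hz tone expressed on the mel scale and converted back.
    mel = hz_to_mel(440.0)
    print(mel, mel_to_hz(mel))
    # A short waveform fragment rescaled to a peak of 1.0.
    print(scale_amplitude([0.1, -0.5, 0.25]))
```

The mel mapping is roughly linear below 1 kHz and logarithmic above it, which is one common way to account for how human hearing perceives pitch; other perceptual scales would fit the same description.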



Cite this article

[IEEE Style]
H. Kwon, M. Kim, J. Baek, K. Chung, "Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation," KSII Transactions on Internet and Information Systems, vol. 16, no. 2, pp. 713-725, 2022. DOI: 10.3837/tiis.2022.02.018.

[ACM Style]
Hye-Jeong Kwon, Min-Jeong Kim, Ji-Won Baek, and Kyungyong Chung. 2022. Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation. KSII Transactions on Internet and Information Systems, 16, 2, (2022), 713-725. DOI: 10.3837/tiis.2022.02.018.

[BibTeX Style]
@article{tiis:25315,
  title={Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation},
  author={Hye-Jeong Kwon and Min-Jeong Kim and Ji-Won Baek and Kyungyong Chung},
  journal={KSII Transactions on Internet and Information Systems},
  DOI={10.3837/tiis.2022.02.018},
  volume={16},
  number={2},
  year={2022},
  month={February},
  pages={713-725}
}