Abstract
This paper proposes a prosody conversion method for emotional speech based on the PAD (Pleasure-Arousal-Dominance) three-dimensional emotion model. Eleven typical emotions were selected, a text corpus was designed, and an emotional speech corpus was recorded from it. Each utterance was labelled with PAD values using psychological methods, and the fundamental frequency (F0) contour of each syllable of the emotional speech was modelled with a five-scale tone model. On this basis, a prosody conversion model built on a Generalized Regression Neural Network (GRNN) predicts the prosodic features of emotional speech, namely pitch contour, duration and pause duration, from the PAD values of the target emotion and the context parameters of the sentence. The speech is then re-synthesized with the STRAIGHT algorithm by modifying these features. In a subjective evaluation, the converted speech for the 11 emotions achieved an average EMOS (Emotional Mean Opinion Score) of 3.6, indicating that it conveys the corresponding emotions.
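The abstract does not say how F0 values are mapped onto the five-scale tone model. One common log-scale normalization in Chinese tone research computes T = 5 × (lg f − lg f_min) / (lg f_max − lg f_min) over the speaker's F0 range [f_min, f_max] and quantizes T to Chao's five tone levels. The sketch below assumes that formula; the function names (`t_value`, `tone_level`, `syllable_contour`), the nearest-frame sampling at five points, and the toy F0 values are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def t_value(f0_hz: float, f0_min: float, f0_max: float) -> float:
    """Map an F0 value (Hz) onto the continuous 0-5 tone scale.

    Assumed formula: T = 5 * (lg f - lg f_min) / (lg f_max - lg f_min),
    where [f_min, f_max] is the speaker's F0 range.
    """
    if not 0 < f0_min < f0_max:
        raise ValueError("require 0 < f0_min < f0_max")
    if f0_hz <= 0:
        raise ValueError("F0 must be positive (voiced frames only)")
    span = math.log10(f0_max) - math.log10(f0_min)
    t = 5.0 * ((math.log10(f0_hz) - math.log10(f0_min)) / span)
    # Clamp frames that fall slightly outside the measured range.
    return min(5.0, max(0.0, t))


def tone_level(t: float) -> int:
    """Quantize a T value to Chao's five levels: [0,1)->1, ..., [4,5]->5 (one simple convention)."""
    return min(5, int(t) + 1)


def syllable_contour(f0_track: Sequence[float], f0_min: float, f0_max: float,
                     n_points: int = 5) -> List[float]:
    """Sample a syllable's voiced F0 track at n_points evenly spaced positions
    and return their T values (nearest-frame sampling, no interpolation)."""
    voiced = [f for f in f0_track if f > 0]  # unvoiced frames assumed to be marked as 0
    if len(voiced) < 2 or n_points < 2:
        raise ValueError("need at least 2 voiced frames and n_points >= 2")
    step = (len(voiced) - 1) / (n_points - 1)
    return [t_value(voiced[round(i * step)], f0_min, f0_max) for i in range(n_points)]


if __name__ == "__main__":
    # Toy rising syllable; the speaker's F0 range is assumed to be 100-300 Hz.
    track = [0, 100, 150, 200, 300, 0]
    levels = [tone_level(t) for t in syllable_contour(track, 100, 300)]
    print(levels)  # -> [1, 2, 4, 4, 5]
```

The log scale is used because perceived pitch intervals track F0 ratios rather than differences in Hz, so a speaker-normalized log range makes contours from different speakers and emotions comparable.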
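The GRNN maps the PAD values of the target emotion plus the sentence's context parameters to prosodic features. In Specht's formulation, a GRNN predicts y(x) = Σ y_i exp(−D_i² / 2σ²) / Σ exp(−D_i² / 2σ²), where D_i is the Euclidean distance from x to the i-th stored training input and σ is the spread. The following is a minimal standard-library sketch of that estimator, not the authors' implementation: the input layout (P, A, D plus one context value), the target layout, σ and the toy numbers are all hypothetical.

```python
import math
from typing import List, Sequence


class GRNN:
    """Generalized Regression Neural Network (Specht, 1991).

    The prediction is a Gaussian-kernel weighted average of the stored
    training targets; the spread sigma is the only parameter to tune.
    """

    def __init__(self, sigma: float) -> None:
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.sigma = sigma
        self.xs: List[List[float]] = []
        self.ys: List[List[float]] = []

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]) -> "GRNN":
        # "Training" only stores every sample as a pattern unit.
        if not xs or len(xs) != len(ys):
            raise ValueError("xs and ys must be non-empty and of equal length")
        self.xs = [list(x) for x in xs]
        self.ys = [list(y) for y in ys]
        return self

    def predict(self, x: Sequence[float]) -> List[float]:
        if not self.xs:
            raise ValueError("model is not fitted")
        if len(x) != len(self.xs[0]):
            raise ValueError("input dimension does not match training data")
        # Pattern layer: squared Euclidean distance to each stored input.
        d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.xs]
        # Subtracting the minimum distance avoids underflow; the constant
        # factor it introduces cancels between numerator and denominator.
        d_min = min(d2)
        w = [math.exp(-(d - d_min) / (2.0 * self.sigma ** 2)) for d in d2]
        total = sum(w)
        # Summation and output layers: weighted average per target dimension.
        n_out = len(self.ys[0])
        return [sum(wi * y[k] for wi, y in zip(w, self.ys)) / total for k in range(n_out)]


if __name__ == "__main__":
    # Hypothetical inputs: [P, A, D, relative position of the syllable in the sentence].
    # Hypothetical targets: [mean T value of the syllable, duration in seconds].
    # These numbers are invented for illustration; they are not data from the paper.
    xs = [[0.8, 0.6, 0.5, 0.0], [-0.6, -0.4, -0.5, 0.0]]
    ys = [[4.0, 0.25], [1.5, 0.40]]
    model = GRNN(sigma=0.3).fit(xs, ys)
    print(model.predict([0.7, 0.5, 0.4, 0.0]))
```

A larger σ averages over more training samples and gives smoother predictions, while a very small σ approaches nearest-neighbour lookup. Because fitting only stores the samples, σ is typically chosen on held-out data.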
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2013, No. 5, pp. 230-235 (6 pages)
Funding
National Natural Science Foundation of China (No. 61263036, No. 60875015)
Natural Science Foundation of Gansu Province (No. 1107RJZA112, No. 1208RJYA078)