SVR-based emotional speech conversion (基于SVR的情感语音变换). Cited by: 2

Emotional speech conversion based on support vector regression

Abstract: This paper proposes a new method for emotional speech conversion based on support vector regression (SVR). Prosodic features were extracted from Mandarin speech covering 10 emotions, and the differences in prosodic features between neutral and emotional speech were compared and analyzed. SVR was used to build prediction models for prosodic feature parameters, including fundamental frequency (pitch), duration, energy, and pauses, and the STRAIGHT algorithm was used to convert neutral speech into emotional speech. The 10 kinds of emotional speech converted with this method achieved an emotional mean opinion score (EMOS) of 3.4, indicating that the converted speech can express emotion.

Authors: ZHOU Hui (周慧), YANG Hongwu (杨鸿武), CAI Lianhong (蔡莲红)

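Affiliations: College of Physics and Electronic Engineering, Northwest Normal University; Department of Computer Science and Technology, Tsinghua University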

Source: Journal of Northwest Normal University (Natural Science) (《西北师范大学学报（自然科学版）》), indexed in CAS and the Peking University Core Journals list, 2009, No. 1, pp. 62-66, 93 (6 pages)

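Funding: Key Scientific Research Project of the Ministry of Education (208146); Northwest Normal University Research Backbone Cultivation Project (NWNU-KJCXGC-03-42)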

Keywords: SVR; emotional speech conversion; STRAIGHT algorithm

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
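
The abstract describes SVR models that predict prosodic parameters but does not give the kernel, features, or hyperparameters. As a minimal sketch of the general idea only, the following shows the standard epsilon-insensitive loss that SVR minimizes: prediction errors smaller than a tolerance `epsilon` cost nothing, and larger errors are penalized linearly. The function names, the `epsilon` value, and the F0 ratios are all hypothetical illustrations and are not taken from the paper.

```python
def epsilon_insensitive_loss(target, prediction, epsilon):
    """Return max(0, |target - prediction| - epsilon)."""
    return max(0.0, abs(target - prediction) - epsilon)


def mean_loss(targets, predictions, epsilon):
    """Average epsilon-insensitive loss over paired values."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    losses = [
        epsilon_insensitive_loss(t, p, epsilon)
        for t, p in zip(targets, predictions)
    ]
    return sum(losses) / len(losses)


# Hypothetical emotional-to-neutral F0 ratios, one per syllable.
# These numbers are invented for illustration, not data from the paper.
observed = [1.20, 1.35, 0.90]
predicted = [1.25, 1.10, 0.92]

# Only the second syllable misses by more than epsilon, so only it contributes.
print(mean_loss(observed, predicted, epsilon=0.1))
```

In a setup like the one the abstract outlines, one such regressor would presumably be trained per prosodic parameter (pitch, duration, energy, pauses), but that arrangement is an assumption here.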

