Spoofing speech detection method based on self-supervised pre-training and supervised fine-tuning

Cited by: 1
Abstract: With the development of deep learning, synthesized speech has become increasingly hard to distinguish from real natural speech in both quality and perceived naturalness, which seriously threatens the reliability and security of applications built on speaker verification. Existing work has proposed many spoofing speech detection methods along two dimensions, feature extraction and the back-end binary classifier, and has achieved excellent results when training and test data come from the same dataset. However, detection accuracy drops sharply when the model encounters unseen spoofing types, especially in cross-dataset evaluation. Inspired by the success of self-supervised learning frameworks on a range of downstream speech tasks, this paper proposes a spoofing speech detection framework that combines pre-training with fine-tuning. Pre-training learns general-purpose speech representations from unlabeled data; the parameters of the whole network are then fine-tuned on a labeled dataset of bona fide and spoofed speech so that the model can distinguish the two. The proposed method achieves a tandem detection cost function (t-DCF) value of 0.0061 and an equal error rate (EER) of 0.19% on the ASVspoof 2019 logical access dataset, and also shows good generalization in cross-dataset evaluation on the ASVspoof 2015 and Fake or Real datasets.
Authors: XIA Xiang (夏翔), FANG Lei (方磊), FANG Si'an (方四安), LIU Lin (柳林) (Hefei iFlytek Digital Technology Limited Company, Hefei, Anhui 230000, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue S01, pp. 263-268 (6 pages)
Keywords: anti-spoofing; spoofing speech detection; self-supervised; pre-training; generalization capability
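
The abstract reports an equal error rate (EER), the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of how such a value can be estimated from detector scores. It is not the authors' evaluation code: the function name `compute_eer`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration. The t-DCF additionally depends on the error rates of a speaker verification system and on cost parameters, so it is not sketched here.

```python
import math
from typing import Sequence, Tuple


def compute_eer(
    bonafide_scores: Sequence[float], spoof_scores: Sequence[float]
) -> Tuple[float, float]:
    """Estimate the equal error rate and the threshold where it occurs.

    Assumption: a higher score means "more likely bona fide", and an
    utterance is accepted when its score is >= the threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus +inf (reject everything).
    candidates = sorted(set(bonafide_scores) | set(spoof_scores)) + [math.inf]

    best_gap = math.inf
    eer = 1.0
    eer_threshold = math.inf
    for t in candidates:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data the two rates rarely match exactly,
            # so report their midpoint at the closest crossing.
            best_gap = gap
            eer = (far + frr) / 2
            eer_threshold = t
    return eer, eer_threshold


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    bonafide = [0.9, 0.8, 0.7, 0.6]
    spoof = [0.1, 0.2, 0.3, 0.65]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

This brute-force scan over thresholds is quadratic in the number of scores, which is fine for a sketch; evaluation toolkits typically sort the scores once and sweep the threshold in a single pass.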