Abstract
Traditional acoustic features tend to ignore the nonlinearity and non-stationarity of speech and cannot capture the pathological characteristics of a patient's vocal tract and vocal folds at the same time, which limits the performance of recognition models. This paper therefore proposes WHFEMD, a feature enhancement algorithm for dysarthric speech that combines empirical mode decomposition (EMD) with the fast Walsh-Hadamard transform (FWHT). The speech signal is first processed with the fast Fourier transform, after which EMD adaptively extracts its intrinsic mode functions (IMFs). FWHT is then applied to generate new coefficients, and IMF-based statistical features are extracted together with enhanced features based on power spectral density and Gammatone frequency cepstral coefficients. Finally, severity grading is carried out on the UA Speech and TORGO databases, and an imbalanced classification algorithm is introduced for evaluation. The results show that the WHFEMD enhanced features are more effective than traditional features for grading pathological speech; once class imbalance is taken into account, recognition accuracy improves by at least 12.18 percentage points. WHFEMD thus characterizes dysarthric speech more comprehensively and helps to address its non-stationarity and nonlinearity, as well as the lack of simultaneous characterization of local pathological information in the vocal folds and vocal tract.
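The FWHT and statistical-feature steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fwht` and `stat_features` are hypothetical, the particular statistics (mean, standard deviation, skewness, kurtosis) are assumed because the abstract does not list them, and a toy sinusoid stands in for an IMF because the EMD step is not implemented here.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = np.asarray(x, dtype=float).copy()
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine neighbouring blocks of length h.
        for i in range(0, n, 2 * h):
            top = a[i:i + h].copy()
            bot = a[i + h:i + 2 * h].copy()
            a[i:i + h] = top + bot
            a[i + h:i + 2 * h] = top - bot
        h *= 2
    return a


def stat_features(coeffs):
    """Assumed example statistics; the abstract does not specify which are used."""
    c = np.asarray(coeffs, dtype=float)
    mean = c.mean()
    std = c.std()
    centered = c - mean
    # Guard against division by zero for constant inputs.
    skew = (centered ** 3).mean() / std ** 3 if std > 0 else 0.0
    kurt = (centered ** 4).mean() / std ** 4 if std > 0 else 0.0
    return {"mean": mean, "std": std, "skewness": skew, "kurtosis": kurt}


# Toy stand-in for a single IMF (a real pipeline would obtain IMFs from EMD).
imf_like = np.sin(2 * np.pi * np.arange(8) / 8)
features = stat_features(fwht(imf_like))
```

Because the transform is unnormalized, applying `fwht` twice and dividing by the signal length recovers the input, which gives a quick sanity check on any implementation.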
Authors
ZHU Ting (朱婷)
DUAN Shufei (段淑斐)
DINGAM Camille
LIANG Huizhi (梁慧芝)
ZHANG Wei (张卫)
Affiliations: College of Electronic Information and Optical Engineering, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China; School of Computing, Newcastle University, Newcastle NE1 7RU, UK; Taiyuan Hospital, Peking University First Hospital, Taiyuan 030032, Shanxi, China
Source
Technical Acoustics (《声学技术》)
2025, No. 2, pp. 239-251 (13 pages)
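Funding
National Natural Science Foundation of China, Young Scientists Fund (12004275)
Shanxi Province Applied Basic Research Program, General Natural Science Fund (20210302123186)
Shanxi Province Merit-Based Funding Program for Science and Technology Activities of Returned Overseas Scholars (20200017)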
Keywords
dysarthria
feature enhancement
empirical mode decomposition
Walsh-Hadamard transform
pathological speech