Abstract
The hidden Markov model (HMM) widely used in speech recognition is a first-order Markov model, so it cannot fully capture the temporal dependence of the speech signal. In theory an HMM can be extended to a higher-order Markov model, but the required computation and memory grow exponentially, which makes such an extension impractical. This paper therefore proposes a new model that combines an HMM with a multivariate Gaussian density function capable of describing the temporal dependence of the speech signal, and justifies the new model theoretically. Recognition experiments on all 409 Chinese syllables, with tone disregarded, show that the recognition rate of the new model is consistently and significantly higher than that of the HMM. In addition, a smoothed discrete statistical histogram is used to model state duration, because the experiments showed that continuous density functions such as the Gaussian and Gamma densities cannot satisfactorily describe the state durations of either the HMM or the new model.
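The smoothed duration histogram mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the histogram is smoothed, so the moving-average window, the probability floor, and the function name `smoothed_duration_pmf` below are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def smoothed_duration_pmf(durations, max_len, window=3, floor=1e-6):
    """Estimate P(d) for d = 1..max_len from observed state durations (in frames).

    Assumed smoothing: a centred moving average over `window` bins, plus a
    small floor so that unseen durations keep a non-zero probability.
    """
    # Raw histogram of durations that fall inside the supported range.
    counts = Counter(d for d in durations if 1 <= d <= max_len)
    raw = [float(counts.get(d, 0)) for d in range(1, max_len + 1)]

    half = window // 2
    smoothed = []
    for i in range(max_len):
        lo = max(0, i - half)
        hi = min(max_len, i + half + 1)
        # Average over the bins available near the edges, then apply the floor.
        smoothed.append(sum(raw[lo:hi]) / (hi - lo) + floor)

    # Normalise so the result is a proper probability mass function.
    total = sum(smoothed)
    return [v / total for v in smoothed]


# Hypothetical durations (in frames) collected for one state.
pmf = smoothed_duration_pmf([2, 3, 3, 4, 4, 4, 5, 9], max_len=10)
```

Here `pmf[d - 1]` approximates the probability that the state lasts `d` frames. Unlike a Gaussian or Gamma fit, a histogram of this kind imposes no parametric shape, which is the property the abstract appeals to.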
Source
《电子学报》
EI
CAS
CSCD
PKU Core (北大核心)
1994, No. 1, pp. 9-15 (7 pages)
Acta Electronica Sinica