Abstract
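A Chinese font recognition method based on empirical mode decomposition (EMD) is proposed. Through the study and comparison of a large number of Chinese fonts, 8 basic strokes that reflect the essential characteristics of Chinese fonts are selected. With these 8 strokes as templates, stroke information is sampled at random from blocks of a Chinese document image to form stroke feature sequences. Each stroke feature sequence is decomposed by EMD and its high-frequency energy is extracted; combined with the average gray level of the document image block, these energies form a 9-dimensional feature for font recognition.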
This paper presents a novel approach to Chinese font recognition based on empirical mode decomposition (EMD). By analyzing and comparing a large number of Chinese characters, 8 basic strokes are selected to characterize the structural attributes of Chinese fonts. Using these strokes as templates, stroke feature sequences are computed for each text block. Each sequence is decomposed by EMD, and its first two intrinsic mode functions (IMFs), which have the highest frequencies, are used to compute a stroke energy for each of the 8 basic strokes, taken as the average energy of the two IMFs over the length of the sequence. To distinguish bold fonts from their regular counterparts, the average gray level of the pixels in the text block is computed and appended to the feature vector, giving a 9-dimensional feature. Finally, a minimum distance classifier is used to recognize the fonts. Experiments show encouraging recognition rates.
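The feature construction and classification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the first two IMFs of each stroke feature sequence have already been obtained from some EMD routine, it reads "stroke energy" as the sum of squared IMF values averaged over the sequence length, and it uses Euclidean distance for the minimum distance classifier. The function names, the prototype dictionary, and the font labels are hypothetical.

```python
import math


def stroke_energy(imf1, imf2):
    """Average energy of the two highest-frequency IMFs over the sequence length.

    Assumption: energy is the sum of squared samples of both IMFs divided by
    the number of samples; the abstract does not give the exact formula.
    """
    n = len(imf1)
    if n == 0 or len(imf2) != n:
        raise ValueError("IMFs must be non-empty and of equal length")
    return sum(a * a + b * b for a, b in zip(imf1, imf2)) / n


def font_feature(imf_pairs, gray_pixels):
    """Build the 9-dimensional feature: 8 stroke energies plus mean gray level."""
    if len(imf_pairs) != 8:
        raise ValueError("expected one (imf1, imf2) pair per basic stroke (8 total)")
    energies = [stroke_energy(imf1, imf2) for imf1, imf2 in imf_pairs]
    mean_gray = sum(gray_pixels) / len(gray_pixels)
    return energies + [mean_gray]


def classify(feature, prototypes):
    """Minimum distance classifier: label of the nearest prototype vector.

    Assumption: Euclidean distance to one prototype per font class.
    """
    return min(prototypes, key=lambda label: math.dist(feature, prototypes[label]))
```

In practice the gray-level component is on a different scale from the stroke energies, so some normalization of the feature vector would probably be needed before computing distances; the abstract does not say how this is handled.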
Source
《软件学报》
EI
CSCD
Peking University Core Journal
2005, No. 8, pp. 1438-1444 (7 pages)
Journal of Software
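Funding
National Natural Science Foundation of China, Nos. 60133020, 60475042
National Key Basic Research Development Program of China (973 Program), No. 2004CB318000
Natural Science Foundation of Guangdong Province, No. 036608
Guangzhou Science and Technology Program, No. 2003J1-C0201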