Abstract
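This paper applies data mining techniques and information linguistics methods to build a system that extracts information from the Internet and performs automatic indexing and automatic classification. It provides a new method for creating a knowledge base for automatic classification, proposes a position weighting algorithm for topic extraction, develops a new method that improves the recognition of Chinese synonyms, and applies this semantic similarity recognition algorithm during automatic classification.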
The authors use automatic indexing techniques and data mining technology to create a practical knowledge base that can be used to extract information from three kinds of data on the Internet. They provide a new method for creating a knowledge base for automatic classification, propose a location weighting algorithm for information extraction, and present a new method for improving synonym recognition. To strengthen synonym recognition, they adopt a Synonyms Dictionary as the semantic system and propose a new synonym recognition algorithm. They use this algorithm to calculate the degree of similarity among words and to match subjects during automatic classification. Finally, the systems are tested and evaluated.
Source
《情报理论与实践》
CSSCI
PKU Core (北大核心)
2004, No. 5, pp. 528-532 (5 pages)
Information Studies: Theory & Application
Funding
This paper is partly sponsored by the National Social Science Fund, directed by Prof. Hou Hanqing. (ID: 02BTQ012)