Abstract
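This paper proposes an automatic summarization method based on concept counting and semantic hierarchy analysis, and uses it to implement an English automatic summarization system. The system uses WordNet to analyze the words of an English text and selects the text's topic concepts by concept counting, from which it builds a vector space model. It then divides the text into meaning blocks according to how the topic concepts are distributed over the concept hierarchy tree, and extracts the summary block by block, which goes some way toward solving the unbalanced summary structure of multi-topic texts. The paper mainly describes the construction of the concept hierarchy tree, the steps for extracting topic concepts, the computation of sentence importance, and the algorithm for dividing meaning blocks. Tests show that the proposed method achieves higher recall and precision than the traditional method based on word-frequency statistics.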
This paper puts forward a new summarization method based on concept counting and semantic hierarchy analysis. Based on this method, an effective English text summarization system is developed. The system uses the extracted topic concepts to construct a vector space model. Combined with discourse analysis and readability improvement, the abstract of a text is then generated. The paper proposes parameters for evaluating topic concepts, and mainly describes the detailed algorithms for building the concept hierarchy tree, extracting topic concepts, and applying topic concepts to abstract generation. The experimental results show that, compared with word counting, the new method improves both the recall and the precision of the system, and that it helps to solve the abstract distribution problem of multi-topic texts.
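The difference between word counting and concept counting can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details are not given in this abstract. It assumes a tiny hand-written hypernym table (`HYPERNYM`, hypothetical) in place of WordNet, counts only words found in that table, and uses cosine similarity against the topic-concept vector as a stand-in for the sentence importance measure. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical toy hierarchy standing in for WordNet: word -> parent concept.
HYPERNYM = {
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def concept_vector(sentence):
    """Count concepts rather than words; words outside the toy table are ignored."""
    return Counter(HYPERNYM[w] for w in tokenize(sentence) if w in HYPERNYM)


def topic_concepts(sentences, k):
    """Return the k most frequent concepts over the whole text."""
    total = Counter()
    for s in sentences:
        total.update(concept_vector(s))
    return [concept for concept, _ in total.most_common(k)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentence_scores(sentences, k):
    """Score each sentence against the vector of the top-k topic concepts."""
    total = Counter()
    for s in sentences:
        total.update(concept_vector(s))
    topics = set(topic_concepts(sentences, k))
    topic_vec = Counter({c: n for c, n in total.items() if c in topics})
    return [cosine(concept_vector(s), topic_vec) for s in sentences]


sentences = [
    "The dog chased the cat.",
    "A cat sat near the dog.",
    "The truck was red.",
]
top = topic_concepts(sentences, 1)
scores = sentence_scores(sentences, 1)
```

In this toy text, "dog" and "cat" each occur only twice as words, but both map to the concept "animal", so that concept accumulates four occurrences and becomes the single topic concept. The two sentences about animals then score higher than the sentence about the truck.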
Source
《计算机工程与应用》
CSCD
PKU Core (Peking University Chinese Core Journals)
2002, Issue 24, pp. 7-9, 16 (4 pages in total)
Computer Engineering and Applications
Funding
National Natural Science Foundation of China (Grant No. 69972025)