Abstract
Text summarization and keyword extraction are two important research topics in natural language processing (NLP); both aim to generate concise information that describes the gist of a text. Although the two tasks have similar objectives, they are usually studied independently, and the natural association between them is rarely considered. Some collaborative extraction methods based on graph models have been proposed; they capture the relations between sentences, between words, and between sentences and words, and generate the summary and keywords simultaneously through iterative reinforcement. However, most existing models can express only binary relations between sentences and words, and ignore several potentially important high-order relations among different text units. This paper therefore proposes a new collaborative extraction method based on a hypergraph. The method builds a hypergraph with sentences as hyperedges and words as vertices, and generates the summary and keywords simultaneously by exploiting the high-order information between sentences and words within a unified hypergraph model. Experiments on the dataset of the NLPCC 2015 Weibo-oriented news summarization task confirm that the proposed method is feasible and effective.
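The hypergraph construction described in the abstract (sentences as hyperedges, words as vertices) can be illustrated with a minimal sketch. The abstract does not give the paper's ranking formulas, so the update rule below is an assumption: a simple mutual-reinforcement scheme in which a sentence scores the mean of its words and a word scores the sum of the sentences containing it. The function names `build_incidence` and `co_rank` are hypothetical, and the input is assumed to be already segmented into words.

```python
def build_incidence(sentences):
    """Build the hypergraph: each sentence is a hyperedge over word vertices.

    `sentences` is a list of token lists (assumed to be pre-segmented).
    Returns the sorted vocabulary and, per sentence, the set of word indices.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: j for j, w in enumerate(vocab)}
    incidence = [{index[w] for w in s} for s in sentences]
    return vocab, incidence


def co_rank(sentences, iterations=50):
    """Jointly score words and sentences by iterative reinforcement.

    Illustrative update rule only (an assumption, not the paper's model).
    """
    vocab, incidence = build_incidence(sentences)
    if not vocab:
        return {}, [0.0] * len(sentences)
    word_score = [1.0 / len(vocab)] * len(vocab)
    sent_score = [1.0 / len(sentences)] * len(sentences)
    for _ in range(iterations):
        # A hyperedge (sentence) takes the mean score of its vertices (words).
        sent_score = [
            sum(word_score[j] for j in edge) / len(edge) if edge else 0.0
            for edge in incidence
        ]
        total = sum(sent_score) or 1.0
        sent_score = [s / total for s in sent_score]
        # A vertex (word) takes the summed score of the hyperedges containing it.
        new_word = [0.0] * len(vocab)
        for i, edge in enumerate(incidence):
            for j in edge:
                new_word[j] += sent_score[i]
        total = sum(new_word) or 1.0
        word_score = [w / total for w in new_word]
    return dict(zip(vocab, word_score)), sent_score


if __name__ == "__main__":
    docs = [
        ["hypergraph", "summary", "keyword"],
        ["hypergraph", "keyword"],
        ["weather"],
    ]
    words, sents = co_rank(docs)
    print(sorted(words, key=words.get, reverse=True))
    print(sents)
```

Under this scheme the highest-scoring sentences would form the summary and the highest-scoring words the keywords. Because one hyperedge links all the words of a sentence at once, the model keeps the group relation that a graph of pairwise edges would split apart.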
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2015, Issue 6, pp. 135-140 (6 pages)
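Funding
National Natural Science Foundation of China, Young Scientists Fund (61402191)
Fundamental Research Funds for the Central Universities, Central China Normal University (CCNU14A05015, CCNU15ZD003)
Central China Normal University Faculty Research Start-up Fund
Major Program of the National Social Science Foundation of China (12&2D223)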
Keywords
hypergraph
document summarization
keyword extraction
collaborative extraction