Abstract
Information extraction (IE) can support high-quality retrieval services. Targeting events in web news, this paper builds a system that extracts and integrates the key information of events that interest readers. The system adopts the following methods and strategies: (1) Extraction rules are built from sentence patterns, and event information is extracted directly from text in which temporal phrases (TP) and spatial phrases (SP) have already been recognized and normalized. This skips deep syntactic parsing and makes the system easier to implement. (2) The normalized temporal and spatial information of an event is used to link mentions of the same event across documents, and the linked event information is merged. (3) When a document shifts from one event to another, the document is segmented by event, which resolves the problem of assigning the information in a document to the right event. Preliminary experiments show that these methods and strategies are effective.
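Strategies (1) and (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the inline `<TP val="...">` and `<SP val="...">` annotation format, the single English sentence pattern, the slot names, and the functions `extract` and `merge` are all assumptions made for illustration. The sketch assumes an upstream step has already recognized and normalized the temporal and spatial phrases, applies a sentence-pattern rule directly to the annotated text without any syntactic parsing, and then merges records that share the same normalized time and place.

```python
import re
from collections import defaultdict

# Hypothetical annotation left by an upstream recognizer/normalizer:
#   <TP val="2005-03-28">...</TP>  temporal phrase with its normalized value
#   <SP val="Sumatra">...</SP>     spatial phrase with its normalized value
TP = r'<TP val="(?P<time>[^"]+)">[^<]*</TP>'
SP = r'<SP val="(?P<place>[^"]+)">[^<]*</SP>'

# One illustrative sentence-pattern rule (an assumption, not from the paper):
#   "<TP>, a/an <type> struck <SP>[, killing <N> people]"
PATTERNS = [
    re.compile(
        TP + r",? an? (?P<type>\w+) struck " + SP
        + r"(?:, killing (?P<deaths>\d+) people)?"
    ),
]


def extract(sentence):
    """Return the slots filled by the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.search(sentence)
        if match:
            # Drop optional slots that the sentence did not fill.
            return {k: v for k, v in match.groupdict().items() if v is not None}
    return None


def merge(records):
    """Merge records that share the same normalized (time, place) key."""
    merged = defaultdict(dict)
    for record in records:
        key = (record["time"], record["place"])
        for slot, value in record.items():
            # Keep the first value seen for a slot; later records fill gaps.
            merged[key].setdefault(slot, value)
    return dict(merged)


# Two sentences from different (made-up) documents describing one event.
doc_a = ('<TP val="2005-03-28">On Monday</TP>, an earthquake struck '
         '<SP val="Sumatra">the island of Sumatra</SP>.')
doc_b = ('<TP val="2005-03-28">On March 28</TP> an earthquake struck '
         '<SP val="Sumatra">Sumatra</SP>, killing 12 people.')

events = merge([extract(doc_a), extract(doc_b)])
```

Because both surface phrases normalize to the same time and place values, the two records collapse into one event, and the casualty slot missing from the first sentence is filled from the second. A real system would need many patterns, a conflict-resolution policy for disagreeing slot values, and the event segmentation step of strategy (3), none of which is shown here.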
Source
《中文信息学报》
CSCD
PKU Core Journal
2006, No. 1, pp. 21-28 (8 pages)
Journal of Chinese Information Processing
Funding
Supported by the National 863 Program of China (2001AA114040)
Keywords
computer application
Chinese information processing
information extraction
sentence pattern
developing event
space-time information
event merging