Abstract
This paper presents an ontology-driven approach to extracting information from semi-structured Web documents. The approach uses document structure to locate the data blocks that hold the required information. It then extracts that information by matching features against the ontology. A user-guided, interactive information extraction prototype system is also implemented. The approach locates the needed information efficiently. It also handles semantic problems in extraction, such as synonymy and polysemy, as well as incomplete data items and data items that do not appear in a fixed order.
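The abstract does not spell out the matching step. The Python sketch below is one possible way to match labels against an ontology so that synonyms, polysemous labels, missing items and arbitrary item order are all handled. It is a hypothetical illustration, not the authors' system. The toy book-record ontology, the `Concept` type, and the `match_concept`/`extract` functions are invented for this example. The sketch also assumes that the data block has already been located and that it consists of `label: value` lines.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str                  # canonical concept name in the ontology
    synonyms: tuple[str, ...]  # surface labels that may denote this concept
    value_pattern: str         # regex a plausible value must fully match


# Hypothetical toy ontology (illustration only). "published" is deliberately
# polysemous: it may label either a publication date or a publisher.
ONTOLOGY = (
    Concept("title", ("title", "book title", "name"), r".+"),
    Concept("author", ("author", "writer", "by"), r"[^\d].*"),
    Concept("price", ("price", "cost"), r"\$?\s*\d+(\.\d+)?"),
    Concept("pub_date", ("published", "publication date", "date"), r"\d{4}(-\d{2}){0,2}"),
    Concept("publisher", ("published", "publisher", "press"), r".*[A-Za-z].*"),
)


def normalize(label: str) -> str:
    """Lower-case a label and collapse internal whitespace."""
    return re.sub(r"\s+", " ", label.strip().lower())


def match_concept(label: str, value: str):
    """Map a label to a concept name via synonyms.

    When several concepts share the label (polysemy), the first concept
    whose value pattern fits the value wins; returns None if none fits.
    """
    key = normalize(label)
    for concept in ONTOLOGY:
        if key in concept.synonyms and re.fullmatch(concept.value_pattern, value.strip()):
            return concept.name
    return None


def extract(block: str) -> dict:
    """Extract one record from 'label: value' lines given in any order.

    Concepts with no matching line stay None (incomplete data items).
    """
    record = {c.name: None for c in ONTOLOGY}
    for line in block.splitlines():
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        name = match_concept(label, value)
        if name is not None and record[name] is None:
            record[name] = value.strip()
    return record
```

For example, `extract("Price: $35.00\nWriter: Zhang San\nPublished: 2006-05\nBook Title: Web Information Extraction")` maps `Writer` to `author` through a synonym and reads `Published` as a date because the value fits the date pattern. It also leaves `publisher` as `None`. Because the loop never depends on line position, reordering the lines yields the same record.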
Source
Computer Engineering (《计算机工程》), 2006, No. 5, pp. 192-194 (3 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals (北大核心)
Funding
Supported by the National High Technology Research and Development Program of China ("863" Program), grant 2002AA231071