Abstract
To address the automatic recognition of existing concepts and the discovery of new concepts in a specific domain, this paper proposes an extraction method based on conditional random fields (CRF) and information entropy. A CRF first predicts the boundaries of concept words in the text. The predictions are compared against the concepts in a dictionary to select candidate new concepts and locate their approximate positions. Mutual information and left and right entropy are then used to assess, within the concept window, the internal cohesion of each candidate and the freedom of its boundaries, respectively, so that new domain-specific concepts can be discovered. Experiments show that the proposed method outperforms using a CRF alone: the accuracy of concept discovery improves by 20.06% for the character-based model and by 46.54% for the word-based model.
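The two statistics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions of pointwise mutual information and left/right (branching) entropy computed over a raw character string. The abstract does not give the paper's exact formulas, window size, or thresholds, so the function names `pmi` and `boundary_entropy` and the toy corpus are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def count_occurrences(corpus: str, s: str) -> int:
    """Count (possibly overlapping) occurrences of s in corpus."""
    return sum(1 for i in range(len(corpus) - len(s) + 1)
               if corpus.startswith(s, i))


def pmi(corpus: str, left: str, right: str) -> float:
    """Pointwise mutual information of two adjacent parts (assumed definition).

    Probabilities are estimated as occurrence counts divided by corpus length.
    A higher value suggests the two parts bind tightly (internal cohesion).
    """
    n = len(corpus)
    p_xy = count_occurrences(corpus, left + right) / n
    p_x = count_occurrences(corpus, left) / n
    p_y = count_occurrences(corpus, right) / n
    if p_xy == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


def _entropy(counter: Counter) -> float:
    """Shannon entropy (base 2) of a frequency table; 0.0 when empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(v / total * math.log2(v / total) for v in counter.values())


def boundary_entropy(corpus: str, cand: str) -> tuple[float, float]:
    """Left and right entropy of the characters adjacent to cand.

    Higher entropy means more varied neighbours, i.e. a freer boundary.
    """
    left, right = Counter(), Counter()
    start = corpus.find(cand)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(cand)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(cand, start + 1)
    return _entropy(left), _entropy(right)


# Toy corpus (hypothetical): "XY" always occurs as a unit with varied neighbours.
toy_corpus = "abXYcdXYefXYab"
cohesion = pmi(toy_corpus, "X", "Y")
left_h, right_h = boundary_entropy(toy_corpus, "XY")
```

In such a sketch, a candidate would be kept as a new concept only when its cohesion and both boundary entropies exceed chosen thresholds. Those thresholds would need tuning on domain data and are not specified in the abstract.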
Authors
付瑶
万静
邢立栋
Fu Yao; Wan Jing; Xing Lidong (College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2020, Issue 3, pp. 708-711, 730 (5 pages)
Application Research of Computers
Funding
Supported by the National Key Technology R&D Program of China (2015BAK03B04).
Keywords
concept recognition
new concept discovery
conditional random field
information entropy
specific field