Abstract
Short-text entity disambiguation is difficult because short texts cannot fully express semantic relations and provide little context. To address these difficulties, this paper proposes a new method, the Mixed Convolution Network (MCN). The method first preprocesses the dataset. It then uses the BERT model proposed by Google for feature extraction, refines the extracted features with an attention mechanism, and feeds them to a CNN model, which captures the dependency features of the sentence. In parallel, a GCN model captures semantic features. The semantic information extracted by the two models is fused to produce the output. Experimental results on the CCKS2019 evaluation dataset show that the proposed MCN achieves a precision of 86.57%, which verifies the effectiveness of the model.
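The data flow described in the abstract (token features, attention, CNN, a parallel GCN, then fusion) can be illustrated with a minimal sketch in pure Python. This is not the authors' implementation. Random vectors stand in for BERT token features, the chain-shaped adjacency matrix is an arbitrary stand-in for whatever graph the paper builds, and every name, dimension, and parameter here (`attention_reweight`, `conv1d_maxpool`, `gcn_layer`, `mcn_score`, `params`) is hypothetical. The graph layer uses the commonly cited normalized propagation rule, which may differ from the paper's exact formulation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reweight(tokens, query):
    """Scale each token vector by its softmax attention weight against `query`."""
    scores = [sum(t * q for t, q in zip(tok, query)) for tok in tokens]
    weights = softmax(scores)
    return [[w * v for v in tok] for w, tok in zip(weights, tokens)], weights


def conv1d_maxpool(tokens, kernels, width):
    """Slide each kernel over windows of `width` tokens, apply ReLU, max-pool over time."""
    feats = []
    for kernel in kernels:  # each kernel is a flat list of length width * dim
        acts = []
        for i in range(len(tokens) - width + 1):
            window = [v for tok in tokens[i:i + width] for v in tok]
            acts.append(max(0.0, sum(k * v for k, v in zip(kernel, window))))
        feats.append(max(acts))
    return feats


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mcn_score(tokens, adj, params):
    """Fuse CNN and GCN features and return a match probability plus the fused vector."""
    attended, _ = attention_reweight(tokens, params["query"])
    cnn_feats = conv1d_maxpool(attended, params["kernels"], params["width"])
    node_feats = gcn_layer(adj, tokens, params["gcn_weight"])
    gcn_feats = [sum(col) / len(node_feats) for col in zip(*node_feats)]  # mean over nodes
    fused = cnn_feats + gcn_feats  # fusion by concatenation (an assumption)
    logit = sum(w * f for w, f in zip(params["out_weight"], fused))
    return 1.0 / (1.0 + math.exp(-logit)), fused


def rand_vec(n):
    """Random vector used as a placeholder for learned weights or BERT features."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


random.seed(0)
dim, n_tokens, n_kernels, gcn_dim, width = 4, 5, 3, 2, 2
tokens = [rand_vec(dim) for _ in range(n_tokens)]  # stand-in for BERT outputs
adj = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n_tokens)] for i in range(n_tokens)]
params = {
    "query": rand_vec(dim),
    "kernels": [rand_vec(width * dim) for _ in range(n_kernels)],
    "width": width,
    "gcn_weight": [rand_vec(gcn_dim) for _ in range(dim)],
    "out_weight": rand_vec(n_kernels + gcn_dim),
}
score, fused = mcn_score(tokens, adj, params)
print(f"match probability: {score:.3f}, fused feature size: {len(fused)}")
```

In a trained model the random vectors would be replaced by learned parameters, and the score would rank candidate entities for a mention. The sketch only shows how the two feature streams are computed independently and then joined.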
Authors
姜丽婷
古丽拉·阿东别克
马雅静
JIANG Liting; Gulila ALTENBEK; MA Yajing (College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China; Xinjiang Laboratory of Multi-language Information Technology, Urumqi, Xinjiang 830046, China; The Base of Kazakh and Kirghiz Language of National Language Resource Monitoring and Research Center on Minority Languages, Urumqi, Xinjiang 830046, China)
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2021, No. 11, pp. 101-108 (8 pages)
Funding
National Natural Science Foundation of China (62062062)
Xinjiang University Scientific Research Fund (BS 180250)
Keywords
short text
entity disambiguation
BERT
graph convolutional network
convolutional neural network