Abstract
Fully identifying and spatializing the location information contained in text is important for text data mining. Spatial information in text is hard to recognize because it is often described in non-standard ways, takes diverse forms, and is mixed with dialect. This paper proposes a method for spatial information recognition and positioning that combines rule matching with deep learning. First, a matching semantic database is built from standard place names and addresses, and rule matching is used to extract spatial information precisely and determine its location. Second, the extracted results serve as sample data for training a BERT-BiLSTM-CRF model, which then extracts spatial information automatically. Third, matching rules based on prefix and suffix feature words are applied as a supplement, so that the spatial information in the text is extracted more fully. Finally, geocoding is used to obtain the spatial location. Experiments show that the method effectively improves the accuracy and recall of spatial information recognition and is practical to apply.
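The rule-based parts of the pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the gazetteer entries, coordinates, feature-word lists, and function names are all hypothetical, the BERT-BiLSTM-CRF stage is omitted, and geocoding is stood in for by a dictionary lookup.

```python
import re

# Hypothetical gazetteer built from standard place names: name -> (lon, lat).
# Entries and coordinates are illustrative placeholders, not data from the paper.
GAZETTEER = {
    "重庆市渝北区": (106.63, 29.72),
    "渝北区": (106.63, 29.72),
    "人民路": (106.55, 29.56),
}

# Hypothetical prefix and suffix feature words for the supplementary rule.
PREFIXES = ("位于", "靠近", "在")
SUFFIXES = ("小区", "路", "街", "镇", "村")

# A prefix word, then the shortest run of 1-6 CJK characters ending in a suffix word.
AFFIX_PATTERN = re.compile(
    "(?:" + "|".join(PREFIXES) + ")"
    + "([\u4e00-\u9fff]{1,6}?(?:" + "|".join(SUFFIXES) + "))"
)


def match_gazetteer(text):
    """Scan left to right, taking the longest gazetteer name at each position."""
    names = sorted(GAZETTEER, key=len, reverse=True)
    spans = []
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                spans.append((i, i + len(name), name))
                i += len(name)
                break
        else:
            i += 1  # no name starts here; move on one character
    return spans


def match_affixes(text, taken):
    """Find prefix/suffix candidates that do not overlap spans already taken."""
    spans = []
    for m in AFFIX_PATTERN.finditer(text):
        start, end = m.span(1)
        if all(end <= s or start >= e for s, e, _ in taken):
            spans.append((start, end, m.group(1)))
    return spans


def extract_locations(text):
    """Return (name, source, coordinates) tuples in order of appearance."""
    gaz = match_gazetteer(text)
    affix = match_affixes(text, gaz)
    tagged = [(s, n, "gazetteer") for s, _, n in gaz]
    tagged += [(s, n, "affix") for s, _, n in affix]
    # Stand-in for geocoding: names missing from the gazetteer get None.
    return [(n, src, GAZETTEER.get(n)) for _, n, src in sorted(tagged)]
```

Calling `extract_locations("事故发生在重庆市渝北区,靠近幸福小区")` yields one gazetteer hit with coordinates and one affix-rule candidate without coordinates. In a fuller pipeline the candidates without coordinates would be passed to a real geocoding service, and the gazetteer hits would supply the labeled samples for training the sequence-labeling model.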
Authors
何小波
罗跃
金贤锋
刘贤
HE Xiaobo; LUO Yue; JIN Xianfeng; LIU Xian (Chongqing Geographic Information and Remote Sensing Application Center, Chongqing 401147, China; Guizhou University of Engineering Science, Guiyang 551700, China)
Source
《地理信息世界》
2020, Issue 5, pp. 121-128 (8 pages)
Geomatics World
Funding
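National Key Research and Development Program (2018YFB0505400)
Key Research and Development Project in the Social and People's Livelihood category (cstc2018jscx-mszdX0067)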
Keywords
text mining
spatial information extraction
geographical names recognition
natural language processing
geographic coding