Research of Medical Named Entity Recognition Based on Internet Resources (面向互联网资源的医学命名实体识别研究)
Cited by: 6
Abstract: Named entity recognition is the first step of medical information extraction, but the lack of public medical corpora makes the task difficult. Most existing work relies on a small amount of manually annotated text and therefore generalizes poorly. The Internet aggregates large amounts of data from which medical knowledge can be extracted. Because Internet resources are large in scale, weakly structured, and unannotated, this paper proposes an iterative framework to exploit them. Text is annotated by fusing a general-purpose model with a domain dictionary, which mitigates the loss of accuracy caused by domain differences. The model is built with an online method, which avoids rebuilding the whole model in each iteration. The named entity recognition model integrates lexical features, affix features, word-length features, and others to improve its recognition ability. A heuristic model compression method is also proposed to make the model more usable. Experimental results show that the proposed strategies are effective.
Authors: TIAN Jiayuan (1), YANG Donghua (1, 2), WANG Hongzhi (1). Affiliations: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China; 2. Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, Harbin 150001, China
Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), indexed in CSCD and the Peking University Core Journals list, 2018, No. 6, pp. 898-907 (10 pages)
Funding: National Natural Science Foundation of China (Nos. 61472099, 61772157); National Key Technology R&D Program of China (No. 2015BAH10F01)
Keywords: named entity recognition; Internet resources; iterative framework; average perceptron
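
The abstract mentions an online model (the keywords name the average perceptron) that uses lexical, affix, and word-length features, followed by a compression step. This record does not give the paper's feature templates, update schedule, or compression heuristic, so the following is only a minimal sketch under stated assumptions: the names `token_features`, `AveragedPerceptron`, `train`, and `prune` are hypothetical, the feature set is illustrative, and magnitude-based pruning is a simple stand-in, not the authors' heuristic.

```python
from collections import defaultdict


def token_features(tokens, i, prev_tag):
    """Illustrative template (an assumption): lexical, affix and length features."""
    word = tokens[i]
    return [
        "bias",
        "word=" + word,                      # lexical feature
        "prefix2=" + word[:2],               # affix features
        "suffix2=" + word[-2:],
        "len=" + str(min(len(word), 5)),     # word-length feature, capped at 5
        "prev_tag=" + prev_tag,
        "prev_word=" + (tokens[i - 1] if i > 0 else "<s>"),
    ]


class AveragedPerceptron:
    """Multi-class perceptron trained online, with lazily averaged weights."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = defaultdict(float)  # (feature, label) -> current weight
        self.totals = defaultdict(float)   # accumulated weight over past steps
        self.stamps = defaultdict(int)     # step at which a weight last changed
        self.step = 0

    def predict(self, feats, weights=None):
        w = self.weights if weights is None else weights
        scores = {y: sum(w.get((f, y), 0.0) for f in feats) for y in self.labels}
        # Ties are broken by the sorted label order.
        return max(self.labels, key=lambda y: scores[y])

    def _bump(self, key, delta):
        # Credit the old value for the steps during which it was in effect.
        self.totals[key] += (self.step - self.stamps[key]) * self.weights[key]
        self.stamps[key] = self.step
        self.weights[key] += delta

    def update(self, feats, gold):
        """One online step: predict, then correct the weights on a mistake."""
        self.step += 1
        guess = self.predict(feats)
        if guess != gold:
            for f in feats:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)
        return guess

    def averaged_weights(self):
        """Average of each weight as used for prediction across all steps."""
        avg = {}
        for key, w in self.weights.items():
            total = self.totals[key] + (self.step - self.stamps[key]) * w
            avg[key] = total / self.step if self.step else 0.0
        return avg


def train(model, sentences, epochs=5):
    """sentences: list of (tokens, tags) pairs; new data can be fed in later
    calls without rebuilding the model, which is the point of online training."""
    for _ in range(epochs):
        for tokens, tags in sentences:
            prev = "<s>"
            for i, gold in enumerate(tags):
                prev = model.update(token_features(tokens, i, prev), gold)


def prune(weights, threshold):
    """Stand-in compression (an assumption): drop weights of small magnitude."""
    return {k: w for k, w in weights.items() if abs(w) >= threshold}
```

In an iterative setting, `train` could be called again on each newly annotated batch, and `model.predict(feats, prune(model.averaged_weights(), threshold))` would then score tokens with the smaller, averaged model; the threshold value would have to be chosen empirically.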
