Analysis and Annotation of Symptom Composition Based on CRF

Cited by: 6
Abstract Descriptions of Chinese symptoms are rich and varied, and the constituent elements of symptoms are complex and changeable. As an important step in transforming unstructured electronic medical records into structured ones, studying the composition of Chinese symptoms helps to fully understand symptom components, identify synonyms of symptom names, and quantitatively analyze a patient's condition. In this paper, we present a composition model for Chinese symptoms in which a symptom name is treated as a sequence composed of one or more of 16 element types, e.g., atomic symptom, conjunction, and negation word. Conditional random fields (CRF) are then used to annotate these composition sequences automatically. First, we collect 5,645 Chinese symptoms from eight healthcare websites and annotate them semi-automatically. Then, a CRF is used to recognize the symptom composition elements. By choosing a suitable feature template for symptom composition recognition, we verify the effect of the CRF features and analyze the composition elements that remain unrecognized in the results. We also design hand-written rules, backed by a symptom composition dictionary, that target wrongly typed elements in order to correct the recognition results. Finally, experimental results show that the proposed method identifies the composition elements of Chinese symptoms effectively, with annotation accuracy reaching 90.53% at the symptom level and 93.91% at the composition-element level.
Authors ZENG Lu; GAO Da-qi; RUAN Tong; WANG Qi; GAO Ju; HE Ping (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; Glorious Sun School of Business and Management, Donghua University, Shanghai 200051, China)
Source Journal of East China University of Science and Technology (Natural Science Edition), 2018, No. 2, pp. 277-282 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding National "863" Program of China (2015AA020107)
Keywords symptom; composition analysis; conditional random fields (CRF)
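
The composition model in the abstract can be made concrete with a small sketch. The code below is illustrative only and is not the authors' implementation. The tag names `NEG`, `ATOM`, and `CONJ`, the example symptom, its hand segmentation, the window size, and both helper functions are assumptions introduced here. The abstract names only three of the 16 element types and does not specify its feature template or its exact accuracy definition. The sketch shows (1) a symptom represented as a tagged element sequence, (2) context-window features of the kind a CRF feature template typically produces, and (3) one plausible reading of accuracy at the two granularities the abstract reports.

```python
from typing import Dict, List, Tuple

# Hypothetical tag names; the abstract names only three of its 16 element types.
NEG, ATOM, CONJ = "NEG", "ATOM", "CONJ"


def window_features(tokens: List[str], i: int, size: int = 1) -> Dict[str, str]:
    """Context-window features for position i (assumed template, not the paper's)."""
    feats = {"bias": "1"}
    for offset in range(-size, size + 1):
        j = i + offset
        if j < 0:
            value = "<BOS>"  # padding before the first element
        elif j >= len(tokens):
            value = "<EOS>"  # padding after the last element
        else:
            value = tokens[j]
        feats[f"tok[{offset}]"] = value
    return feats


def sequence_features(tokens: List[str], size: int = 1) -> List[Dict[str, str]]:
    """One feature dict per element of the symptom."""
    return [window_features(tokens, i, size) for i in range(len(tokens))]


def accuracies(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float]:
    """Return (symptom-level, element-level) accuracy.

    Symptom level: a symptom counts only if every tag in it is correct.
    Element level: each tag is counted on its own.
    Assumes each predicted sequence has the same length as its gold sequence.
    """
    correct_symptoms = sum(g == p for g, p in zip(gold, pred))
    total_elements = sum(len(g) for g in gold)
    correct_elements = sum(
        g_tag == p_tag
        for g, p in zip(gold, pred)
        for g_tag, p_tag in zip(g, p)
    )
    return correct_symptoms / len(gold), correct_elements / total_elements


# "无咳嗽及发热" ("no cough or fever"), segmented by hand for illustration.
tokens = ["无", "咳嗽", "及", "发热"]
gold_tags = [NEG, ATOM, CONJ, ATOM]
features = sequence_features(tokens)

# A made-up prediction with one wrong tag, to show the two granularities diverge.
gold = [gold_tags, [ATOM]]
pred = [[NEG, ATOM, ATOM, ATOM], [ATOM]]
symptom_acc, element_acc = accuracies(gold, pred)
```

Feature dicts in this shape can be passed to a CRF toolkit for training. A single wrong tag lowers the symptom-level figure more than the element-level one, which is consistent with the abstract's symptom-level accuracy being the lower of the two.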
