Text mining approach for relationships between genes and diseases (cited by: 2)

Abstract: A text mining approach is designed that combines pattern matching, biomedical ontologies, and co-occurrence techniques to automatically extract gene-disease and gene-gene relationships, and a system capable of processing large-scale text data is developed. The system extracts gene entities related to diseases, mines gene-disease and gene-gene relationships, measures the relevance between gene and disease entities, and provides network visualization tools for analyzing these relationships. Experimental results show that, on the test datasets, the system achieves an F-score of 83.0% for gene-disease relationship extraction and 78.5% for gene-gene relationship extraction. The system has been successfully applied to research on breast cancer and related genes.
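The combination of dictionary lookup, pattern matching, and co-occurrence counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term sets `GENES` and `DISEASES`, the `TRIGGER` pattern, the doubled weight for pattern hits, and all function names are assumptions made for illustration. The `f_score` helper assumes the reported F-score is the balanced F1 measure, which the abstract does not specify.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy vocabularies; a real system would load gene symbols and
# disease terms from curated resources such as a biomedical ontology.
GENES = {"BRCA1", "BRCA2", "TP53", "MDM2"}
DISEASES = {"breast cancer"}

# Hypothetical trigger phrases that signal an explicitly stated relation.
TRIGGER = re.compile(r"\b(associated with|mutations? in|linked to)\b", re.I)


def find_entities(sentence, vocabulary):
    """Return the vocabulary terms found in the sentence as whole words.

    Matching is case-insensitive, which is a simplification: real gene
    symbols often need case-sensitive handling to avoid false positives.
    """
    lowered = sentence.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
    }


def extract_relations(sentences):
    """Count sentence-level co-occurrences of gene-disease and gene-gene pairs.

    A sentence that also matches TRIGGER contributes a weight of 2 instead
    of 1 (an assumed weighting, chosen only to show how pattern evidence
    could be combined with plain co-occurrence).
    """
    gene_disease = Counter()
    gene_gene = Counter()
    for sentence in sentences:
        genes = find_entities(sentence, GENES)
        diseases = find_entities(sentence, DISEASES)
        weight = 2 if TRIGGER.search(sentence) else 1
        for gene in genes:
            for disease in diseases:
                gene_disease[(gene, disease)] += weight
        # Sort so each unordered gene pair has one canonical key.
        for first, second in combinations(sorted(genes), 2):
            gene_gene[(first, second)] += weight
    return gene_disease, gene_gene


def f_score(predicted, gold):
    """Balanced F-score (F1) of a predicted relation set against a gold set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Sorting the resulting counters by weight gives a simple relevance ranking of gene-disease pairs, and the pair keys can serve directly as the edges of a relationship network for visualization.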

Authors: Gong Lejun (龚乐君), Wei Youbing (韦有兵), Xie Jianming (谢建明), Yuan Zhidong (袁志栋), Sun Xiao (孙啸)

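Affiliations: State Key Laboratory of Bioelectronics, Southeast University; School of Computer Engineering, Huaiyin Institute of Technology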

Source: Journal of Southeast University (Natural Science Edition) [东南大学学报（自然科学版）], indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2010, No. 3, pp. 486-490 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 60771024)

Keywords: biomedicine; text mining; relation extraction; entity recognition

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
