Abstract
A text mining approach is designed for automatically extracting gene-disease and gene-gene relationships by combining pattern matching and biomedical ontologies with co-occurrence techniques, and a system is developed for processing large-scale text datasets. The system extracts disease-related gene entities, mines gene-disease and gene-gene relationships, and ranks the relevance of genes to diseases. It also provides network visualization tools for analyzing gene-disease and gene-gene relationships. Experimental results on the test datasets show that the system achieves an F-score of 83.0% for extracting gene-disease relationships and an F-score of 78.5% for extracting gene-gene relationships. The system has been successfully applied to research on breast cancer and related genes.
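The abstract names the ingredients of the approach but does not spell out how they fit together. The Python sketch below is one rough way to combine them and rests on several assumptions that are not from the paper. A tiny hand-written lexicon (the hypothetical `GENE_LEXICON` and `DISEASE_LEXICON`) stands in for the biomedical ontology. A hypothetical `TRIGGERS` regex stands in for the extraction patterns. A relation is proposed only when entities co-occur in a sentence that also matches a trigger. The sketch also shows how the reported F-score (the harmonic mean of precision and recall) is computed against a gold set of relations.

```python
import re
from itertools import combinations

# Hypothetical toy lexicons standing in for a biomedical ontology (assumption).
GENE_LEXICON = {"BRCA1", "BRCA2", "TP53", "ERBB2"}
DISEASE_LEXICON = {"breast cancer", "ovarian cancer"}

# Hypothetical trigger patterns; the paper's actual patterns are not given.
TRIGGERS = re.compile(
    r"\b(associated with|mutations? in|interacts? with|binds?|regulates?)\b", re.I
)

def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def find_entities(sentence):
    """Dictionary lookup of gene symbols (case-sensitive) and disease names."""
    genes = {g for g in GENE_LEXICON if re.search(rf"\b{re.escape(g)}\b", sentence)}
    lowered = sentence.lower()
    diseases = {d for d in DISEASE_LEXICON if d in lowered}
    return genes, diseases

def extract_relations(text):
    """Sentence-level co-occurrence, filtered by a trigger-pattern match."""
    gene_disease, gene_gene = set(), set()
    for sentence in split_sentences(text):
        if not TRIGGERS.search(sentence):
            continue  # co-occurrence alone is not accepted as evidence here
        genes, diseases = find_entities(sentence)
        gene_disease |= {(g, d) for g in genes for d in diseases}
        # Gene-gene pairs are unordered, so store them sorted.
        gene_gene |= {tuple(sorted(pair)) for pair in combinations(genes, 2)}
    return gene_disease, gene_gene

def f_score(predicted, gold):
    """F1 = 2PR / (P + R); returns 0.0 when nothing predicted is correct."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    text = ("Mutations in BRCA1 are associated with breast cancer. "
            "BRCA1 interacts with TP53. TP53 and ERBB2 were both measured.")
    gd, gg = extract_relations(text)
    print(gd)  # {('BRCA1', 'breast cancer')}
    print(gg)  # {('BRCA1', 'TP53')}
    gold = {("BRCA1", "breast cancer"), ("BRCA2", "breast cancer")}
    print(round(f_score(gd, gold), 3))  # 0.667 (P = 1.0, R = 0.5)
```

In the example, the third sentence mentions two genes but contains no trigger, so this sketch proposes no TP53-ERBB2 pair. Requiring a pattern on top of co-occurrence trades some recall for precision. How the paper actually balances the two, and how it scores gene-disease relevance, is not described in the abstract.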
Source
Journal of Southeast University (Natural Science Edition), 2010, No. 3, pp. 486-490 (5 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China (Grant No. 60771024)
Keywords
biomedicine
text mining
relation extraction
entity recognition