
Research on Entity Identification Techniques for Complex Data (复杂数据上的实体识别技术研究). Times cited: 19

Object Identification on Complex Data: A Survey
Abstract (translated from the Chinese): Complex data is now widely used, and using it effectively requires managing its quality. Entity identification is a basic operation in data quality management: it finds different descriptions of the same entity within a data set, and it can be used for error detection and for discovering inconsistent data. Because complex data carries rich structural information, entity identification on it differs from entity identification on traditional text and relational data and raises new technical challenges. This paper introduces the concept and applications of entity identification on complex data, discusses the principles of entity identification techniques for XML data, graph data, and complex networks, and closes with an outlook on future research directions.
Abstract (English, as published): It is increasingly common to find data with a complex structure in the real world. To use complex data effectively in practice, the necessary techniques must be in place to improve the quality of the data. Entity resolution is a central issue in data quality management for complex objects. It aims to find the data objects that refer to the same real-world entity and to cluster such objects together. It has proven extremely useful in data fusion, inconsistency detection, and data repairing. Nevertheless, the complex structure of the data introduces new challenges and makes object identification much harder than record matching on relational data. In response to these challenges, there has been a lot of work on this topic. This paper aims to provide an overview of recent advances in the study of object identification on complex objects, including XML, graph data, and complex networks. For XML data, we survey techniques for pairwise and group-wise entity resolution. For graph data, we focus on how to determine whether two graphs refer to the same real-world entity. We also present the metrics and methods for identifying vertices that pertain to the same real-world entity in a complex network. Finally, we discuss directions for future research.
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, PKU Core), 2011, No. 10, pp. 1843-1852 (10 pages)
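Funding: Supported by the National Natural Science Foundation of China (61003046, 61033015, 61133002), the RSE-NSFC exchange project (61111130189), the National Basic Research Program of China ("973" Program, 2012CB316200), and the Doctoral Program Foundation of the Ministry of Education (20102302120054).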
Keywords: data quality; complex data; object identification (entity identification); XML; graph; complex network

