
Survey of unsupervised clustering approach oriented to entity resolution (cited by: 7)
Abstract This paper analyzes the mechanism of the entity resolution (ER) process from the perspective of unsupervised clustering. It first examines unsupervised clustering approaches in terms of specific types and classical algorithms, then examines unsupervised incremental clustering approaches in terms of improvements to classical algorithms and evolution analysis, and finally outlines the open problems that unsupervised clustering research still needs to address. Unsupervised clustering can not only effectively address the clustering efficiency and quality problems of the traditional entity resolution process, but can also reuse existing clustering results to resolve rapidly evolving data incrementally, thereby meeting the pressing need for incremental resolution in big data environments. As a limitation, the paper does not analyze the evaluation metrics of unsupervised clustering algorithms in depth. Although unsupervised clustering methods for entity resolution have many advantages, they still face challenges in accuracy and scalability.
Author GAO Guangshang (高广尚) (Research Center for Modern Enterprise Management, Guilin University of Technology, Guilin, Guangxi 541004, China; School of Management, Guilin University of Technology, Guilin, Guangxi 541004, China)
Source Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal; 2018, No. 7, pp. 11-19, 65 (10 pages in total)
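Funding National Natural Science Foundation of China (No. 71761008); Fund of the Key Research Base of Humanities and Social Sciences in Guangxi Universities (No. 16YB010)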
Keywords Entity Resolution (ER); unsupervised clustering; unsupervised incremental clustering
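
The abstract's central idea, reusing existing clustering results so that newly arriving records are resolved incrementally instead of re-clustering everything, can be illustrated with a minimal Python sketch. This is not an algorithm taken from the surveyed paper. The whitespace tokenization, the Jaccard similarity, the single-link assignment rule, the 0.5 threshold, and the function names `tokens`, `jaccard`, `add_record`, and `resolve` are all assumptions chosen for illustration.

```python
def tokens(record: str) -> frozenset:
    """Lowercased whitespace tokens of a record (illustrative assumption)."""
    return frozenset(record.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def add_record(clusters: list, record: str, threshold: float = 0.5) -> int:
    """Place one new record into the existing clusters without re-clustering.

    The record joins the cluster that holds its most similar member if that
    similarity reaches the threshold; otherwise it opens a new cluster.
    Returns the index of the cluster the record was placed in.
    """
    rec = tokens(record)
    best_idx, best_sim = -1, 0.0
    for idx, members in enumerate(clusters):
        # Single-link rule (assumption): compare with the closest member.
        sim = max(jaccard(rec, tokens(m)) for m in members)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx >= 0 and best_sim >= threshold:
        clusters[best_idx].append(record)
        return best_idx
    clusters.append([record])
    return len(clusters) - 1


def resolve(records, threshold: float = 0.5) -> list:
    """Build clusters by feeding records one at a time to add_record."""
    clusters = []
    for r in records:
        add_record(clusters, r, threshold)
    return clusters


clusters = resolve([
    "John Smith New York",
    "john smith new york city",
    "Mary Jones Boston",
])
# A later arrival is matched against the existing clusters only.
placed = add_record(clusters, "Mary Jones Boston MA")
```

Each arrival is compared only against the clusters that already exist, which is what makes the update incremental. A practical system would likely add blocking or indexing to avoid scanning every cluster, and might also merge or split clusters as the data evolves.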

