
Analysis of multidimensional data de-duplication clustering algorithm based on large data mining

Cited by: 11
Abstract: Data is generated through a growing number of channels and at an increasing rate, and the sheer volume makes analysis and processing considerably harder. The variety and scale of data on cloud platforms also keep expanding, and this very large scale poses unprecedented challenges for data storage, management, and analysis. A sharp increase in data volume can undermine data reliability, so effectively handling the relationships among data, reducing redundant data, and establishing a multidimensional data de-duplication clustering model is a shared goal across the industry. This paper first introduces multidimensional clustering algorithms in the context of big data mining. By analyzing the internal relationships within big data, it establishes an analysis model for a multidimensional data de-duplication clustering algorithm suited to big data processing. The algorithm is then improved and analyzed experimentally. The results indicate that the algorithm has low complexity during sampling and yields accurate data analysis results, which helps with data analysis and processing, reduces data redundancy, and improves the efficiency of data analysis. The algorithm also shows a good judgment effect.
Author: SONG Peng (宋鹏) (Hunan University, Changsha 410082, China; Hunan Vocational College of Science & Technology, Changsha 410004, China)
Source: Modern Electronics Technique (《现代电子技术》), PKU Core Journal, 2019, No. 23, pp. 150-153 (4 pages)
Funding: Hunan Provincial Education Science Planning Project (XJK018JKB021)
Keywords: big data mining; multidimensional data de-duplication; clustering algorithm; data analysis; model establishment; redundancy reduction
