基于大数据挖掘的多维数据去重聚类算法分析

Cited by: 11

Analysis of multidimensional data de-duplication clustering algorithm based on large data mining

Abstract: Data is generated through ever more channels and at ever higher speed, and its sheer volume makes analysis and processing considerably harder. The variety and scale of data on cloud platforms also keep growing, and this very large scale poses unprecedented challenges for data storage, management, and analysis. A sharp increase in data volume can undermine data reliability, so handling the relationships among data effectively, reducing redundant data, and building a multidimensional data de-duplication clustering model is a shared goal across the industry. This paper first introduces multidimensional clustering algorithms in the context of big data mining. By analyzing the internal relationships within big data, it establishes an analysis model for a multidimensional data de-duplication clustering algorithm suited to big data processing, then improves the algorithm and evaluates it experimentally. The results indicate that the algorithm has low complexity during sampling and produces accurate analysis results, which supports data analysis and processing, reduces data redundancy, improves the efficiency of data analysis, and gives good judgment performance.

Author: SONG Peng (宋鹏) (Hunan University, Changsha 410082, China; Hunan Vocational College of Science & Technology, Changsha 410004, China)

Affiliations: Hunan University (湖南大学); Hunan Vocational College of Science & Technology (湖南科技职业学院)

Source: Modern Electronics Technique (《现代电子技术》), PKU Core Journal (北大核心), 2019, No. 23, pp. 150-153 (4 pages)

Funding: Hunan Provincial Education Science Planning Project (XJK018JKB021)

Keywords: big data mining; multidimensional data de-duplication; clustering algorithm; data analysis; model establishment; redundancy reduction

Classification code: TN911.1-34 [Electronics and Telecommunications: Communication and Information Systems]

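References (6)

1. 何春华. Analysis of clustering algorithms for multidimensional data de-duplication in big data environments [J]. 计算机产品与流通, 2017(11): 149. Cited by: 4
2. 刘先花. Design of a big data storage system based on swarm collaborative intelligent clustering [J]. 现代电子技术 (Modern Electronics Technique), 2017, 40(23): 130-133. Cited by: 7
3. 孟彩霞, 陈红玉. Mining of sensitive network information based on a TF-IDF improved clustering algorithm [J]. 现代电子技术 (Modern Electronics Technique), 2015, 38(24): 44-46. Cited by: 6
4. 谭梦茜, 邵雄凯, 刘春. Research on location big data mining methods based on spatio-temporal analysis [J]. 湖北工业大学学报, 2016, 31(2): 53-57. Cited by: 8
5. 左国才. Research on a distributed privacy-preserving clustering mining algorithm based on big data [J]. 智能计算机与应用, 2018, 8(6): 57-60. Cited by: 7
6. 郑志娴, 吴为民, 李慧敏. Research on a data mining algorithm based on CURE clustering optimization [J]. 哈尔滨商业大学学报（自然科学版）, 2017, 33(6): 723-727. Cited by: 3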
