Abstract
Data is being produced through ever more channels and at ever greater speed. This volume makes analysis and processing harder; the variety and scale of data on cloud platforms keep growing, and the sheer size poses unprecedented challenges for data storage, management, and analysis. A sharp increase in data volume can undermine data reliability, so handling the relationships among data effectively, reducing redundant data, and establishing a multidimensional data de-duplication clustering model is a shared goal across the industry. This paper first introduces the multidimensional clustering algorithm used in big data mining. By analyzing the internal relationships within big data, it establishes an analysis model for a multidimensional data de-duplication clustering algorithm suited to big data processing. The algorithm is then improved and analyzed experimentally. The results indicate that the algorithm has low complexity in sampling and yields accurate data analysis results, which helps with data analysis and processing, reduces data redundancy, and improves the efficiency of data analysis. The algorithm shows a good judgment effect.
Author
宋鹏
SONG Peng (Hunan University, Changsha 410082, China; Hunan Vocational College of Science & Technology, Changsha 410004, China)
Source
《现代电子技术》
Peking University Core Journal (北大核心)
2019, Issue 23, pp. 150-153 (4 pages)
Modern Electronics Technique
Funding
Hunan Provincial Education Science Planning Project (XJK018JKB021)
Keywords
big data mining
multidimensional data de-duplication
clustering algorithm
data analysis
model establishment
redundancy reduction