
Subspace Clustering through Attribute Clustering (original title: 采用属性聚类的高维子空间聚类算法)

Cited by: 13
Abstract: Many existing subspace clustering algorithms suffer from two problems: their running time typically grows exponentially with the data dimensionality or with the dimensionality of the cluster subspaces, and their results are sensitive to input parameters. To address both, an efficient subspace clustering algorithm based on attribute clustering is proposed. The algorithm first computes the Gini coefficient of each attribute to filter out redundant attributes. It then builds a relation matrix over the non-redundant attributes, using a relation function based on the two-dimensional joint Gini coefficient to measure the correlation between any two of them. An overlapping clustering algorithm is applied to the relation matrix, and the resulting attribute clusters form the candidate set of all interesting subspaces. Finally, a clustering algorithm is run within these subspaces to obtain all the subspace clusters they contain. Experiments on synthetic and real datasets show that the new algorithm performs well in both running time and its ability to find subspace clusters, and that it is relatively insensitive to the values of its input parameters.
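Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2007, No. 3, pp. 1-5 (5 pages)
Funding: National "973 Program" project (2007CB307100); National Natural Science Foundation of China project (60432010)
Keywords: subspace clustering; high-dimensional data; attribute clustering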
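
The first two steps described in the abstract (filtering attributes by a Gini coefficient, then building a pairwise relation matrix from a two-dimensional joint Gini coefficient) can be illustrated with a minimal sketch. This record gives only the abstract, so everything concrete here is an assumption rather than the paper's definition: the Gini coefficient is taken to be the impurity 1 - sum(p^2) of an equal-width histogram, the `bins` and `slack` parameters are hypothetical, and the relation function (the gap between the joint Gini expected under independence and the observed joint Gini) is a stand-in. The overlapping clustering step on the relation matrix is not shown.

```python
import numpy as np

def gini_1d(x, bins=10):
    """Gini impurity of the equal-width histogram of one attribute (assumed definition)."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_2d(x, y, bins=10):
    """Gini impurity of the joint equal-width histogram of two attributes."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def filter_attributes(data, bins=10, slack=0.02):
    """Keep attributes whose Gini value is clearly below that of a uniform histogram.

    A near-uniform attribute carries no dense region, so it is treated as redundant.
    `slack` is a hypothetical tolerance, not a parameter from the paper.
    """
    uniform_gini = 1.0 - 1.0 / bins
    ginis = np.array([gini_1d(data[:, j], bins) for j in range(data.shape[1])])
    kept = np.flatnonzero(ginis < uniform_gini - slack)
    return kept, ginis

def relation_matrix(data, kept, bins=10):
    """Pairwise relation between kept attributes (assumed relation function).

    If two attributes are independent, the joint histogram is the product of the
    marginals, so the joint Gini equals 1 - (1 - g_x) * (1 - g_y). The relation
    is how far the observed joint Gini falls below that value.
    """
    k = len(kept)
    rel = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            x, y = data[:, kept[a]], data[:, kept[b]]
            independent = 1.0 - (1.0 - gini_1d(x, bins)) * (1.0 - gini_1d(y, bins))
            rel[a, b] = rel[b, a] = independent - gini_2d(x, y, bins)
    return rel

# Synthetic usage example: attributes 0 and 1 share a cluster on the same points,
# attribute 2 is pure noise, attribute 3 has a cluster on unrelated points.
rng = np.random.default_rng(0)
n = 2000
data = rng.uniform(0.0, 1.0, size=(n, 4))
shared = np.arange(n) < n // 2          # points in the cluster of attributes 0 and 1
other = np.arange(n) % 2 == 0           # points in the cluster of attribute 3
data[shared, 0] = rng.normal(0.35, 0.015, size=shared.sum())
data[shared, 1] = rng.normal(0.65, 0.015, size=shared.sum())
data[other, 3] = rng.normal(0.35, 0.015, size=other.sum())

kept, ginis = filter_attributes(data)
rel = relation_matrix(data, kept)
```

In this sketch, attribute pairs whose dense regions are formed by the same points get a clearly positive relation value, while pairs with unrelated dense regions stay near zero. Grouping attributes by these values, with overlap allowed, would then yield candidate subspaces in the spirit of the abstract.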

