
Influence of Noisy Features on Internal Validation of Clustering
Cited by: 6
Abstract Internal validation measures evaluate the quality of a clustering result and help identify the optimal number of clusters when the true class labels of the samples are unknown, which makes them an important topic in cluster analysis. Many studies have analyzed the performance of these measures, and some conclude that certain measures perform well and can help a clustering algorithm find the optimal number of clusters. These studies, however, do not consider how noisy features in real data affect the measures, so their conclusions may mislead the selection and application of internal validation measures. This study therefore selected 10 commonly used internal validation measures and used them to determine the number of clusters on simulated and real datasets, in order to analyze the influence of noisy features on the choice of internal validation measure and on clustering results. The results indicate that noisy features degrade the performance of internal validation measures: the KL, CH, and CCC indices are relatively insensitive to noisy features, whereas the other measures are sensitive to them, and the accuracy of clustering results decreases as the noise increases.
Authors YANG Hu; FU Yu; FAN Dan (School of Information, Central University of Finance and Economics, Beijing 100081, China; School of Statistics, Renmin University of China, Beijing 100872, China)
Source Computer Science (《计算机科学》), CSCD, Peking University Core Journals, 2018, No. 7, pp. 22-30, 52 (10 pages)
Funding Supported by the Young Scientists Fund of the National Natural Science Foundation of China (71701223)
Keywords Internal validation; Noisy features; Number of clusters; Clustering accuracy
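
To make the abstract's central idea concrete, the sketch below computes one internal validation index and shows how appending structure-free features changes its value. It assumes that "CH" refers to the Calinski-Harabasz index; the toy data, the helper names `make_blobs_2d` and `add_noise_features`, and the noise scale are all hypothetical choices for illustration and are not the authors' experimental setup.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """CH index: between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom. Higher means better separation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    overall_mean = X.mean(axis=0)
    between, within = 0.0, 0.0
    for c in clusters:
        members = X[labels == c]
        center = members.mean(axis=0)
        between += members.shape[0] * np.sum((center - overall_mean) ** 2)
        within += np.sum((members - center) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def add_noise_features(X, n_noise, scale, rng):
    """Append n_noise Gaussian columns that carry no cluster structure."""
    noise = rng.normal(0.0, scale, size=(X.shape[0], n_noise))
    return np.hstack([X, noise])


def make_blobs_2d(rng, per_cluster=50):
    """Hypothetical toy data: three unit-variance Gaussian clusters in 2-D."""
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
    X = np.vstack([rng.normal(c, 1.0, size=(per_cluster, 2)) for c in centers])
    labels = np.repeat(np.arange(len(centers)), per_cluster)
    return X, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, labels = make_blobs_2d(rng)
    # Score the same (true) partition while the number of noisy features grows.
    for n_noise in (0, 2, 5, 10):
        X_noisy = add_noise_features(X, n_noise, scale=3.0, rng=rng)
        print(n_noise, round(calinski_harabasz(X_noisy, labels), 2))
```

Scoring a fixed partition isolates the effect of the noise on the index itself. A fuller experiment along the lines the abstract describes would also re-cluster the noisy data for each candidate number of clusters and compare the number selected by each index with the true one.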