Abstract
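To address the failure of the traditional Euclidean distance measure when describing data distributions with complex structure, a manifold distance that effectively reflects the inherent global consistency of the sample set is introduced as the similarity measure between samples. A criterion function is designed to express the clustering objective of high within-cluster similarity and low between-cluster similarity, so that data clustering is converted into a problem of optimizing the criterion function, and an iterative optimization clustering algorithm is proposed. Simulation experiments on four artificial datasets show that the new method has few parameters and is simple to implement, and that, because no random operations are introduced during its execution, its results are deterministic. Compared with the standard k-means algorithm, the new method can determine the number of clusters automatically and achieves good classification performance on clustering problems in which the samples have a complex spatial distribution.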
To address the problem that the classical Euclidean distance metric can fail when used to describe data with complicated structure, a manifold distance is introduced as the similarity measure between samples; it measures the geodesic distance along the manifold. A criterion function is designed to express the clustering objective, namely that samples in the same cluster are more similar to each other than to samples in different clusters. The clustering problem is thereby converted into a function optimization problem, and an iterative optimization clustering algorithm is proposed. The steps of the algorithm are discussed in detail. Simulation results on four artificial datasets with different manifold structures show that the new algorithm has few predefined parameters and is simple to implement, and that it is deterministic because it involves no random operations. A comparison with the standard k-means algorithm indicates that the new algorithm can determine the number of clusters automatically and identify complex non-convex clusters.
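The abstract does not give the formula for the manifold distance, so the following is only a rough sketch of the general idea of measuring distance along the manifold rather than straight through space. It assumes an Isomap-style approximation: shortest paths over a k-nearest-neighbour graph whose edges carry Euclidean lengths. The function name `geodesic_distances` and the parameter `n_neighbors` are hypothetical and do not come from the paper, whose actual distance definition may differ.

```python
import numpy as np


def geodesic_distances(X, n_neighbors=5):
    """Approximate along-manifold distances by shortest paths on a k-NN graph.

    Illustrative assumption only: this is not the paper's exact definition.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))

    # Start with no edges (infinite length) except zero-length self loops.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)

    # Connect every sample to its n_neighbors nearest samples.
    # Column 0 of the argsort is skipped, assuming it is the sample itself.
    nearest = np.argsort(euclid, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), nearest.shape[1])
    cols = nearest.ravel()
    graph[rows, cols] = euclid[rows, cols]
    graph[cols, rows] = euclid[cols, rows]  # keep the graph symmetric

    # Floyd-Warshall: relax all pairs through each intermediate node k.
    for k in range(n):
        graph = np.minimum(graph, graph[:, k:k + 1] + graph[k:k + 1, :])

    # Entries stay infinite for pairs in disconnected graph components.
    return graph
```

On points sampled along a half circle, for instance, the distance this returns between the two end points follows the arc and is therefore longer than the straight-line Euclidean distance, which is the behaviour the abstract relies on for non-convex clusters. The O(n^3) Floyd-Warshall loop is chosen here for brevity, not efficiency.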
Source
《西安交通大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, Issue 5, pp. 76-79 (4 pages)
Journal of Xi'an Jiaotong University
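Funding
National Natural Science Foundation of China (Grant No. 50505034)
Doctoral Program New Teacher Fund of the Ministry of Education of China (Grant No. 20070698022).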
Keywords
manifold distance
criterion function
clustering